* SSD - TRIM command
@ 2011-02-07 20:07 Roberto Spadim
  2011-02-08 17:37 ` maurice
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-07 20:07 UTC (permalink / raw)
  To: Linux-RAID

hi guys, could md send the TRIM command to an ssd? using the ext4 discard mount option?
if i mix ssd and hd, could this TRIM be rewritten for non-TRIM-compatible disks?

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-07 20:07 SSD - TRIM command Roberto Spadim
@ 2011-02-08 17:37 ` maurice
  2011-02-08 18:31   ` Roberto Spadim
  2011-02-09  7:44   ` Stan Hoeppner
  0 siblings, 2 replies; 70+ messages in thread
From: maurice @ 2011-02-08 17:37 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: linux-raid

On 2/7/2011 1:07 PM, Roberto Spadim wrote:
> hi guys, could md send the TRIM command to an ssd? using the ext4 discard mount option?
> if i mix ssd and hd, could this TRIM be rewritten for non-TRIM-compatible disks?
>
I have read that using md with SSDs is not a great idea:
From the Fedora 14 documentation:

"Take note as well that software RAID levels 1, 4, 5, and 6 are not 
recommended for use on SSDs.
During the initialization stage of these RAID levels, some RAID 
management utilities (such as mdadm)
write to all of the blocks on the storage device to ensure that 
checksums operate properly.
This will cause the performance of the SSD to degrade quickly. "

https://docs.fedoraproject.org/en-US/Fedora/14/html/Storage_Administration_Guide/newmds-ssdtuning.html


-- 
Cheers,
Maurice Hilarius
eMail: /mhilarius@gmail.com/

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-08 17:37 ` maurice
@ 2011-02-08 18:31   ` Roberto Spadim
       [not found]     ` <AANLkTik5SumqyTN5LZVntna8nunvPe7v38TSFf9eCfcU@mail.gmail.com>
  2011-02-09  7:44   ` Stan Hoeppner
  1 sibling, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-08 18:31 UTC (permalink / raw)
  To: maurice; +Cc: linux-raid

is that the resync running?
i don't think it's a problem...
any device will die some day...
ssd is faster than hd, why not use it?
i'm using an hp smart array p212 with 3.0 firmware, and it writes to all blocks too
maybe just a command-line option to start the array without a sync could help...
i don't know if resync is write intensive or just writes to different
blocks; if it only writes the diff it's not a problem for ssd...

again...
i know that the 'translation' of the trim command for non-compatible devices
is a problem for the device layer, not the md layer, but can md send the trim
command to all mirrors/disks?

2011/2/8 maurice <mhilarius@gmail.com>:
> On 2/7/2011 1:07 PM, Roberto Spadim wrote:
>>
>> hi guys, could md send the TRIM command to an ssd? using the ext4 discard mount
>> option?
>> if i mix ssd and hd, could this TRIM be rewritten for non-TRIM-compatible
>> disks?
>>
> I have read that using md with SSDs is not a great idea:
> From the Fedora 14 documentation:
>
> "Take note as well that software RAID levels 1, 4, 5, and 6 are not
> recommended for use on SSDs.
> During the initialization stage of these RAID levels, some RAID management
> utilities (such as mdadm)
> write to all of the blocks on the storage device to ensure that checksums
> operate properly.
> This will cause the performance of the SSD to degrade quickly. "
>
> https://docs.fedoraproject.org/en-US/Fedora/14/html/Storage_Administration_Guide/newmds-ssdtuning.html
>
>
> --
> Cheers,
> Maurice Hilarius
> eMail: /mhilarius@gmail.com/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
       [not found]     ` <AANLkTik5SumqyTN5LZVntna8nunvPe7v38TSFf9eCfcU@mail.gmail.com>
@ 2011-02-08 20:50       ` Roberto Spadim
  2011-02-08 21:18         ` maurice
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-08 20:50 UTC (permalink / raw)
  To: Scott E. Armitage; +Cc: maurice, linux-raid

=] now the right answer :)
question: maybe in future... could we make trim compatible with md?

obs:
i understood that trim is just a way for an ssd to mark sectors clean without
writing 000000000000-000000 across the entire sector (an ssd optimization)
if we translate trim at the device level for disks that don't support it, could
we send TRIM to all disks in an md device?
just an option like mdadm --assemble --allow-trim
and forward the trim received from the filesystem
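
just to illustrate the idea (this is not md code -- a user-space sketch only, and
the device names are examples): the same discard range can be sent to every member
with the BLKDISCARD ioctl, which is roughly what such an option would have to do
for each mirror:

/* illustration only: forward one discard range to every mirror member
 * from user space via the BLKDISCARD ioctl (device names are examples) */
#include <stdio.h>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>          /* BLKDISCARD */

int main(void)
{
    const char *members[] = { "/dev/sdb", "/dev/sdc" };   /* example mirrors */
    uint64_t range[2] = { 1024 * 1024, 4 * 1024 * 1024 }; /* offset, length in bytes */
    int i;

    for (i = 0; i < 2; i++) {
        int fd = open(members[i], O_RDWR);
        if (fd < 0) { perror(members[i]); continue; }
        if (ioctl(fd, BLKDISCARD, range) < 0)
            perror("BLKDISCARD");              /* e.g. device has no TRIM support */
        close(fd);
    }
    return 0;
}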

2011/2/8 Scott E. Armitage <launchpad@scott.armitage.name>:
> The problem as I understand it is that md treats the entire device (or
> partition) as "in use" -- even if the filesystem isn't using a particular
> set of blocks, those blocks must still be consistent across the array. The
> SSD TRIM command is used to tell the physical drive which blocks are no
> longer in use by the filesystem, so that it can optimize write operations.
> Running under md, all blocks would be "used", so there would be nothing to
> send with the TRIM command.
> -Scott
>
> On Tue, Feb 8, 2011 at 1:31 PM, Roberto Spadim <roberto@spadim.com.br>
> wrote:
>>
>> is that the resync running?
>> i don't think it's a problem...
>> any device will die some day...
>> ssd is faster than hd, why not use it?
>> i'm using an hp smart array p212 with 3.0 firmware, and it writes to all
>> blocks too
>> maybe just a command-line option to start the array without a sync could help...
>> i don't know if resync is write intensive or just writes to different
>> blocks; if it only writes the diff it's not a problem for ssd...
>>
>> again...
>> i know that the 'translation' of the trim command for non-compatible devices
>> is a problem for the device layer, not the md layer, but can md send the trim
>> command to all mirrors/disks?
>>
>> 2011/2/8 maurice <mhilarius@gmail.com>:
>> > On 2/7/2011 1:07 PM, Roberto Spadim wrote:
>> >>
>> >> hi guys, could md send the TRIM command to an ssd? using the ext4 discard
>> >> mount option?
>> >> if i mix ssd and hd, could this TRIM be rewritten for non-TRIM-compatible
>> >> disks?
>> >>
>> > I have read that using md with SSDs is not a great idea:
>> > From the Fedora 14 documentation:
>> >
>> > "Take note as well that software RAID levels 1, 4, 5, and 6 are not
>> > recommended for use on SSDs.
>> > During the initialization stage of these RAID levels, some RAID
>> > management
>> > utilities (such as mdadm)
>> > write to all of the blocks on the storage device to ensure that
>> > checksums
>> > operate properly.
>> > This will cause the performance of the SSD to degrade quickly. "
>> >
>> >
>> > https://docs.fedoraproject.org/en-US/Fedora/14/html/Storage_Administration_Guide/newmds-ssdtuning.html
>> >
>> >
>> > --
>> > Cheers,
>> > Maurice Hilarius
>> > eMail: /mhilarius@gmail.com/
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Scott Armitage, B.A.Sc., M.A.Sc. candidate
> Space Flight Laboratory
> University of Toronto Institute for Aerospace Studies
> 4925 Dufferin Street, Toronto, Ontario, Canada, M3H 5T6
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-08 20:50       ` Roberto Spadim
@ 2011-02-08 21:18         ` maurice
  2011-02-08 21:33           ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: maurice @ 2011-02-08 21:18 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: linux-raid

On 2/8/2011 1:50 PM, Roberto Spadim wrote:
> =] now the right answer :)
> question: maybe in future... could we make trim compatible with md?
>
I hope that future is "real soon now".
MLC SSD is now starting to appear in the "Enterprise" space.
Companies like Pliant have released products for that.
Typical SAN RAID controllers have specific performance limits which can
be saturated with a not very large number of SSDs.
To get higher I/O rates we need a more powerful RAID engine.
A typical 48-core, 128GB RAM box using AMD CPUs and 4 SAS HBAs to disk
JBOD cases can be a ridiculously powerful RAID engine for a
reasonable cost (at least reasonable compared to NetApp, EMC, Hitachi
SANs, etc.) with a large number of devices.

BUT: To use SSDs in the design we need mdadm to be more SSD friendly.


-- 
Cheers,
Maurice Hilarius
eMail: /mhilarius@gmail.com/

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-08 21:18         ` maurice
@ 2011-02-08 21:33           ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-08 21:33 UTC (permalink / raw)
  To: maurice; +Cc: linux-raid

yeah, we will make it :)
maurice, i have been working on some new raid1 read balance modes, could you help me
benchmark them?
it's based on kernel 2.6.37, here is the code:
www.spadim.com.br/raid1/
there are raid1.new.c/raid1.new.h and raid1.old.c/raid1.old.h,
the new and old kernel source code

from user space we can now use these sysfs files:
/sys/block/mdXXX/md/read_balance_mode
/sys/block/mdXXX/md/read_balance_stripe_shift
/sys/block/mdXXX/md/read_balance_config

read_balance_mode now has 4 modes:
near_head (default, working without problems, very good for hd-only arrays;
ssd should use another mode)

round_robin (plain round robin, with a per-mirror counter so it switches
disks after a number of reads, very good for an ssd-only array)

stripe (like raid0: read_balance_stripe_shift shifts the sector
number right (">>") and then the disk is selected with %
raid_disks, very good for hd or ssd; a good shift value is >= 5,
but not too large, since a huge shift makes the formula use only the first
disk -- see the small sketch after this list)

time_based (based on head positioning time + read time + i/o queue
time, selecting the best disk to read from; works very well with ssd and hd,
the current implementation doesn't include i/o queue time yet but i will study
it and put it to work too)
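
a tiny sketch of the stripe formula above (just the math, not the raid1.new.c
code; the shift value and disk count are examples):

/* stripe read-balance idea: disk = (sector >> shift) % raid_disks */
#include <stdio.h>

static int pick_disk(unsigned long long sector, int shift, int raid_disks)
{
    return (int)((sector >> shift) % (unsigned)raid_disks);
}

int main(void)
{
    unsigned long long s;

    /* with shift = 5 and 2 mirrors, reads alternate disks every 32 sectors */
    for (s = 0; s < 256; s += 32)
        printf("sector %3llu -> disk %d\n", s, pick_disk(s, 5, 2));
    return 0;
}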

all configuration for round_robin and time_based is sent to the kernel through
read_balance_config
type cat /sys/block/mdXXX/md/read_balance_config
and send the parameters per disk
the first line of the cat output is the parameter list; after the | are read-only
variables -- you can't change those, just read them
use echo "0 0 0 0 0 0 0 0 0 0" > read_balance_config to change values

thanks =]

2011/2/8 maurice <mhilarius@gmail.com>:
> On 2/8/2011 1:50 PM, Roberto Spadim wrote:
>>
>> =] now the right answer :)
>> question: maybe in future... could we make trim compatible with md?
>>
> I hope that future is "real soon now".
> MLC SSD is now starting to appear in the "Enterprise" space.
> Companies like Pliant have released products for that.
> Typical SAN RAID controllers have specific performance limits which can be
> saturated with a not very large number of SSDs.
> To get higher I/O rates we need a more powerful RAID engine.
> A typical 48-core, 128GB RAM box using AMD CPUs and 4 SAS HBAs to disk JBOD
> cases can be a ridiculously powerful RAID engine for a
> reasonable cost (at least reasonable compared to NetApp, EMC, Hitachi SANs,
> etc.) with a large number of devices.
>
> BUT: To use SSDs in the design we need mdadm to be more SSD friendly.
>
>
> --
> Cheers,
> Maurice Hilarius
> eMail: /mhilarius@gmail.com/
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-08 17:37 ` maurice
  2011-02-08 18:31   ` Roberto Spadim
@ 2011-02-09  7:44   ` Stan Hoeppner
  2011-02-09  9:05     ` Eric D. Mudama
  2011-02-09 13:29     ` David Brown
  1 sibling, 2 replies; 70+ messages in thread
From: Stan Hoeppner @ 2011-02-09  7:44 UTC (permalink / raw)
  To: maurice; +Cc: Roberto Spadim, linux-raid

maurice put forth on 2/8/2011 11:37 AM:
> On 2/7/2011 1:07 PM, Roberto Spadim wrote:
>> hi guys, could md send the TRIM command to an ssd? using the ext4 discard mount option?
>> if i mix ssd and hd, could this TRIM be rewritten for non-TRIM-compatible disks?
>>
> I have read that using md with SSDs is not a great idea:
> From the Fedora 14 documentation:

Using any RAID level but pure striping with SSDs is a bad idea, for the exact
reason in that documentation:  excessive writes.

SSD - Solid State Drive

Note the first two words.  Solid state device = integrated circuit.  ICs,
including those comprised of flash memory transistors, have totally different
failure modes than spinning rust disks, SRDs, or "plain old mechanical hard drives".

RAID'ing SSDs with any data duplicative RAID level, any mirroring or parity RAID
levels, _decreases_ the life of all SSDs in the array.  This is the opposite
effect of what you want:  reliability and lifespan.

People have a misconception that SSDs are like hard disks.  The only thing they
have in common is that both store data and they can have a similar interface
(SATA).  The similarities end there.

RAID is not a proper method of extending the life of SSD storage nor protecting
the data on SSD devices.  If you want to pool all the capacity of multiple SSDs
into a single logical device, use RAID 0 or spanning, _not_ a mirror or parity
RAID level.  If you want to protect the data, snap it to a single large SATA
drive, or a D2D backup array, and then to tape.

-- 
Stan

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09  7:44   ` Stan Hoeppner
@ 2011-02-09  9:05     ` Eric D. Mudama
  2011-02-09 15:45       ` Chris Worley
  2011-02-09 13:29     ` David Brown
  1 sibling, 1 reply; 70+ messages in thread
From: Eric D. Mudama @ 2011-02-09  9:05 UTC (permalink / raw)
  To: Stan Hoeppner; +Cc: maurice, Roberto Spadim, linux-raid

On Wed, Feb  9 at  1:44, Stan Hoeppner wrote:
>maurice put forth on 2/8/2011 11:37 AM:
>> On 2/7/2011 1:07 PM, Roberto Spadim wrote:
>>> hi guys, could md send the TRIM command to an ssd? using the ext4 discard mount option?
>>> if i mix ssd and hd, could this TRIM be rewritten for non-TRIM-compatible disks?
>>>
>> I have read that using md with SSDs is not a great idea:
>> From the Fedora 14 documentation:
>
>Using any RAID level but pure striping with SSDs is a bad idea, for the exact
>reason in that documentation:  excessive writes.

If I mirror two SSDs, and write 1 unit of data to the mirror, each
element of the mirror should see 1 unit of write.  How does this
perform excessive writes, compared to the same workload applied to a
single SSD?

I agree that in aggregate we've now done 2 units worth of writes,
however, in a mirror case, we're protecting against both whole-device
failure and single-sector failure modes, so hardly seems like a bad
idea in all applications.


-- 
Eric D. Mudama
edmudama@bounceswoosh.org


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09  7:44   ` Stan Hoeppner
  2011-02-09  9:05     ` Eric D. Mudama
@ 2011-02-09 13:29     ` David Brown
  2011-02-09 14:39       ` Roberto Spadim
  1 sibling, 1 reply; 70+ messages in thread
From: David Brown @ 2011-02-09 13:29 UTC (permalink / raw)
  To: linux-raid

On 09/02/2011 08:44, Stan Hoeppner wrote:
> maurice put forth on 2/8/2011 11:37 AM:
>> On 2/7/2011 1:07 PM, Roberto Spadim wrote:
>>> hi guys, could md send the TRIM command to an ssd? using the ext4 discard
>>> mount option? if i mix ssd and hd, could this TRIM be rewritten for
>>> non-TRIM-compatible disks?
>>>
>> I have read that using md with SSDs is not a great idea: From the
>> Fedora 14 documentation:
>
> Using any RAID level but pure striping with SSDs is a bad idea, for
> the exact reason in that documentation:  excessive writes.
>
> SSD - Solid State Drive
>
> Note the first two words.  Solid state device = integrated circuit.
> ICs, including those comprised of flash memory transistors, have
> totally different failure modes than spinning rust disks, SRDs, or
> "plain old mechanical hard drives".
>
> RAID'ing SSDs with any data duplicative RAID level, any mirroring or
> parity RAID levels, _decreases_ the life of all SSDs in the array.
> This is the opposite effect of what you want:  reliability and
> lifespan.
>
> People have a misconception that SSDs are like hard disks.  The only
> thing they have in common is that both store data and they can have a
> similar interface (SATA).  The similarities end there.
>
> RAID is not a proper method of extending the life of SSD storage nor
> protecting the data on SSD devices.  If you want to pool all the
> capacity of multiple SSDs into a single logical device, use RAID 0 or
> spanning, _not_ a mirror or parity RAID level.  If you want to
> protect the data, snap it to a single large SATA drive, or a D2D
> backup array, and then to tape.
>

First off, let me agree with you that backup is important no matter what 
you use as your primary storage.

But beyond that, you've got a basic assumption wrong here.

Good quality, modern SSDs do not have write-endurance issues.  It's a 
thing of the past.  Internally, of course, the flash /does/ have 
endurance limits.  But these are high (especially with SLC devices 
rather than MLC devices), and the combination of ECC, wear-levelling and 
redundant blocks means that you can write to these devices continuously 
at high speed for /years/ before endurance issues become visible by the 
host.  An additional effect of the extensive ECC is that undetected read 
errors are much less likely than with hard disks - when a failure /does/ 
occur, you know it has occurred.

Many SSD models suffer from a certain amount of performance degradation 
when they have been used for a while.  Intel's devices were notorious 
for this, though apparently they are better now.  But that's a speed 
issue, not a reliability or lifetime issue.

SSDs (again, I refer to good quality modern devices - earlier models had 
more problems) are inherently more reliable than HDs, and have longer 
expected lifetimes.  This means that it is often fine to put your SSDs 
in a RAID0 combination - you still have a greater reliability than you 
would with a single HDD.

However, SSDs are not infallible - using redundant RAID with SSDs is a 
perfectly valid setup.  Obviously you will have a whole disk's worth of 
extra writes when you set up the RAID, and redundant writes mean more 
writes, but the SSDs will handle those writes perfectly well.


There is plenty of scope for md / SSD optimisation, however.  Good TRIM 
support is just one aspect.  Other points include matching stripe sizes 
to fit the geometry of the SSD, and taking advantage of the seek speeds 
of SSD (this is particularly important if you are mirroring an SSD and 
an HD).



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 13:29     ` David Brown
@ 2011-02-09 14:39       ` Roberto Spadim
  2011-02-09 15:00         ` Scott E. Armitage
  2011-02-09 15:49         ` David Brown
  0 siblings, 2 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 14:39 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

guys...
if my ssd fails, i buy another...
let's make the software ok; the hardware is another problem
raid1 should work with floppy disks, hard disks, ssd, nbd... that's the point:
make solutions for a hardware mix
the question is simple: could we send the TRIM command to all mirrors (for
stripe, just to the disks that should receive it)? if a device doesn't have TRIM
we should translate it into a similar command with the same READ
effect (no problem if it's not atomic)

on the point of good reads: i sent an email to maurice, and many other
emails to this raid list -- there's a new read balance mode for kernel
2.6.37; if you want to try to benchmark it, please test it:
www.spadim.com.br/raid1
for me it works very well with a mixed hd and ssd array; i need more
tests and benchmarks for neil to accept it as a default feature of md
the sysfs interface is still poor, in the future it should change
the time based mode works, but it should have some more features implemented
in the future (queue time estimation)


2011/2/9 David Brown <david@westcontrol.com>:
> On 09/02/2011 08:44, Stan Hoeppner wrote:
>>
>> maurice put forth on 2/8/2011 11:37 AM:
>>>
>>> On 2/7/2011 1:07 PM, Roberto Spadim wrote:
>>>>
>>>> hi guys, could md send the TRIM command to an ssd? using the ext4 discard
>>>> mount option? if i mix ssd and hd, could this TRIM be rewritten for
>>>> non-TRIM-compatible disks?
>>>>
>>> I have read that using md with SSDs is not a great idea: From the
>>> Fedora 14 documentation:
>>
>> Using any RAID level but pure striping with SSDs is a bad idea, for
>> the exact reason in that documentation:  excessive writes.
>>
>> SSD - Solid State Drive
>>
>> Note the first two words.  Solid state device = integrated circuit.
>> ICs, including those comprised of flash memory transistors, have
>> totally different failure modes than spinning rust disks, SRDs, or
>> "plain old mechanical hard drives".
>>
>> RAID'ing SSDs with any data duplicative RAID level, any mirroring or
>> parity RAID levels, _decreases_ the life of all SSDs in the array.
>> This is the opposite effect of what you want:  reliability and
>> lifespan.
>>
>> People have a misconception that SSDs are like hard disks.  The only
>> thing they have in common is that both store data and they can have a
>> similar interface (SATA).  The similarities end there.
>>
>> RAID is not a proper method of extending the life of SSD storage nor
>> protecting the data on SSD devices.  If you want to pool all the
>> capacity of multiple SSDs into a single logical device, use RAID 0 or
>> spanning, _not_ a mirror or parity RAID level.  If you want to
>> protect the data, snap it to a single large SATA drive, or a D2D
>> backup array, and then to tape.
>>
>
> First off, let me agree with you that backup is important no matter what you
> use as your primary storage.
>
> But beyond that, you've got a basic assumption wrong here.
>
> Good quality, modern SSDs do not have write-endurance issues.  It's a thing
> of the past.  Internally, of course, the flash /does/ have endurance limits.
>  But these are high (especially with SLC devices rather than MLC devices),
> and the combination of ECC, wear-levelling and redundant blocks means that
> you can write to these devices continuously at high speed for /years/ before
> endurance issues become visible by the host.  An additional effect of the
> extensive ECC is that undetected read errors are much less likely than with
> hard disks - when a failure /does/ occur, you know it has occurred.
>
> Many SSD models suffer from a certain amount of performance degradation when
> they have been used for a while.  Intel's devices were notorious for this,
> though apparently they are better now.  But that's a speed issue, not a
> reliability or lifetime issue.
>
> SSDs (again, I refer to good quality modern devices - earlier models had
> more problems) are inherently more reliable than HDs, and have longer
> expected lifetimes.  This means that it is often fine to put your SSDs in a
> RAID0 combination - you still have a greater reliability than you would with
> a single HDD.
>
> However, SSDs are not infallible - using redundant RAID with SSDs is a
> perfectly valid setup.  Obviously you will have a whole disks worth of extra
> writes when you set up the RAID, and redundant writes means more writes, but
> the SSDs will handle those writes perfectly well.
>
>
> There is plenty of scope for md / SSD optimisation, however.  Good TRIM
> support is just one aspect.  Other points include matching stripe sizes to
> fit the geometry of the SSD, and taking advantage of the seek speeds of SSD
> (this is particularly important if you are mirroring an SSD and an HD).
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 14:39       ` Roberto Spadim
@ 2011-02-09 15:00         ` Scott E. Armitage
  2011-02-09 15:52           ` Chris Worley
  2011-02-09 16:19           ` Eric D. Mudama
  2011-02-09 15:49         ` David Brown
  1 sibling, 2 replies; 70+ messages in thread
From: Scott E. Armitage @ 2011-02-09 15:00 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: David Brown, linux-raid

I reiterate my previous reply that under the current md architecture,
where the complete device is considered to be in use, sending TRIM
commands makes little sense. AFAICT, reading back a trimmed page is
not defined, since the whole idea is that the host doesn't care about
what is on that page any more.

The next time md comes around to corresponding trimmed pages on two
SSDs, their contents may differ, and all of a sudden our array is no
longer consistent.

On Wed, Feb 9, 2011 at 9:39 AM, Roberto Spadim <roberto@spadim.com.br> wrote:
> guys...
> if my ssd fails, i buy another...
> let's make the software ok; the hardware is another problem
> raid1 should work with floppy disks, hard disks, ssd, nbd... that's the point:
> make solutions for a hardware mix
> the question is simple: could we send the TRIM command to all mirrors (for
> stripe, just to the disks that should receive it)? if a device doesn't have TRIM
> we should translate it into a similar command with the same READ
> effect (no problem if it's not atomic)
>
> on the point of good reads: i sent an email to maurice, and many other
> emails to this raid list -- there's a new read balance mode for kernel
> 2.6.37; if you want to try to benchmark it, please test it:
> www.spadim.com.br/raid1
> for me it works very well with a mixed hd and ssd array; i need more
> tests and benchmarks for neil to accept it as a default feature of md
> the sysfs interface is still poor, in the future it should change
> the time based mode works, but it should have some more features implemented
> in the future (queue time estimation)
>
>
> 2011/2/9 David Brown <david@westcontrol.com>:
>> On 09/02/2011 08:44, Stan Hoeppner wrote:
>>>
>>> maurice put forth on 2/8/2011 11:37 AM:
>>>>
>>>> On 2/7/2011 1:07 PM, Roberto Spadim wrote:
>>>>>
>>>>> hi guys, could md send the TRIM command to an ssd? using the ext4 discard
>>>>> mount option? if i mix ssd and hd, could this TRIM be rewritten for
>>>>> non-TRIM-compatible disks?
>>>>>
>>>> I have read that using md with SSDs is not a great idea: From the
>>>> Fedora 14 documentation:
>>>
>>> Using any RAID level but pure striping with SSDs is a bad idea, for
>>> the exact reason in that documentation:  excessive writes.
>>>
>>> SSD - Solid State Drive
>>>
>>> Note the first two words.  Solid state device = integrated circuit.
>>> ICs, including those comprised of flash memory transistors, have
>>> totally different failure modes than spinning rust disks, SRDs, or
>>> "plain old mechanical hard drives".
>>>
>>> RAID'ing SSDs with any data duplicative RAID level, any mirroring or
>>> parity RAID levels, _decreases_ the life of all SSDs in the array.
>>> This is the opposite effect of what you want:  reliability and
>>> lifespan.
>>>
>>> People have a misconception that SSDs are like hard disks.  The only
>>> thing they have in common is that both store data and they can have a
>>> similar interface (SATA).  The similarities end there.
>>>
>>> RAID is not a proper method of extending the life of SSD storage nor
>>> protecting the data on SSD devices.  If you want to pool all the
>>> capacity of multiple SSDs into a single logical device, use RAID 0 or
>>> spanning, _not_ a mirror or parity RAID level.  If you want to
>>> protect the data, snap it to a single large SATA drive, or a D2D
>>> backup array, and then to tape.
>>>
>>
>> First off, let me agree with you that backup is important no matter what you
>> use as your primary storage.
>>
>> But beyond that, you've got a basic assumption wrong here.
>>
>> Good quality, modern SSDs do not have write-endurance issues.  It's a thing
>> of the past.  Internally, of course, the flash /does/ have endurance limits.
>>  But these are high (especially with SLC devices rather than MLC devices),
>> and the combination of ECC, wear-levelling and redundant blocks means that
>> you can write to these devices continuously at high speed for /years/ before
>> endurance issues become visible by the host.  An additional effect of the
>> extensive ECC is that undetected read errors are much less likely than with
>> hard disks - when a failure /does/ occur, you know it has occurred.
>>
>> Many SSD models suffer from a certain amount of performance degradation when
>> they have been used for a while.  Intel's devices were notorious for this,
>> though apparently they are better now.  But that's a speed issue, not a
>> reliability or lifetime issue.
>>
>> SSDs (again, I refer to good quality modern devices - earlier models had
>> more problems) are inherently more reliable than HDs, and have longer
>> expected lifetimes.  This means that it is often fine to put your SSDs in a
>> RAID0 combination - you still have a greater reliability than you would with
>> a single HDD.
>>
>> However, SSDs are not infallible - using redundant RAID with SSDs is a
>> perfectly valid setup.  Obviously you will have a whole disks worth of extra
>> writes when you set up the RAID, and redundant writes means more writes, but
>> the SSDs will handle those writes perfectly well.
>>
>>
>> There is plenty of scope for md / SSD optimisation, however.  Good TRIM
>> support is just one aspect.  Other points include matching stripe sizes to
>> fit the geometry of the SSD, and taking advantage of the seek speeds of SSD
>> (this is particularly important if you are mirroring an SSD and an HD).
>>
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Scott Armitage, B.A.Sc., M.A.Sc. candidate
Space Flight Laboratory
University of Toronto Institute for Aerospace Studies
4925 Dufferin Street, Toronto, Ontario, Canada, M3H 5T6
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09  9:05     ` Eric D. Mudama
@ 2011-02-09 15:45       ` Chris Worley
  0 siblings, 0 replies; 70+ messages in thread
From: Chris Worley @ 2011-02-09 15:45 UTC (permalink / raw)
  To: Eric D. Mudama; +Cc: Stan Hoeppner, maurice, Roberto Spadim, linux-raid

On Wed, Feb 9, 2011 at 2:05 AM, Eric D. Mudama
<edmudama@bounceswoosh.org> wrote:
<snip>
> I agree that in aggregate we've now done 2 units worth of writes,
> however, in a mirror case, we're protecting against both whole-device
> failure and single-sector failure modes, so hardly seems like a bad
> idea in all applications.

Yes, just pass through the discards, and let us mirror.  Syncing a
new drive without writing everything (only what actually needs to be
written) is really trivial (it needs no extra saved metadata/LBA bitmaps
or any ability to query the device for active sectors), if folks would
give it some thought and quit saying it shouldn't be done.
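
a rough user-space sketch of that "write only what differs" sync -- read both
members chunk by chunk and rewrite the target only where the data differs
(device paths and chunk size are examples, error handling trimmed):

/* naive "write only what differs" resync between two members (sketch) */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 65536

int main(void)
{
    static char a[CHUNK], b[CHUNK];
    int src = open("/dev/sdb", O_RDONLY);        /* up-to-date member */
    int dst = open("/dev/sdc", O_RDWR);          /* member being synced */
    off_t off = 0;
    ssize_t n;

    if (src < 0 || dst < 0) { perror("open"); return 1; }

    while ((n = pread(src, a, CHUNK, off)) > 0) {
        ssize_t m = pread(dst, b, n, off);
        if (m != n || memcmp(a, b, n) != 0)      /* differs: rewrite this chunk */
            pwrite(dst, a, n, off);
        off += n;
    }
    close(src);
    close(dst);
    return 0;
}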

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 14:39       ` Roberto Spadim
  2011-02-09 15:00         ` Scott E. Armitage
@ 2011-02-09 15:49         ` David Brown
  2011-02-21 18:20           ` Phillip Susi
  1 sibling, 1 reply; 70+ messages in thread
From: David Brown @ 2011-02-09 15:49 UTC (permalink / raw)
  To: linux-raid

On 09/02/2011 15:39, Roberto Spadim wrote:
> guys...
> if my ssd fail, i buy another...
> let's make software ok, the hardware is another problem
> raid1 should work with floppy disks, hard disks, ssd, nbd... that's the point
> make solutions for hardware mix
> the question is simple, could we send TRIM command to all mirrors (for
> stripe just disks that should receive it)? if device don't have TRIM
> we should translate it for a similar command, with the same READ
> effect (no problem if it's not atomic)
>


I've been reading a little more about this.  It seems that the days of 
TRIM may well be numbered - the latest generation of high-end SSDs have 
more powerful garbage collection algorithms, together with more spare 
blocks, making TRIM pretty much redundant.  This is, of course, the most 
convenient solution for everyone (as long as it doesn't cost too much!).

The point of the TRIM command is to tell the SSD that a particular block 
is no longer being used, so that the SSD can erase it in the background 
- that way when you want to write more data, there are more free blocks 
ready and waiting.  But if you've got plenty of spare blocks, it's easy 
to have them erased in advance and you don't need TRIM.



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 15:00         ` Scott E. Armitage
@ 2011-02-09 15:52           ` Chris Worley
  2011-02-09 19:15             ` Doug Dumitru
  2011-02-09 16:19           ` Eric D. Mudama
  1 sibling, 1 reply; 70+ messages in thread
From: Chris Worley @ 2011-02-09 15:52 UTC (permalink / raw)
  To: Scott E. Armitage; +Cc: Roberto Spadim, David Brown, linux-raid

On Wed, Feb 9, 2011 at 8:00 AM, Scott E. Armitage
<launchpad@scott.armitage.name> wrote:
<snip>
>AFAICT, reading back a trimmed page is
> not defined

... and so should it be assumed that reading a trimmed/nonexistent LBA
off two of the same vendor's SSDs would yield different results?

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 15:00         ` Scott E. Armitage
  2011-02-09 15:52           ` Chris Worley
@ 2011-02-09 16:19           ` Eric D. Mudama
  2011-02-09 16:28             ` Scott E. Armitage
  2011-02-21 18:24             ` Phillip Susi
  1 sibling, 2 replies; 70+ messages in thread
From: Eric D. Mudama @ 2011-02-09 16:19 UTC (permalink / raw)
  To: Scott E. Armitage; +Cc: Roberto Spadim, David Brown, linux-raid

On Wed, Feb  9 at 10:00, Scott E. Armitage wrote:
>I reiterate my previous reply that under the current md architecture,
>where the complete device is considered to be in use, sending TRIM
>commands makes little sense. AFAICT, reading back a trimmed page is
>not defined, since the whole idea is that the host doesn't care about
>what is on that page any more.
>
>The next time md comes around to corresponding trimmed pages on two
>SSDs, their contents may differ, and all of a sudden our array is no
>longer consistent.

For SATA devices, ATA8-ACS2 addresses this through Deterministic Read
After Trim in the DATA SET MANAGEMENT command.  Devices can be
indeterminate, determinate with a non-zero pattern (often all-ones) or
determinate all-zero for sectors read after being trimmed.

--eric

-- 
Eric D. Mudama
edmudama@bounceswoosh.org


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 16:19           ` Eric D. Mudama
@ 2011-02-09 16:28             ` Scott E. Armitage
  2011-02-09 17:17               ` Eric D. Mudama
  2011-02-21 18:24             ` Phillip Susi
  1 sibling, 1 reply; 70+ messages in thread
From: Scott E. Armitage @ 2011-02-09 16:28 UTC (permalink / raw)
  To: Eric D. Mudama; +Cc: Roberto Spadim, David Brown, linux-raid

Who sends this command? If md can assume that determinate mode is
always set, then RAID 1 at least would remain consistent. For RAID 5,
consistency of the parity information depends on the determinate
pattern used and the number of disks. If you used determinate
all-zero, then parity information would always be consistent, but this
is probably not preferable since every TRIM command would incur an
extra write for each bit in each page of the block.
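
one way to see the determinate all-zero case: if the trimmed data chunks of a
stripe all read back as zeros, their XOR parity is zero too, so a zeroed parity
block stays consistent -- a tiny sketch with made-up sizes:

/* XOR parity over data chunks that read back as all zeros is itself zero */
#include <stdio.h>
#include <string.h>

#define CHUNK 8
#define DATA_DISKS 3

int main(void)
{
    unsigned char data[DATA_DISKS][CHUNK], parity[CHUNK];
    int d, i;

    memset(data, 0, sizeof(data));      /* trimmed chunks read as zeros */
    memset(parity, 0, sizeof(parity));

    for (d = 0; d < DATA_DISKS; d++)
        for (i = 0; i < CHUNK; i++)
            parity[i] ^= data[d][i];

    for (i = 0; i < CHUNK; i++)
        printf("%02x", parity[i]);      /* prints all zeros */
    printf("\n");
    return 0;
}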

-S

On Wed, Feb 9, 2011 at 11:19 AM, Eric D. Mudama
<edmudama@bounceswoosh.org> wrote:
> On Wed, Feb  9 at 10:00, Scott E. Armitage wrote:
>>
>> I reiterate my previous reply that under the current md architecture,
>> where the complete device is considered to be in use, sending TRIM
>> commands makes little sense. AFAICT, reading back a trimmed page is
>> not defined, since the whole idea is that the host doesn't care about
>> what is on that page any more.
>>
>> The next time md comes around to corresponding trimmed pages on two
>> SSDs, their contents may differ, and all of a sudden our array is no
>> longer consistent.
>
> For SATA devices, ATA8-ACS2 addresses this through Deterministic Read
> After Trim in the DATA SET MANAGEMENT command.  Devices can be
> indeterminate, determinate with a non-zero pattern (often all-ones) or
> determinate all-zero for sectors read after being trimmed.
>
> --eric
>
> --
> Eric D. Mudama
> edmudama@bounceswoosh.org
>
>



-- 
Scott Armitage, B.A.Sc., M.A.Sc. candidate
Space Flight Laboratory
University of Toronto Institute for Aerospace Studies
4925 Dufferin Street, Toronto, Ontario, Canada, M3H 5T6
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 16:28             ` Scott E. Armitage
@ 2011-02-09 17:17               ` Eric D. Mudama
  2011-02-09 18:18                 ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Eric D. Mudama @ 2011-02-09 17:17 UTC (permalink / raw)
  To: Scott E. Armitage; +Cc: Eric D. Mudama, Roberto Spadim, David Brown, linux-raid

On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
>Who sends this command? If md can assume that determinate mode is
>always set, then RAID 1 at least would remain consistent. For RAID 5,
>consistency of the parity information depends on the determinate
>pattern used and the number of disks. If you used determinate
>all-zero, then parity information would always be consistent, but this
>is probably not preferable since every TRIM command would incur an
>extra write for each bit in each page of the block.

True, and there are several solutions.  Maybe track space used via
some mechanism, such that when you trim you're only trimming the
entire stripe width so no parity is required for the trimmed regions.
Or, trust the drive's wear leveling and endurance rating, combined
with SMART data, to indicate when you need to replace the device
preemptive to eventual failure.

It's not an unsolvable issue.  If the RAID5 used distributed parity,
you could expect wear leveling to wear all the devices evenly, since
on average, the # of writes to all devices will be the same.  Only a
RAID4 setup would see a lopsided amount of writes to a single device.
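
a small sketch of the "only trim whole stripes" idea -- shrink a requested
discard range to full stripe boundaries so the trimmed region never needs a
parity update (stripe size and offsets are made-up numbers):

/* clamp a discard range to whole-stripe boundaries (illustrative only) */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint64_t stripe = 3 * 64 * 1024;          /* e.g. 3 data disks * 64 KiB chunk */
    uint64_t start = 100000, len = 900000;    /* requested discard, in bytes */
    uint64_t end = start + len;

    uint64_t astart = (start + stripe - 1) / stripe * stripe;  /* round start up */
    uint64_t aend   = end / stripe * stripe;                   /* round end down */

    if (aend > astart)
        printf("trim %llu bytes at offset %llu\n",
               (unsigned long long)(aend - astart),
               (unsigned long long)astart);
    else
        printf("range too small to trim a whole stripe\n");
    return 0;
}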

--eric

-- 
Eric D. Mudama
edmudama@bounceswoosh.org


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 17:17               ` Eric D. Mudama
@ 2011-02-09 18:18                 ` Roberto Spadim
  2011-02-09 18:24                   ` Piergiorgio Sartor
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 18:18 UTC (permalink / raw)
  To: Eric D. Mudama; +Cc: Scott E. Armitage, David Brown, linux-raid

who sends it?
ext4 sends trim commands to the device (disk/md raid/nbd)
the kernel swap code sends these commands (when possible) to the device too
for an internal raid5 parity disk this could be done by md; for data
disks this should be done by ext4

the other question... about resyncing by only writing what is different:
this is very good since write and read speed can be different for ssd
(hd doesn't have this 'problem')
but i'm sure that writing only the diff is better than writing everything
(ssd life will be longer; hd maybe... i think it will be longer too)


2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
> On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
>>
>> Who sends this command? If md can assume that determinate mode is
>> always set, then RAID 1 at least would remain consistent. For RAID 5,
>> consistency of the parity information depends on the determinate
>> pattern used and the number of disks. If you used determinate
>> all-zero, then parity information would always be consistent, but this
>> is probably not preferable since every TRIM command would incur an
>> extra write for each bit in each page of the block.
>
> True, and there are several solutions.  Maybe track space used via
> some mechanism, such that when you trim you're only trimming the
> entire stripe width so no parity is required for the trimmed regions.
> Or, trust the drive's wear leveling and endurance rating, combined
> with SMART data, to indicate when you need to replace the device
> preemptive to eventual failure.
>
> It's not an unsolvable issue.  If the RAID5 used distributed parity,
> you could expect wear leveling to wear all the devices evenly, since
> on average, the # of writes to all devices will be the same.  Only a
> RAID4 setup would see a lopsided amount of writes to a single device.
>
> --eric
>
> --
> Eric D. Mudama
> edmudama@bounceswoosh.org
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 18:18                 ` Roberto Spadim
@ 2011-02-09 18:24                   ` Piergiorgio Sartor
  2011-02-09 18:30                     ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Piergiorgio Sartor @ 2011-02-09 18:24 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid

> ext4 send trim commands to device (disk/md raid/nbd)
> kernel swap send this commands (when possible) to device too
> for internal raid5 parity disk this could be done by md, for data
> disks this should be done by ext4

That's an interesting point.

On which basis should a parity "block" get a TRIM?

If you ask me, I think the complete TRIM story is, at
best, a temporary patch.

IMHO the wear levelling should be handled by the filesystem
and, with awareness of this, by the underlying device drivers.
The reason is that the FS knows better what's going on with the
blocks and what will happen.

bye,

pg

> 
> the other question... about resync with only write what is different
> this is very good since write and read speed can be different for ssd
> (hd don´t have this 'problem')
> but i´m sure that just write what is diff is better than write all
> (ssd life will be bigger, hd maybe... i think that will be bigger too)
> 
> 
> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
> >>
> >> Who sends this command? If md can assume that determinate mode is
> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
> >> consistency of the parity information depends on the determinate
> >> pattern used and the number of disks. If you used determinate
> >> all-zero, then parity information would always be consistent, but this
> >> is probably not preferable since every TRIM command would incur an
> >> extra write for each bit in each page of the block.
> >
> > True, and there are several solutions.  Maybe track space used via
> > some mechanism, such that when you trim you're only trimming the
> > entire stripe width so no parity is required for the trimmed regions.
> > Or, trust the drive's wear leveling and endurance rating, combined
> > with SMART data, to indicate when you need to replace the device
> > preemptive to eventual failure.
> >
> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
> > you could expect wear leveling to wear all the devices evenly, since
> > on average, the # of writes to all devices will be the same.  Only a
> > RAID4 setup would see a lopsided amount of writes to a single device.
> >
> > --eric
> >
> > --
> > Eric D. Mudama
> > edmudama@bounceswoosh.org
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 
> 
> -- 
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 18:24                   ` Piergiorgio Sartor
@ 2011-02-09 18:30                     ` Roberto Spadim
  2011-02-09 18:38                       ` Piergiorgio Sartor
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 18:30 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid

nice =)
but note that the parity block is raid information, not filesystem information
for raid we could implement trim where possible (like swap does)
and implement a trim that we receive from the filesystem and send to all
disks (if it's a raid1 with mirrors, we should send it to all mirrors)
i don't know exactly what trim does, but i think it's like a very big write
of only some bits, for example:
set sector1='00000000000000000000000000000000000000000000000000'
could be replaced by:
trim sector1
it's faster for the sata link, and it's good information for the
hard disk (it can put a single '0' marker at the start of the sector and know
that the whole sector is 0; if it tries to read any information it can use
internal memory (not read the hard disk), and if a write is done it can
write the zero bits first and then do the write operation -- but that's an
internal function of the hard disk/ssd, not a problem of md raid... md
raid just needs to know how to optimize and use it =] )

2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> ext4 send trim commands to device (disk/md raid/nbd)
>> kernel swap send this commands (when possible) to device too
>> for internal raid5 parity disk this could be done by md, for data
>> disks this should be done by ext4
>
> That's an interesting point.
>
> On which basis should a parity "block" get a TRIM?
>
> If you ask me, I think the complete TRIM story is, at
> best, a temporary patch.
>
> IMHO the wear levelling should be handled by the filesystem
> and, with awareness of this, by the underlying device drivers.
> Reason is that the FS knows better what's going on with the
> blocks and what will happen.
>
> bye,
>
> pg
>
>>
>> the other question... about resync with only write what is different
>> this is very good since write and read speed can be different for ssd
>> (hd don´t have this 'problem')
>> but i´m sure that just write what is diff is better than write all
>> (ssd life will be bigger, hd maybe... i think that will be bigger too)
>>
>>
>> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
>> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
>> >>
>> >> Who sends this command? If md can assume that determinate mode is
>> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
>> >> consistency of the parity information depends on the determinate
>> >> pattern used and the number of disks. If you used determinate
>> >> all-zero, then parity information would always be consistent, but this
>> >> is probably not preferable since every TRIM command would incur an
>> >> extra write for each bit in each page of the block.
>> >
>> > True, and there are several solutions.  Maybe track space used via
>> > some mechanism, such that when you trim you're only trimming the
>> > entire stripe width so no parity is required for the trimmed regions.
>> > Or, trust the drive's wear leveling and endurance rating, combined
>> > with SMART data, to indicate when you need to replace the device
>> > preemptive to eventual failure.
>> >
>> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
>> > you could expect wear leveling to wear all the devices evenly, since
>> > on average, the # of writes to all devices will be the same.  Only a
>> > RAID4 setup would see a lopsided amount of writes to a single device.
>> >
>> > --eric
>> >
>> > --
>> > Eric D. Mudama
>> > edmudama@bounceswoosh.org
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
>
> piergiorgio
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 18:30                     ` Roberto Spadim
@ 2011-02-09 18:38                       ` Piergiorgio Sartor
  2011-02-09 18:46                         ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Piergiorgio Sartor @ 2011-02-09 18:38 UTC (permalink / raw)
  To: Roberto Spadim
  Cc: Piergiorgio Sartor, Eric D. Mudama, Scott E. Armitage,
	David Brown, linux-raid

On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
> nice =)
> but check that parity block is a raid information, not a filesystem information
> for raid we could implement trim when possible (like swap)
> and implement a trim that we receive from filesystem, and send to all
> disks (if it´s a raid1 with mirrors, we should sent to all mirrors)

To all disks, also in the case of RAID-5?

What if the TRIM belongs only to a single SSD block
belonging to a single chunk of a stripe?
That is, a *single* SSD of the RAID-5.

Should md re-read the block and re-write (not TRIM)
the parity?

I think anything that has to do with checking &
repairing must be carefully considered...
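
for that single-chunk case, the usual read-modify-write math would still apply:
new parity = old parity XOR old data chunk XOR the new (all-zero) chunk -- a tiny
sketch with made-up bytes:

/* RMW parity update when one data chunk becomes zeros (sketch) */
#include <stdio.h>

int main(void)
{
    unsigned char d1 = 0xA5, d2 = 0x3C, d3 = 0x0F;   /* data chunks (1 byte each) */
    unsigned char p  = d1 ^ d2 ^ d3;                 /* old parity */

    unsigned char new_d2 = 0x00;                     /* chunk trimmed, reads as zeros */
    unsigned char new_p  = p ^ d2 ^ new_d2;          /* read-modify-write update */

    printf("old parity %02x, new parity %02x, recomputed %02x\n",
           p, new_p, (unsigned char)(d1 ^ new_d2 ^ d3));
    return 0;
}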

bye,

pg

> i don´t know what trim do very well, but i think it´s a very big write
> with only some bits for example:
> set sector1='00000000000000000000000000000000000000000000000000'
> could be replace by:
> trim sector1
> it´s faster for sata communication, and it´s a good information for
> hard disk (it can put a single '0' at the start of the sector and know
> that all sector is 0, if it try to read any information it can use
> internal memory (don´t read hard disk), if a write is done it should
> write 0000 to bits, and after after the write operation, but it´s
> internal function of hard disk/ssd, not a problem of md raid... md
> raid should need know how to optimize and use it =] )
> 
> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> >> ext4 send trim commands to device (disk/md raid/nbd)
> >> kernel swap send this commands (when possible) to device too
> >> for internal raid5 parity disk this could be done by md, for data
> >> disks this should be done by ext4
> >
> > That's an interesting point.
> >
> > On which basis should a parity "block" get a TRIM?
> >
> > If you ask me, I think the complete TRIM story is, at
> > best, a temporary patch.
> >
> > IMHO the wear levelling should be handled by the filesystem
> >> > and, with awareness of this, by the underlying device drivers.
> > Reason is that the FS knows better what's going on with the
> > blocks and what will happen.
> >
> > bye,
> >
> > pg
> >
> >>
> >> the other question... about resync with only write what is different
> >> this is very good since write and read speed can be different for ssd
> >> (hd don´t have this 'problem')
> >> but i´m sure that just write what is diff is better than write all
> >> (ssd life will be bigger, hd maybe... i think that will be bigger too)
> >>
> >>
> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
> >> >>
> >> >> Who sends this command? If md can assume that determinate mode is
> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
> >> >> consistency of the parity information depends on the determinate
> >> >> pattern used and the number of disks. If you used determinate
> >> >> all-zero, then parity information would always be consistent, but this
> >> >> is probably not preferable since every TRIM command would incur an
> >> >> extra write for each bit in each page of the block.
> >> >
> >> > True, and there are several solutions.  Maybe track space used via
> >> > some mechanism, such that when you trim you're only trimming the
> >> > entire stripe width so no parity is required for the trimmed regions.
> >> > Or, trust the drive's wear leveling and endurance rating, combined
> >> > with SMART data, to indicate when you need to replace the device
> >> > preemptive to eventual failure.
> >> >
> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
> >> > you could expect wear leveling to wear all the devices evenly, since
> >> > on average, the # of writes to all devices will be the same.  Only a
> >> > RAID4 setup would see a lopsided amount of writes to a single device.
> >> >
> >> > --eric
> >> >
> >> > --
> >> > Eric D. Mudama
> >> > edmudama@bounceswoosh.org
> >> >
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >>
> >>
> >>
> >> --
> >> Roberto Spadim
> >> Spadim Technology / SPAEmpresarial
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> > --
> >
> > piergiorgio
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 
> 
> -- 
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 18:38                       ` Piergiorgio Sartor
@ 2011-02-09 18:46                         ` Roberto Spadim
  2011-02-09 18:52                           ` Roberto Spadim
  2011-02-09 19:13                           ` Piergiorgio Sartor
  0 siblings, 2 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 18:46 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid

it's just a discussion, right? no implementation yet, right?

what i think:
if the device accepts TRIM, we can use TRIM.
if not, we must translate TRIM into something similar (maybe a series of
WRITEs?), so that a later READ from the disk returns the same information.
the translation could be done by the kernel (not md), maybe as options on
libata, the nbd device, etc.
the other option is to do it inside md, with an internal (md) TRIM
translation function.

who sends the TRIM?
internal md information: md can generate it (if necessary, maybe it's
not...) for parity disks (not data disks).
filesystem, or another upper-layer program (a database with direct device
access): we could accept TRIM from the filesystem/database and send it to
the disks/mirrors, translating it when necessary (internal or kernel
translation function).
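
something like this, maybe (just a rough user-space sketch of the idea, not
md code; the device path, the range and the sysfs file used to detect
discard support are all made-up placeholders):

/* Sketch: discard a range if the device supports it, otherwise emulate
 * the "reads come back as zeros" behaviour by writing zeros.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>                    /* BLKDISCARD */

static int supports_discard(const char *sysfs_path)
{
    /* e.g. /sys/block/sdX/queue/discard_max_bytes; non-zero means discard works */
    FILE *f = fopen(sysfs_path, "r");
    unsigned long long max = 0;
    if (f) { fscanf(f, "%llu", &max); fclose(f); }
    return max > 0;
}

int main(void)
{
    const char *dev = "/dev/sdX";                      /* hypothetical member */
    uint64_t range[2] = { 4096ULL * 1024, 64 * 4096 }; /* byte offset, length */
    int fd = open(dev, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    if (supports_discard("/sys/block/sdX/queue/discard_max_bytes")) {
        if (ioctl(fd, BLKDISCARD, &range) < 0)         /* kernel turns this into TRIM */
            perror("BLKDISCARD");
    } else {
        /* emulation: overwrite the range with zeros so reads look the same */
        char *zeros = calloc(1, 4096);
        for (uint64_t off = range[0]; off < range[0] + range[1]; off += 4096)
            pwrite(fd, zeros, 4096, off);
        free(zeros);
    }
    close(fd);
    return 0;
}

for a real array md would have to do the equivalent per member device,
inside the kernel, of course.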


2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
>> nice =)
>> but check that parity block is a raid information, not a filesystem information
>> for raid we could implement trim when possible (like swap)
>> and implement a trim that we receive from filesystem, and send to all
>> disks (if it´s a raid1 with mirrors, we should sent to all mirrors)
>
> To all disk also in case of RAID-5?
>
> What if the TRIM belongs only to a single SDD block
> belonging to a single chunk of a stripe?
> That is a *single* SSD of the RAID-5.
>
> Should md re-read the block and re-write (not TRIM)
> the parity?
>
> I think anything that has to do with checking &
> repairing must be carefully considered...
>
> bye,
>
> pg
>
>> i don´t know what trim do very well, but i think it´s a very big write
>> with only some bits for example:
>> set sector1='00000000000000000000000000000000000000000000000000'
>> could be replace by:
>> trim sector1
>> it´s faster for sata communication, and it´s a good information for
>> hard disk (it can put a single '0' at the start of the sector and know
>> that all sector is 0, if it try to read any information it can use
>> internal memory (don´t read hard disk), if a write is done it should
>> write 0000 to bits, and after after the write operation, but it´s
>> internal function of hard disk/ssd, not a problem of md raid... md
>> raid should need know how to optimize and use it =] )
>>
>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> >> ext4 send trim commands to device (disk/md raid/nbd)
>> >> kernel swap send this commands (when possible) to device too
>> >> for internal raid5 parity disk this could be done by md, for data
>> >> disks this should be done by ext4
>> >
>> > That's an interesting point.
>> >
>> > On which basis should a parity "block" get a TRIM?
>> >
>> > If you ask me, I think the complete TRIM story is, at
>> > best, a temporary patch.
>> >
>> > IMHO the wear levelling should be handled by the filesystem
>> > and, with awarness of this, by the underlining device drivers.
>> > Reason is that the FS knows better what's going on with the
>> > blocks and what will happen.
>> >
>> > bye,
>> >
>> > pg
>> >
>> >>
>> >> the other question... about resync with only write what is different
>> >> this is very good since write and read speed can be different for ssd
>> >> (hd don´t have this 'problem')
>> >> but i´m sure that just write what is diff is better than write all
>> >> (ssd life will be bigger, hd maybe... i think that will be bigger too)
>> >>
>> >>
>> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
>> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
>> >> >>
>> >> >> Who sends this command? If md can assume that determinate mode is
>> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
>> >> >> consistency of the parity information depends on the determinate
>> >> >> pattern used and the number of disks. If you used determinate
>> >> >> all-zero, then parity information would always be consistent, but this
>> >> >> is probably not preferable since every TRIM command would incur an
>> >> >> extra write for each bit in each page of the block.
>> >> >
>> >> > True, and there are several solutions.  Maybe track space used via
>> >> > some mechanism, such that when you trim you're only trimming the
>> >> > entire stripe width so no parity is required for the trimmed regions.
>> >> > Or, trust the drive's wear leveling and endurance rating, combined
>> >> > with SMART data, to indicate when you need to replace the device
>> >> > preemptive to eventual failure.
>> >> >
>> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
>> >> > you could expect wear leveling to wear all the devices evenly, since
>> >> > on average, the # of writes to all devices will be the same.  Only a
>> >> > RAID4 setup would see a lopsided amount of writes to a single device.
>> >> >
>> >> > --eric
>> >> >
>> >> > --
>> >> > Eric D. Mudama
>> >> > edmudama@bounceswoosh.org
>> >> >
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> > the body of a message to majordomo@vger.kernel.org
>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Roberto Spadim
>> >> Spadim Technology / SPAEmpresarial
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> > --
>> >
>> > piergiorgio
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
> --
>
> piergiorgio
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 18:46                         ` Roberto Spadim
@ 2011-02-09 18:52                           ` Roberto Spadim
  2011-02-09 19:13                           ` Piergiorgio Sartor
  1 sibling, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 18:52 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid

the other question...
check and repair.
i don't know today's resync implementation (i need to read the source code),
but a resync that reads, checks for differences, and only writes when a
difference is found is better than one that writes without checking.
why better?
for SSDs: a longer life.
for HDDs: i think a longer life too (I THINK).
the problem: more operations.
without check:
READ from source, WRITE to mirror
with check:
READ from source, READ from mirror, compare, WRITE to mirror only if different

maybe an mdadm option could set the md device to RESYNC WITH CHECK or
RESYNC WITHOUT CHECK.
it's a user choice, not an md decision, right? if the user wants a fast
resync they can pick without check, or with check; giving the user the
option is very nice. the default? i think WITHOUT CHECK should be the
default, the same way there is a default chunk size...
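
a minimal user-space sketch of the compare-before-write loop (not the real
md resync code; the device paths and the 64 KiB chunk size are just
placeholders):

/* Sketch: read both sides, write the mirror only where it differs. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (64 * 1024)

int main(void)
{
    const char *src_path = "/dev/sdX";   /* hypothetical source member */
    const char *dst_path = "/dev/sdY";   /* hypothetical mirror member */
    int src = open(src_path, O_RDONLY);
    int dst = open(dst_path, O_RDWR);
    char *a = malloc(CHUNK), *b = malloc(CHUNK);
    off_t off = 0;
    ssize_t n;

    if (src < 0 || dst < 0 || !a || !b) { perror("setup"); return 1; }

    while ((n = pread(src, a, CHUNK, off)) > 0) {
        ssize_t m = pread(dst, b, CHUNK, off);
        /* write only when the mirror's copy differs from the source */
        if (m != n || memcmp(a, b, n) != 0)
            pwrite(dst, a, n, off);
        off += n;
    }
    free(a); free(b);
    close(src); close(dst);
    return 0;
}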


2011/2/9 Roberto Spadim <roberto@spadim.com.br>:
> it´s just a discussion, right? no implementation yet, right?
>
> what i think....
> if device accept TRIM, we can use TRIM.
> if not, we must translate TRIM to something similar (maybe many WRITES
> ?), and when we READ from disk we get the same information
> the translation coulbe be done by kernel (not md) maybe options on
> libata, nbd device....
> other option is do it with md, internal (md) TRIM translate function
>
> who send trim?
> internal md information: md can generate it (if necessary, maybe it´s
> not...) for parity disks (not data disks)
> filesystem/or another upper layer program (database with direct device
> access), we could accept TRIM from filesystem/database, and send it to
> disks/mirrors, when necessary translate it (internal or kernel
> translate function)
>
>
> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
>>> nice =)
>>> but check that parity block is a raid information, not a filesystem information
>>> for raid we could implement trim when possible (like swap)
>>> and implement a trim that we receive from filesystem, and send to all
>>> disks (if it´s a raid1 with mirrors, we should sent to all mirrors)
>>
>> To all disk also in case of RAID-5?
>>
>> What if the TRIM belongs only to a single SDD block
>> belonging to a single chunk of a stripe?
>> That is a *single* SSD of the RAID-5.
>>
>> Should md re-read the block and re-write (not TRIM)
>> the parity?
>>
>> I think anything that has to do with checking &
>> repairing must be carefully considered...
>>
>> bye,
>>
>> pg
>>
>>> i don´t know what trim do very well, but i think it´s a very big write
>>> with only some bits for example:
>>> set sector1='00000000000000000000000000000000000000000000000000'
>>> could be replace by:
>>> trim sector1
>>> it´s faster for sata communication, and it´s a good information for
>>> hard disk (it can put a single '0' at the start of the sector and know
>>> that all sector is 0, if it try to read any information it can use
>>> internal memory (don´t read hard disk), if a write is done it should
>>> write 0000 to bits, and after after the write operation, but it´s
>>> internal function of hard disk/ssd, not a problem of md raid... md
>>> raid should need know how to optimize and use it =] )
>>>
>>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>>> >> ext4 send trim commands to device (disk/md raid/nbd)
>>> >> kernel swap send this commands (when possible) to device too
>>> >> for internal raid5 parity disk this could be done by md, for data
>>> >> disks this should be done by ext4
>>> >
>>> > That's an interesting point.
>>> >
>>> > On which basis should a parity "block" get a TRIM?
>>> >
>>> > If you ask me, I think the complete TRIM story is, at
>>> > best, a temporary patch.
>>> >
>>> > IMHO the wear levelling should be handled by the filesystem
>>> > and, with awarness of this, by the underlining device drivers.
>>> > Reason is that the FS knows better what's going on with the
>>> > blocks and what will happen.
>>> >
>>> > bye,
>>> >
>>> > pg
>>> >
>>> >>
>>> >> the other question... about resync with only write what is different
>>> >> this is very good since write and read speed can be different for ssd
>>> >> (hd don´t have this 'problem')
>>> >> but i´m sure that just write what is diff is better than write all
>>> >> (ssd life will be bigger, hd maybe... i think that will be bigger too)
>>> >>
>>> >>
>>> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
>>> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
>>> >> >>
>>> >> >> Who sends this command? If md can assume that determinate mode is
>>> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
>>> >> >> consistency of the parity information depends on the determinate
>>> >> >> pattern used and the number of disks. If you used determinate
>>> >> >> all-zero, then parity information would always be consistent, but this
>>> >> >> is probably not preferable since every TRIM command would incur an
>>> >> >> extra write for each bit in each page of the block.
>>> >> >
>>> >> > True, and there are several solutions.  Maybe track space used via
>>> >> > some mechanism, such that when you trim you're only trimming the
>>> >> > entire stripe width so no parity is required for the trimmed regions.
>>> >> > Or, trust the drive's wear leveling and endurance rating, combined
>>> >> > with SMART data, to indicate when you need to replace the device
>>> >> > preemptive to eventual failure.
>>> >> >
>>> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
>>> >> > you could expect wear leveling to wear all the devices evenly, since
>>> >> > on average, the # of writes to all devices will be the same.  Only a
>>> >> > RAID4 setup would see a lopsided amount of writes to a single device.
>>> >> >
>>> >> > --eric
>>> >> >
>>> >> > --
>>> >> > Eric D. Mudama
>>> >> > edmudama@bounceswoosh.org
>>> >> >
>>> >> > --
>>> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> >> > the body of a message to majordomo@vger.kernel.org
>>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Roberto Spadim
>>> >> Spadim Technology / SPAEmpresarial
>>> >> --
>>> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> >> the body of a message to majordomo@vger.kernel.org
>>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> >
>>> > --
>>> >
>>> > piergiorgio
>>> > --
>>> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> > the body of a message to majordomo@vger.kernel.org
>>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> >
>>>
>>>
>>>
>>> --
>>> Roberto Spadim
>>> Spadim Technology / SPAEmpresarial
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> --
>>
>> piergiorgio
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 18:46                         ` Roberto Spadim
  2011-02-09 18:52                           ` Roberto Spadim
@ 2011-02-09 19:13                           ` Piergiorgio Sartor
  2011-02-09 19:16                             ` Roberto Spadim
  1 sibling, 1 reply; 70+ messages in thread
From: Piergiorgio Sartor @ 2011-02-09 19:13 UTC (permalink / raw)
  To: Roberto Spadim
  Cc: Piergiorgio Sartor, Eric D. Mudama, Scott E. Armitage,
	David Brown, linux-raid

> it´s just a discussion, right? no implementation yet, right?

Of course...
 
> what i think....
> if device accept TRIM, we can use TRIM.
> if not, we must translate TRIM to something similar (maybe many WRITES
> ?), and when we READ from disk we get the same information

TRIM is not about writing at all. TRIM tells the
device that the addressed block is no longer in use,
so it (the SSD) can do whatever it wants with it.

The only software layer with that "knowledge" is the
filesystem; the other layers have no decisional power
over block allocation, except for their own metadata,
of course.

So, IMHO, a software TRIM can only be in the FS.

bye,

pg

> the translation coulbe be done by kernel (not md) maybe options on
> libata, nbd device....
> other option is do it with md, internal (md) TRIM translate function
> 
> who send trim?
> internal md information: md can generate it (if necessary, maybe it´s
> not...) for parity disks (not data disks)
> filesystem/or another upper layer program (database with direct device
> access), we could accept TRIM from filesystem/database, and send it to
> disks/mirrors, when necessary translate it (internal or kernel
> translate function)
> 
> 
> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
> >> nice =)
> >> but check that parity block is a raid information, not a filesystem information
> >> for raid we could implement trim when possible (like swap)
> >> and implement a trim that we receive from filesystem, and send to all
> >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors)
> >
> > To all disk also in case of RAID-5?
> >
> > What if the TRIM belongs only to a single SDD block
> > belonging to a single chunk of a stripe?
> > That is a *single* SSD of the RAID-5.
> >
> > Should md re-read the block and re-write (not TRIM)
> > the parity?
> >
> > I think anything that has to do with checking &
> > repairing must be carefully considered...
> >
> > bye,
> >
> > pg
> >
> >> i don´t know what trim do very well, but i think it´s a very big write
> >> with only some bits for example:
> >> set sector1='00000000000000000000000000000000000000000000000000'
> >> could be replace by:
> >> trim sector1
> >> it´s faster for sata communication, and it´s a good information for
> >> hard disk (it can put a single '0' at the start of the sector and know
> >> that all sector is 0, if it try to read any information it can use
> >> internal memory (don´t read hard disk), if a write is done it should
> >> write 0000 to bits, and after after the write operation, but it´s
> >> internal function of hard disk/ssd, not a problem of md raid... md
> >> raid should need know how to optimize and use it =] )
> >>
> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> >> >> ext4 send trim commands to device (disk/md raid/nbd)
> >> >> kernel swap send this commands (when possible) to device too
> >> >> for internal raid5 parity disk this could be done by md, for data
> >> >> disks this should be done by ext4
> >> >
> >> > That's an interesting point.
> >> >
> >> > On which basis should a parity "block" get a TRIM?
> >> >
> >> > If you ask me, I think the complete TRIM story is, at
> >> > best, a temporary patch.
> >> >
> >> > IMHO the wear levelling should be handled by the filesystem
> >> > and, with awarness of this, by the underlining device drivers.
> >> > Reason is that the FS knows better what's going on with the
> >> > blocks and what will happen.
> >> >
> >> > bye,
> >> >
> >> > pg
> >> >
> >> >>
> >> >> the other question... about resync with only write what is different
> >> >> this is very good since write and read speed can be different for ssd
> >> >> (hd don´t have this 'problem')
> >> >> but i´m sure that just write what is diff is better than write all
> >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too)
> >> >>
> >> >>
> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
> >> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
> >> >> >>
> >> >> >> Who sends this command? If md can assume that determinate mode is
> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
> >> >> >> consistency of the parity information depends on the determinate
> >> >> >> pattern used and the number of disks. If you used determinate
> >> >> >> all-zero, then parity information would always be consistent, but this
> >> >> >> is probably not preferable since every TRIM command would incur an
> >> >> >> extra write for each bit in each page of the block.
> >> >> >
> >> >> > True, and there are several solutions.  Maybe track space used via
> >> >> > some mechanism, such that when you trim you're only trimming the
> >> >> > entire stripe width so no parity is required for the trimmed regions.
> >> >> > Or, trust the drive's wear leveling and endurance rating, combined
> >> >> > with SMART data, to indicate when you need to replace the device
> >> >> > preemptive to eventual failure.
> >> >> >
> >> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
> >> >> > you could expect wear leveling to wear all the devices evenly, since
> >> >> > on average, the # of writes to all devices will be the same.  Only a
> >> >> > RAID4 setup would see a lopsided amount of writes to a single device.
> >> >> >
> >> >> > --eric
> >> >> >
> >> >> > --
> >> >> > Eric D. Mudama
> >> >> > edmudama@bounceswoosh.org
> >> >> >
> >> >> > --
> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> >> > the body of a message to majordomo@vger.kernel.org
> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Roberto Spadim
> >> >> Spadim Technology / SPAEmpresarial
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >> > --
> >> >
> >> > piergiorgio
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >>
> >>
> >>
> >> --
> >> Roberto Spadim
> >> Spadim Technology / SPAEmpresarial
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> > --
> >
> > piergiorgio
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 
> 
> -- 
> Roberto Spadim
> Spadim Technology / SPAEmpresarial

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 15:52           ` Chris Worley
@ 2011-02-09 19:15             ` Doug Dumitru
  2011-02-09 19:22               ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Doug Dumitru @ 2011-02-09 19:15 UTC (permalink / raw)
  To: Chris Worley; +Cc: Scott E. Armitage, Roberto Spadim, David Brown, linux-raid

I work with SSD arrays all the time, so I have a couple of thoughts
about trim and md.

'trim' is still necessary.  SandForce controllers are "better" at
this, but still need free space to do their work.  I had a set of SF
drives drop to 22 MB/sec writes because they were full and scrambled.
It takes a lot of effort to get them that messed up, but it can still
happen.  Trim brings them back.

The bottom line is that SSDs do block re-organization on the fly and
free space makes the re-org more efficient.  More efficient means
faster, and as importantly less wear amplification.

Most SSDs (and I think the latest trim spec) are deterministic on
trim'd sectors.  If you trim a sector, they read that sector as zeros.
 This makes raid much "safer".
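
For anyone who wants to check a particular drive, here is a rough (and
destructive) user-space sketch: discard a range, read it back, and see
whether it comes back as zeros.  The device path is a placeholder, and in
practice you would read with O_DIRECT or drop caches first so the read is
not served from the page cache.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>                     /* BLKDISCARD */

int main(void)
{
    const char *dev = "/dev/sdX";         /* hypothetical test device */
    uint64_t range[2] = { 0, 4096 };      /* first 4 KiB: byte offset, length */
    char buf[4096];
    int fd = open(dev, O_RDWR), zero = 1;

    if (fd < 0) { perror("open"); return 1; }
    if (ioctl(fd, BLKDISCARD, &range) < 0) { perror("BLKDISCARD"); return 1; }

    /* read the discarded range back and check whether it is all zeros */
    if (pread(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf)) {
        perror("pread");
        return 1;
    }
    for (size_t i = 0; i < sizeof(buf); i++)
        if (buf[i]) { zero = 0; break; }
    printf("discarded range reads back as %s\n",
           zero ? "all zeros" : "non-zero data");
    close(fd);
    return 0;
}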

raid/0,1,10 should be fine to echo discard commands down to the
downstream drives in the bio request.  It is then up to the physical
device driver to turn the discard bio request into an ATA (or SCSI)
trim.  Most block devices don't seem to understand discard requests
yet, but this will get better over time.

raid/4,5,6 is a lot more complicated.  With raid/4,5 and an even
number of drives, you can trim whole stripes safely.  Pieces of
stripes get interesting because you have to treat a trim as a write of
zeros and re-calc parity.  raid/6 will always have parity issues
regardless of how many drives there are.  Even worse is that
raid/4,5,6 parity read/modify/write operations tend to chatter the FTL
(Flash Translation Layer) logic and make matters worse (often much
worse).  If you are not streaming long linear writes, raid/4,5,6 in a
heavy write environment is probably a very bad idea for most SSDs.
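
As an illustration of the whole-stripe case, a small sketch of the
alignment arithmetic (the chunk size and data-disk count are invented for
the example; this is not how md itself is structured):

#include <stdint.h>
#include <stdio.h>

#define CHUNK_SECTORS  128        /* 64 KiB chunk, 512-byte sectors */
#define DATA_DISKS     3          /* e.g. 4-drive raid5: 3 data + 1 parity */
#define STRIPE_SECTORS (CHUNK_SECTORS * DATA_DISKS)

/* Shrink [start, start+len) to whole stripes; return 0 if nothing is left.
 * Only full stripes can be discarded without touching parity; the partial
 * ends would have to be handled as zero-writes plus a parity update (or
 * simply ignored).
 */
static int align_to_stripes(uint64_t start, uint64_t len,
                            uint64_t *out_start, uint64_t *out_len)
{
    uint64_t first = (start + STRIPE_SECTORS - 1) / STRIPE_SECTORS;  /* round up   */
    uint64_t last  = (start + len) / STRIPE_SECTORS;                 /* round down */
    if (last <= first)
        return 0;                 /* request smaller than one full stripe */
    *out_start = first * STRIPE_SECTORS;
    *out_len   = (last - first) * STRIPE_SECTORS;
    return 1;
}

int main(void)
{
    uint64_t s, l;
    if (align_to_stripes(1000, 5000, &s, &l))
        printf("discard sectors %llu..%llu on the data devices\n",
               (unsigned long long)s, (unsigned long long)(s + l - 1));
    else
        printf("too small for a full stripe: zero-write and update parity instead\n");
    return 0;
}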

Another issue with trim is how "async" it behaves.  You can trim a lot
of data to a drive, but it is hard to tell when the drive actually is
ready afterwards.  Some drives also choke on trim requests that come
at them too fast or requests that are too long.  The behavior can be
quite random.  So then comes the issue of how many "user knobs" to
supply to tune what trims where.  Again, raid/0,1,10 are pretty easy.
Raid/4,5,6 really requires that you know the precise geometry and
control the IO.  Way beyond what ext4 understands at this point.

Trim can also be "faked" with some drives.  Again, looking at the
SandForce-based drives, these drives internally de-dupe so you can fake
write data and help the drives get free space.  Do this by filling the
drive with zeros (ie, dd if=/dev/zero of=big.file bs=1M), do a sync,
and then delete the big.file.  This works through md, across SANs,
from XEN virtuals, or wherever.  With SandForce drives, this is not as
effective as a trim, but better than nothing.  Unfortunately, only
SandForce drives and Flash Supercharger understand zeros this way.  A
filesystem option that "zeros discarded sectors" would actually make
as much sense in some deployment settings as the discard option (not
sure, but ext# might already have this).  NTFS has actually supported
this since XP as a security enhancement.
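
The same zero-fill trick, written out as a small program rather than dd
(the mount point and the 1 GiB cap are placeholders; stop earlier if you
do not want to fill all of the free space):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define MB (1024 * 1024)

int main(void)
{
    const char *path = "/mnt/ssd/big.file";   /* hypothetical mount point */
    char *zeros = calloc(1, MB);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    long i, cap_mb = 1024;                    /* stop after 1 GiB or ENOSPC */

    if (fd < 0 || !zeros) { perror("setup"); return 1; }
    for (i = 0; i < cap_mb; i++)
        if (write(fd, zeros, MB) != MB)       /* ENOSPC: free space is gone */
            break;
    fsync(fd);                                /* make sure the zeros reach the drive */
    close(fd);
    unlink(path);                             /* delete big.file, as with rm */
    free(zeros);
    return 0;
}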

Doug Dumitru
EasyCo LLC

ps:  My background with this has been the development of Flash
SuperCharger.  I am not trying to run an advert here, but the care and
feeding of SSDs can be interesting.  Flash SuperCharger breaks most of
these rules, but it does know the exact geometry of what it is driving
and plays excessive games to drive SSDs at their exact "sweet spot".
One of our licensees just sent me some benchmarks at > 500,000 4K
random writes/sec for a moderate sized array running raid/5.

pps:  Failures of SSDs are different than HDDs.  SSDs can and do fail
and need raid for many applications.  If you need high write IOPS, it
pretty much has to be raid/1,10 (unless you run our Flash SuperCharger
layer).

ppps:  I have seen SSDs silently return corrupted data.  Disks do this
as well.  A paper from 2 years ago quoted disk silent error rates as
high as 1 bad block every 73TB read.  Very scary stuff, but probably
beyond the scope of what md can address.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 19:13                           ` Piergiorgio Sartor
@ 2011-02-09 19:16                             ` Roberto Spadim
  2011-02-09 19:21                               ` Piergiorgio Sartor
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 19:16 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid

yeah =)
a question...
if i send a TRIM to a sector and then read from it,
what do i get?
0x00000000000000000000000000000000000 ?
if yes, we could translate TRIM into a WRITE on devices without TRIM (hard
disks), just to get the same READ information back

2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> it´s just a discussion, right? no implementation yet, right?
>
> Of course...
>
>> what i think....
>> if device accept TRIM, we can use TRIM.
>> if not, we must translate TRIM to something similar (maybe many WRITES
>> ?), and when we READ from disk we get the same information
>
> TRIM is not about writing at all. TRIM tells the
> device that the addressed block is not anymore used,
> so it (the SSD) can do whatever it wants with it.
>
> The only software layer having the same "knowledge"
> is the filesystem, the other layers, do not have
> any decisional power about the block allocation.
> Except for metadata, of course.
>
> So, IMHO, a software TRIM can only be in the FS.
>
> bye,
>
> pg
>
>> the translation coulbe be done by kernel (not md) maybe options on
>> libata, nbd device....
>> other option is do it with md, internal (md) TRIM translate function
>>
>> who send trim?
>> internal md information: md can generate it (if necessary, maybe it´s
>> not...) for parity disks (not data disks)
>> filesystem/or another upper layer program (database with direct device
>> access), we could accept TRIM from filesystem/database, and send it to
>> disks/mirrors, when necessary translate it (internal or kernel
>> translate function)
>>
>>
>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
>> >> nice =)
>> >> but check that parity block is a raid information, not a filesystem information
>> >> for raid we could implement trim when possible (like swap)
>> >> and implement a trim that we receive from filesystem, and send to all
>> >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors)
>> >
>> > To all disk also in case of RAID-5?
>> >
>> > What if the TRIM belongs only to a single SDD block
>> > belonging to a single chunk of a stripe?
>> > That is a *single* SSD of the RAID-5.
>> >
>> > Should md re-read the block and re-write (not TRIM)
>> > the parity?
>> >
>> > I think anything that has to do with checking &
>> > repairing must be carefully considered...
>> >
>> > bye,
>> >
>> > pg
>> >
>> >> i don´t know what trim do very well, but i think it´s a very big write
>> >> with only some bits for example:
>> >> set sector1='00000000000000000000000000000000000000000000000000'
>> >> could be replace by:
>> >> trim sector1
>> >> it´s faster for sata communication, and it´s a good information for
>> >> hard disk (it can put a single '0' at the start of the sector and know
>> >> that all sector is 0, if it try to read any information it can use
>> >> internal memory (don´t read hard disk), if a write is done it should
>> >> write 0000 to bits, and after after the write operation, but it´s
>> >> internal function of hard disk/ssd, not a problem of md raid... md
>> >> raid should need know how to optimize and use it =] )
>> >>
>> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> >> >> ext4 send trim commands to device (disk/md raid/nbd)
>> >> >> kernel swap send this commands (when possible) to device too
>> >> >> for internal raid5 parity disk this could be done by md, for data
>> >> >> disks this should be done by ext4
>> >> >
>> >> > That's an interesting point.
>> >> >
>> >> > On which basis should a parity "block" get a TRIM?
>> >> >
>> >> > If you ask me, I think the complete TRIM story is, at
>> >> > best, a temporary patch.
>> >> >
>> >> > IMHO the wear levelling should be handled by the filesystem
>> >> > and, with awarness of this, by the underlining device drivers.
>> >> > Reason is that the FS knows better what's going on with the
>> >> > blocks and what will happen.
>> >> >
>> >> > bye,
>> >> >
>> >> > pg
>> >> >
>> >> >>
>> >> >> the other question... about resync with only write what is different
>> >> >> this is very good since write and read speed can be different for ssd
>> >> >> (hd don´t have this 'problem')
>> >> >> but i´m sure that just write what is diff is better than write all
>> >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too)
>> >> >>
>> >> >>
>> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
>> >> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
>> >> >> >>
>> >> >> >> Who sends this command? If md can assume that determinate mode is
>> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
>> >> >> >> consistency of the parity information depends on the determinate
>> >> >> >> pattern used and the number of disks. If you used determinate
>> >> >> >> all-zero, then parity information would always be consistent, but this
>> >> >> >> is probably not preferable since every TRIM command would incur an
>> >> >> >> extra write for each bit in each page of the block.
>> >> >> >
>> >> >> > True, and there are several solutions.  Maybe track space used via
>> >> >> > some mechanism, such that when you trim you're only trimming the
>> >> >> > entire stripe width so no parity is required for the trimmed regions.
>> >> >> > Or, trust the drive's wear leveling and endurance rating, combined
>> >> >> > with SMART data, to indicate when you need to replace the device
>> >> >> > preemptive to eventual failure.
>> >> >> >
>> >> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
>> >> >> > you could expect wear leveling to wear all the devices evenly, since
>> >> >> > on average, the # of writes to all devices will be the same.  Only a
>> >> >> > RAID4 setup would see a lopsided amount of writes to a single device.
>> >> >> >
>> >> >> > --eric
>> >> >> >
>> >> >> > --
>> >> >> > Eric D. Mudama
>> >> >> > edmudama@bounceswoosh.org
>> >> >> >
>> >> >> > --
>> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> >> > the body of a message to majordomo@vger.kernel.org
>> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Roberto Spadim
>> >> >> Spadim Technology / SPAEmpresarial
>> >> >> --
>> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> >> the body of a message to majordomo@vger.kernel.org
>> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >
>> >> > --
>> >> >
>> >> > piergiorgio
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> > the body of a message to majordomo@vger.kernel.org
>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Roberto Spadim
>> >> Spadim Technology / SPAEmpresarial
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> > --
>> >
>> > piergiorgio
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>
> --
>
> piergiorgio
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 19:16                             ` Roberto Spadim
@ 2011-02-09 19:21                               ` Piergiorgio Sartor
  2011-02-09 19:27                                 ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Piergiorgio Sartor @ 2011-02-09 19:21 UTC (permalink / raw)
  To: Roberto Spadim
  Cc: Piergiorgio Sartor, Eric D. Mudama, Scott E. Armitage,
	David Brown, linux-raid

> yeah =)
> a question...
> if i send a TRIM to a sector
> if i read from it
> what i have?
> 0x00000000000000000000000000000000000 ?
> if yes, we could translate TRIM to WRITE on devices without TRIM (hard disks)
> just to have the same READ information

It seems 0x0 is not a standard. Return values
seem to be quite undefined, even if 0x0 *might*
be common.

Second, why do you want to emulate the 0x0 thing?

I do not see the point of writing zeros on a device
which does not support TRIM. Just doing nothing seems
a better choice, even in a mixed environment.

bye,

pg
 
> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> >> it´s just a discussion, right? no implementation yet, right?
> >
> > Of course...
> >
> >> what i think....
> >> if device accept TRIM, we can use TRIM.
> >> if not, we must translate TRIM to something similar (maybe many WRITES
> >> ?), and when we READ from disk we get the same information
> >
> > TRIM is not about writing at all. TRIM tells the
> > device that the addressed block is not anymore used,
> > so it (the SSD) can do whatever it wants with it.
> >
> > The only software layer having the same "knowledge"
> > is the filesystem, the other layers, do not have
> > any decisional power about the block allocation.
> > Except for metadata, of course.
> >
> > So, IMHO, a software TRIM can only be in the FS.
> >
> > bye,
> >
> > pg
> >
> >> the translation coulbe be done by kernel (not md) maybe options on
> >> libata, nbd device....
> >> other option is do it with md, internal (md) TRIM translate function
> >>
> >> who send trim?
> >> internal md information: md can generate it (if necessary, maybe it´s
> >> not...) for parity disks (not data disks)
> >> filesystem/or another upper layer program (database with direct device
> >> access), we could accept TRIM from filesystem/database, and send it to
> >> disks/mirrors, when necessary translate it (internal or kernel
> >> translate function)
> >>
> >>
> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> >> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
> >> >> nice =)
> >> >> but check that parity block is a raid information, not a filesystem information
> >> >> for raid we could implement trim when possible (like swap)
> >> >> and implement a trim that we receive from filesystem, and send to all
> >> >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors)
> >> >
> >> > To all disk also in case of RAID-5?
> >> >
> >> > What if the TRIM belongs only to a single SDD block
> >> > belonging to a single chunk of a stripe?
> >> > That is a *single* SSD of the RAID-5.
> >> >
> >> > Should md re-read the block and re-write (not TRIM)
> >> > the parity?
> >> >
> >> > I think anything that has to do with checking &
> >> > repairing must be carefully considered...
> >> >
> >> > bye,
> >> >
> >> > pg
> >> >
> >> >> i don´t know what trim do very well, but i think it´s a very big write
> >> >> with only some bits for example:
> >> >> set sector1='00000000000000000000000000000000000000000000000000'
> >> >> could be replace by:
> >> >> trim sector1
> >> >> it´s faster for sata communication, and it´s a good information for
> >> >> hard disk (it can put a single '0' at the start of the sector and know
> >> >> that all sector is 0, if it try to read any information it can use
> >> >> internal memory (don´t read hard disk), if a write is done it should
> >> >> write 0000 to bits, and after after the write operation, but it´s
> >> >> internal function of hard disk/ssd, not a problem of md raid... md
> >> >> raid should need know how to optimize and use it =] )
> >> >>
> >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
> >> >> >> ext4 send trim commands to device (disk/md raid/nbd)
> >> >> >> kernel swap send this commands (when possible) to device too
> >> >> >> for internal raid5 parity disk this could be done by md, for data
> >> >> >> disks this should be done by ext4
> >> >> >
> >> >> > That's an interesting point.
> >> >> >
> >> >> > On which basis should a parity "block" get a TRIM?
> >> >> >
> >> >> > If you ask me, I think the complete TRIM story is, at
> >> >> > best, a temporary patch.
> >> >> >
> >> >> > IMHO the wear levelling should be handled by the filesystem
> >> >> > and, with awarness of this, by the underlining device drivers.
> >> >> > Reason is that the FS knows better what's going on with the
> >> >> > blocks and what will happen.
> >> >> >
> >> >> > bye,
> >> >> >
> >> >> > pg
> >> >> >
> >> >> >>
> >> >> >> the other question... about resync with only write what is different
> >> >> >> this is very good since write and read speed can be different for ssd
> >> >> >> (hd don´t have this 'problem')
> >> >> >> but i´m sure that just write what is diff is better than write all
> >> >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too)
> >> >> >>
> >> >> >>
> >> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
> >> >> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
> >> >> >> >>
> >> >> >> >> Who sends this command? If md can assume that determinate mode is
> >> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
> >> >> >> >> consistency of the parity information depends on the determinate
> >> >> >> >> pattern used and the number of disks. If you used determinate
> >> >> >> >> all-zero, then parity information would always be consistent, but this
> >> >> >> >> is probably not preferable since every TRIM command would incur an
> >> >> >> >> extra write for each bit in each page of the block.
> >> >> >> >
> >> >> >> > True, and there are several solutions.  Maybe track space used via
> >> >> >> > some mechanism, such that when you trim you're only trimming the
> >> >> >> > entire stripe width so no parity is required for the trimmed regions.
> >> >> >> > Or, trust the drive's wear leveling and endurance rating, combined
> >> >> >> > with SMART data, to indicate when you need to replace the device
> >> >> >> > preemptive to eventual failure.
> >> >> >> >
> >> >> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
> >> >> >> > you could expect wear leveling to wear all the devices evenly, since
> >> >> >> > on average, the # of writes to all devices will be the same.  Only a
> >> >> >> > RAID4 setup would see a lopsided amount of writes to a single device.
> >> >> >> >
> >> >> >> > --eric
> >> >> >> >
> >> >> >> > --
> >> >> >> > Eric D. Mudama
> >> >> >> > edmudama@bounceswoosh.org
> >> >> >> >
> >> >> >> > --
> >> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> >> >> > the body of a message to majordomo@vger.kernel.org
> >> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >> >
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> --
> >> >> >> Roberto Spadim
> >> >> >> Spadim Technology / SPAEmpresarial
> >> >> >> --
> >> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >
> >> >> > --
> >> >> >
> >> >> > piergiorgio
> >> >> > --
> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> >> > the body of a message to majordomo@vger.kernel.org
> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Roberto Spadim
> >> >> Spadim Technology / SPAEmpresarial
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >> > --
> >> >
> >> > piergiorgio
> >> > --
> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> > the body of a message to majordomo@vger.kernel.org
> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >>
> >>
> >>
> >> --
> >> Roberto Spadim
> >> Spadim Technology / SPAEmpresarial
> >
> > --
> >
> > piergiorgio
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 
> 
> -- 
> Roberto Spadim
> Spadim Technology / SPAEmpresarial

-- 

piergiorgio

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 19:15             ` Doug Dumitru
@ 2011-02-09 19:22               ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 19:22 UTC (permalink / raw)
  To: doug; +Cc: Chris Worley, Scott E. Armitage, David Brown, linux-raid

i agree with the ppps
that's why ECC, checksums and parity are useful (raid5,6; and raid1 if you
read from all mirrors, compare them and select the 'right' disk)
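
something like this for a 3-way raid1, as a rough user-space sketch (md
does not really do this on normal reads; the device names are made up):

/* Sketch: read the same block from every mirror and pick the copy that
 * at least one other mirror agrees with.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLK 4096

int main(void)
{
    const char *mirrors[3] = { "/dev/sdX", "/dev/sdY", "/dev/sdZ" };
    char buf[3][BLK];
    off_t off = 0;                       /* block offset to check */
    int i, winner = -1;

    for (i = 0; i < 3; i++) {
        int fd = open(mirrors[i], O_RDONLY);
        if (fd < 0 || pread(fd, buf[i], BLK, off) != BLK) {
            perror(mirrors[i]);
            return 1;
        }
        close(fd);
    }

    /* a copy that matches at least one other copy is the majority value */
    for (i = 0; i < 3 && winner < 0; i++)
        if (memcmp(buf[i], buf[(i + 1) % 3], BLK) == 0)
            winner = i;

    if (winner >= 0)
        printf("majority copy found on %s\n", mirrors[winner]);
    else
        printf("all three copies differ: no majority, flag for repair\n");
    return 0;
}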

2011/2/9 Doug Dumitru <doug@easyco.com>:
> I work with SSDs arrays all the time, so I have a couple of thoughts
> about trim and md.
>
> 'trim' is still necessary.  SandForce controllers are "better" at
> this, but still need free space to do their work.  I had a set of SF
> drives drop to 22 MB/sec writes because they were full and scrambled.
> It takes a lot of effort to get them that messed up, but it can still
> happen.  Trim brings them back.
>
> The bottom line is that SSDs do block re-organization on the fly and
> free space makes the re-org more efficient.  More efficient means
> faster, and as importantly less wear amplification.
>
> Most SSDs (and I think the latest trim spec) are deterministic on
> trim'd sectors.  If you trim a sector, they read that sector as zeros.
>  This makes raid much "safer".
>
> raid/0,1,10 should be fine to echo discard commands down to the
> downstream drives in the bio request.  It is then up to the physical
> device driver to turn the discard bio request into an ATA (or SCSI)
> trim.  Most block devices don't seem to understand discard requests
> yet, but this will get better over time.
>
> raid/4,5,6 is a lot more complicated.  With raid/4,5 with an even
> number of drives, you can trim whole stripes safely.  Pieces of
> stripes get interesting because you have to treat a trim as a write of
> zeros and re-calc parity.  raid/6 will always have parity issues
> regardless of how many drives there are.  Even worse is that
> raid/4,5,6 parity read/modify/write operations tend to chatter the FTL
> (Flash Translation Layer) logic and make matters worse (often much
> worse).  If you are not streaming long linear writes, raid/4,5,6 in a
> heavy write environment is a probably a very bad idea for most SSDs.
>
> Another issue with trim is how "async" it behaves.  You can trim a lot
> of data to a drive, but it is hard to tell when the drive actually is
> ready afterwards.  Some drives also choke on trim requests that come
> at them too fast or requests that are too long.  The behavior can be
> quite random.  So then comes the issue of how many "user knobs" to
> supply to tune what trims where.  Again, raid/0,1,10 are pretty easy.
> Raid/4,5,6 really requires that you know the precise geometry and
> control the IO.  Way beyond what ext4 understands at this point.
>
> Trim can also be "faked" with some drives.  Again, looking at the
> SandForce based drives, these drive internally de-dupe so you can fake
> write data and help the drives get free space.  Do this by filling the
> drive with zeros (ie, dd if=/dev/zero of=big.file bs=1M), do a sync,
> and then delete the big.file.  This works through md, across SANs,
> from XEN virtuals, or wherever.  With SandForce drives, this is not as
> effective as a trim, but better than nothing.  Unfortunately, only
> SandForce drives and Flash Supercharger understand zero's this way.  A
> filesystem option that "zeros discarded sectors" would actually make
> as much sense in some deployment settings as the discard option (not
> sure, but ext# might already have this).  NTFS has actually supported
> this since XP as a security enhancement.
>
> Doug Dumitru
> EasyCo LLC
>
> ps:  My background with this has been the development of Flash
> SuperCharger.  I am not trying to run an advert here, but the care and
> feeding of SSDs can be interesting.  Flash SuperCharger breaks most of
> these rules, but it does know the exact geometry of what it is driving
> and plays excessive games to drives SSDs at their exact "sweet spot".
> One of our licensees just sent me some benchmarks at > 500,000 4K
> random writes/sec for a moderate sized array running raid/5.
>
> pps:  Failures of SSDs are different than HDDs.  SSDs can and do fail
> and need raid for many applications.  If you need high write IOPS, it
> pretty much has to be raid/1,10 (unless you run our Flash SuperCharger
> layer).
>
> ppps:  I have seen SSDs silently return corrupted data.  Disks do this
> as well.  A paper from 2 years ago quoted disk silent error rates as
> high as 1 bad block every 73TB read.  Very scary stuff, but probably
> beyond the scope of what md can address.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 19:21                               ` Piergiorgio Sartor
@ 2011-02-09 19:27                                 ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 19:27 UTC (permalink / raw)
  To: Piergiorgio Sartor
  Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid

just to keep READ consistent with any drive mix:
if the device has TRIM, use it;
if not, use a WRITE of 0x000000...
afterwards, if we READ from /dev/md0 we get the same information
(0x000000) no matter whether it's an SSD or an HD, with or without the
TRIM function.
ext4 sends the trim command (but it's a user option, and should only be
used with TRIM-capable disks).
swap sends it too (it's not a user option; the kernel checks whether the
device can execute TRIM and, if not, doesn't send it. i don't know exactly
what it does, but we could use the same code to 'emulate' the TRIM
command, like swap does).

why emulate? because we can use a mixed array (ssd/hd) and still get more
performance from the TRIM-enabled disks and ext4 (or any other filesystem
that uses md as its device).
the point is: add support for the TRIM command to MD devices.
today i don't know if it has that support (i think not).
if the support exists, how does it work? could we mix TRIM-enabled and
non-TRIM devices in a raid array?

the first option is: don't use trim.
the second: use trim when possible, emulate trim when impossible.
the third: only accept trim if all devices are trim-enabled (this
should be a run-time option, since we can remove a mirror with trim
support and add a mirror without trim support).

2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> yeah =)
>> a question...
>> if i send a TRIM to a sector
>> if i read from it
>> what i have?
>> 0x00000000000000000000000000000000000 ?
>> if yes, we could translate TRIM to WRITE on devices without TRIM (hard disks)
>> just to have the same READ information
>
> It seems the 0x0 is not a standard. Return values
> seem to be quite undefined, even if 0x0 *might*
> be common.
>
> Second, why do you want to emulate the 0x0 thing?
>
> I do not see the point of writing zero on a device
> which do not support TRIM. Just do nothing seems a
> better choice, even in mixed environment.
>
> bye,
>
> pg
>
>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> >> it´s just a discussion, right? no implementation yet, right?
>> >
>> > Of course...
>> >
>> >> what i think....
>> >> if device accept TRIM, we can use TRIM.
>> >> if not, we must translate TRIM to something similar (maybe many WRITES
>> >> ?), and when we READ from disk we get the same information
>> >
>> > TRIM is not about writing at all. TRIM tells the
>> > device that the addressed block is not anymore used,
>> > so it (the SSD) can do whatever it wants with it.
>> >
>> > The only software layer having the same "knowledge"
>> > is the filesystem, the other layers, do not have
>> > any decisional power about the block allocation.
>> > Except for metadata, of course.
>> >
>> > So, IMHO, a software TRIM can only be in the FS.
>> >
>> > bye,
>> >
>> > pg
>> >
>> >> the translation coulbe be done by kernel (not md) maybe options on
>> >> libata, nbd device....
>> >> other option is do it with md, internal (md) TRIM translate function
>> >>
>> >> who send trim?
>> >> internal md information: md can generate it (if necessary, maybe it´s
>> >> not...) for parity disks (not data disks)
>> >> filesystem/or another upper layer program (database with direct device
>> >> access), we could accept TRIM from filesystem/database, and send it to
>> >> disks/mirrors, when necessary translate it (internal or kernel
>> >> translate function)
>> >>
>> >>
>> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> >> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
>> >> >> nice =)
>> >> >> but check that parity block is a raid information, not a filesystem information
>> >> >> for raid we could implement trim when possible (like swap)
>> >> >> and implement a trim that we receive from filesystem, and send to all
>> >> >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors)
>> >> >
>> >> > To all disk also in case of RAID-5?
>> >> >
>> >> > What if the TRIM belongs only to a single SDD block
>> >> > belonging to a single chunk of a stripe?
>> >> > That is a *single* SSD of the RAID-5.
>> >> >
>> >> > Should md re-read the block and re-write (not TRIM)
>> >> > the parity?
>> >> >
>> >> > I think anything that has to do with checking &
>> >> > repairing must be carefully considered...
>> >> >
>> >> > bye,
>> >> >
>> >> > pg
>> >> >
>> >> >> i don´t know what trim do very well, but i think it´s a very big write
>> >> >> with only some bits for example:
>> >> >> set sector1='00000000000000000000000000000000000000000000000000'
>> >> >> could be replace by:
>> >> >> trim sector1
>> >> >> it´s faster for sata communication, and it´s a good information for
>> >> >> hard disk (it can put a single '0' at the start of the sector and know
>> >> >> that all sector is 0, if it try to read any information it can use
>> >> >> internal memory (don´t read hard disk), if a write is done it should
>> >> >> write 0000 to bits, and after after the write operation, but it´s
>> >> >> internal function of hard disk/ssd, not a problem of md raid... md
>> >> >> raid should need know how to optimize and use it =] )
>> >> >>
>> >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>:
>> >> >> >> ext4 send trim commands to device (disk/md raid/nbd)
>> >> >> >> kernel swap send this commands (when possible) to device too
>> >> >> >> for internal raid5 parity disk this could be done by md, for data
>> >> >> >> disks this should be done by ext4
>> >> >> >
>> >> >> > That's an interesting point.
>> >> >> >
>> >> >> > On which basis should a parity "block" get a TRIM?
>> >> >> >
>> >> >> > If you ask me, I think the complete TRIM story is, at
>> >> >> > best, a temporary patch.
>> >> >> >
>> >> >> > IMHO the wear levelling should be handled by the filesystem
>> >> >> > and, with awarness of this, by the underlining device drivers.
>> >> >> > Reason is that the FS knows better what's going on with the
>> >> >> > blocks and what will happen.
>> >> >> >
>> >> >> > bye,
>> >> >> >
>> >> >> > pg
>> >> >> >
>> >> >> >>
>> >> >> >> the other question... about resync with only write what is different
>> >> >> >> this is very good since write and read speed can be different for ssd
>> >> >> >> (hd don´t have this 'problem')
>> >> >> >> but i´m sure that just write what is diff is better than write all
>> >> >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too)
>> >> >> >>
>> >> >> >>
>> >> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>:
>> >> >> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
>> >> >> >> >>
>> >> >> >> >> Who sends this command? If md can assume that determinate mode is
>> >> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
>> >> >> >> >> consistency of the parity information depends on the determinate
>> >> >> >> >> pattern used and the number of disks. If you used determinate
>> >> >> >> >> all-zero, then parity information would always be consistent, but this
>> >> >> >> >> is probably not preferable since every TRIM command would incur an
>> >> >> >> >> extra write for each bit in each page of the block.
>> >> >> >> >
>> >> >> >> > True, and there are several solutions.  Maybe track space used via
>> >> >> >> > some mechanism, such that when you trim you're only trimming the
>> >> >> >> > entire stripe width so no parity is required for the trimmed regions.
>> >> >> >> > Or, trust the drive's wear leveling and endurance rating, combined
>> >> >> >> > with SMART data, to indicate when you need to replace the device
>> >> >> >> > preemptive to eventual failure.
>> >> >> >> >
>> >> >> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
>> >> >> >> > you could expect wear leveling to wear all the devices evenly, since
>> >> >> >> > on average, the # of writes to all devices will be the same.  Only a
>> >> >> >> > RAID4 setup would see a lopsided amount of writes to a single device.
>> >> >> >> >
>> >> >> >> > --eric
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > Eric D. Mudama
>> >> >> >> > edmudama@bounceswoosh.org
>> >> >> >> >
>> >> >> >> > --
>> >> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> >> >> > the body of a message to majordomo@vger.kernel.org
>> >> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >> >> >
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >> --
>> >> >> >> Roberto Spadim
>> >> >> >> Spadim Technology / SPAEmpresarial
>> >> >> >> --
>> >> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> >> >> the body of a message to majordomo@vger.kernel.org
>> >> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >> >
>> >> >> > --
>> >> >> >
>> >> >> > piergiorgio
>> >> >> > --
>> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> >> > the body of a message to majordomo@vger.kernel.org
>> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> Roberto Spadim
>> >> >> Spadim Technology / SPAEmpresarial
>> >> >> --
>> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> >> the body of a message to majordomo@vger.kernel.org
>> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >
>> >> > --
>> >> >
>> >> > piergiorgio
>> >> > --
>> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> >> > the body of a message to majordomo@vger.kernel.org
>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >
>> >>
>> >>
>> >>
>> >> --
>> >> Roberto Spadim
>> >> Spadim Technology / SPAEmpresarial
>> >
>> > --
>> >
>> > piergiorgio
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>
> --
>
> piergiorgio
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 15:49         ` David Brown
@ 2011-02-21 18:20           ` Phillip Susi
  2011-02-21 18:25             ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Phillip Susi @ 2011-02-21 18:20 UTC (permalink / raw)
  To: David Brown; +Cc: linux-raid

On 2/9/2011 10:49 AM, David Brown wrote:
> I've been reading a little more about this.  It seems that the days of
> TRIM may well be numbered - the latest generation of high-end SSDs have
> more powerful garbage collection algorithms, together with more spare
> blocks, making TRIM pretty much redundant.  This is, of course, the most
> convenient solution for everyone (as long as it doesn't cost too much!).
> 
> The point of the TRIM command is to tell the SSD that a particular block
> is no longer being used, so that the SSD can erase it in the background
> - that way when you want to write more data, there are more free blocks
> ready and waiting.  But if you've got plenty of spare blocks, it's easy
> to have them erased in advance and you don't need TRIM.

It is not just about having free blocks ready and waiting.  When doing
wear leveling, you might find an erase block that has not been written
to in a long time, so you want to move that data to a more worn block,
and use the less worn block for more frequently written to sectors.  If
you know that sectors are unused because they have been TRIMed, then you
don't have to waste time and wear copying the junk there to the new
flash block.

TRIM is also quite useful for thin provisioned storage, which seems to
be getting popular.
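
To make the garbage-collection point concrete, here is a minimal user-space
sketch (not taken from any real SSD firmware; the page states, the block layout
and the copy_page callback are all made up for illustration) of a collector
that relocates a victim erase block and simply skips pages the host has TRIMed:

#include <stddef.h>

enum page_state { PAGE_FREE, PAGE_VALID, PAGE_TRIMMED };

struct erase_block {
	enum page_state state[128];	/* hypothetical: 128 pages per erase block */
	size_t npages;
};

/* Move still-valid pages out of 'victim' before erasing it.  Pages the
 * host has TRIMed carry no data worth keeping, so they are dropped:
 * no copy, no extra wear on the destination block. */
static size_t relocate_block(const struct erase_block *victim,
			     void (*copy_page)(size_t page))
{
	size_t copied = 0;

	for (size_t i = 0; i < victim->npages; i++) {
		if (victim->state[i] == PAGE_VALID) {
			copy_page(i);	/* must be preserved */
			copied++;
		}
		/* PAGE_TRIMMED and PAGE_FREE pages are skipped entirely */
	}
	return copied;	/* fewer copies => less write amplification */
}

Fewer valid pages to copy per collected block is exactly the win described
above: the TRIM information lets the firmware avoid moving junk.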

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-09 16:19           ` Eric D. Mudama
  2011-02-09 16:28             ` Scott E. Armitage
@ 2011-02-21 18:24             ` Phillip Susi
  2011-02-21 18:30               ` Roberto Spadim
  1 sibling, 1 reply; 70+ messages in thread
From: Phillip Susi @ 2011-02-21 18:24 UTC (permalink / raw)
  To: Eric D. Mudama; +Cc: Scott E. Armitage, Roberto Spadim, David Brown, linux-raid

On 2/9/2011 11:19 AM, Eric D. Mudama wrote:
> For SATA devices, ATA8-ACS2 addresses this through Deterministic Read
> After Trim in the DATA SET MANAGEMENT command.  Devices can be
> indeterminate, determinate with a non-zero pattern (often all-ones) or
> determinate all-zero for sectors read after being trimmed.

IIRC, it was a word in the IDENTIFY response, not the DATA SET
MANAGEMENT command.

On 2/9/2011 11:28 AM, Scott E. Armitage wrote:
> Who sends this command? If md can assume that determinate mode is
> always set, then RAID 1 at least would remain consistent. For RAID 5,
> consistency of the parity information depends on the determinate
> pattern used and the number of disks. If you used determinate
> all-zero, then parity information would always be consistent, but this
> is probably not preferable since every TRIM command would incur an
> extra write for each bit in each page of the block.

The drive tells YOU how its trim behaves; you don't command it.

If the drive is deterministic and always returns zeros after TRIM, then
mdadm could pass the TRIM down and process it like a write of all zeros,
and recompute the parity.  If it isn't deterministic, then I don't think
there's anything you can do to handle TRIM requests.
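
As a rough illustration of that idea (plain user-space C, not md code; the
chunk layout and buffer handling are simplified assumptions): with a
deterministic read-zero drive, the TRIM is equivalent to writing zeros, so the
new parity is just the old parity with the old chunk contents XORed out.

#include <stddef.h>
#include <stdint.h>

/* RAID-5 parity update for one data chunk that is being "written" with
 * zeros, which is what a TRIM on a deterministic read-zero drive amounts to:
 *
 *   P_new = P_old ^ D_old ^ D_new, and D_new == 0  =>  P_new = P_old ^ D_old
 */
static void parity_update_for_trim(uint8_t *parity,		/* old parity, updated in place */
				   const uint8_t *old_data,	/* current contents of the trimmed chunk */
				   size_t chunk_bytes)
{
	for (size_t i = 0; i < chunk_bytes; i++)
		parity[i] ^= old_data[i];
}

After this, md could pass the TRIM down for the data chunk and write the
updated parity chunk; a later read of the trimmed chunk (all zeros) still
checks out against the stored parity.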

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 18:20           ` Phillip Susi
@ 2011-02-21 18:25             ` Roberto Spadim
  2011-02-21 18:34               ` Phillip Susi
  2011-02-21 18:51               ` Mathias Burén
  0 siblings, 2 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 18:25 UTC (permalink / raw)
  To: Phillip Susi; +Cc: David Brown, linux-raid

TRIM is a new feature for many hard disks/SSDs.
It is mostly about extending the life of the disk and allowing dynamic bad-block
reallocation (the filesystem must tell the device which blocks are empty).


2011/2/21 Phillip Susi <psusi@cfl.rr.com>:
> On 2/9/2011 10:49 AM, David Brown wrote:
>> I've been reading a little more about this.  It seems that the days of
>> TRIM may well be numbered - the latest generation of high-end SSDs have
>> more powerful garbage collection algorithms, together with more spare
>> blocks, making TRIM pretty much redundant.  This is, of course, the most
>> convenient solution for everyone (as long as it doesn't cost too much!).
>>
>> The point of the TRIM command is to tell the SSD that a particular block
>> is no longer being used, so that the SSD can erase it in the background
>> - that way when you want to write more data, there are more free blocks
>> ready and waiting.  But if you've got plenty of spare blocks, it's easy
>> to have them erased in advance and you don't need TRIM.
>
> It is not just about having free blocks ready and waiting.  When doing
> wear leveling, you might find an erase block that has not been written
> to in a long time, so you want to move that data to a more worn block,
> and use the less worn block for more frequently written to sectors.  If
> you know that sectors are unused because they have been TRIMed, then you
> don't have to waste time and wear copying the junk there to the new
> flash block.
>
> TRIM is also quite useful for thin provisioned storage, which seems to
> be getting popular.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 18:24             ` Phillip Susi
@ 2011-02-21 18:30               ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 18:30 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid

just some ideas...

hmm, thinking about TRIM in a RAID1 array that mixes supporting and non-supporting devices...

when would a filesystem read a block that has been trimmed?
since a filesystem writes first and reads afterwards, maybe never:
trimmed blocks are unused blocks (the filesystem knows where they are).
maybe with a read/compare/write-if-different resync function we would
have problems when an untrimmed disk (one that supports TRIM but hasn't
been trimmed yet) is added to the RAID1.
maybe...

i think that sending TRIM to the devices isn't the problem; it's a disk
optimization that must be driven by the filesystem, and RAID1 should
only pass the command on to its disks. the problem is that if a disk
doesn't support TRIM, we must either emulate a TRIM-compatible command
or simply skip it (the filesystem knows about free blocks anyway).



2011/2/21 Phillip Susi <psusi@cfl.rr.com>:
> On 2/9/2011 11:19 AM, Eric D. Mudama wrote:
>> For SATA devices, ATA8-ACS2 addresses this through Deterministic Read
>> After Trim in the DATA SET MANAGEMENT command.  Devices can be
>> indeterminate, determinate with a non-zero pattern (often all-ones) or
>> determinate all-zero for sectors read after being trimmed.
>
> IIRC, it was a word in the IDENTIFY response, not the DATA SET
> MANAGEMENT command.
>
> On 2/9/2011 11:28 AM, Scott E. Armitage wrote:
>> Who sends this command? If md can assume that determinate mode is
>> always set, then RAID 1 at least would remain consistent. For RAID 5,
>> consistency of the parity information depends on the determinate
>> pattern used and the number of disks. If you used determinate
>> all-zero, then parity information would always be consistent, but this
>> is probably not preferable since every TRIM command would incur an
>> extra write for each bit in each page of the block.
>
> The drive tells YOU how its trim behaves; you don't command it.
>
> If the drive is deterministic and always returns zeros after TRIM, then
> mdadm could pass the TRIM down and process it like a write of all zeros,
> and recompute the parity.  If it isn't deterministic, then I don't think
> there's anything you can do to handle TRIM requests.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 18:25             ` Roberto Spadim
@ 2011-02-21 18:34               ` Phillip Susi
  2011-02-21 18:48                 ` Roberto Spadim
  2011-02-21 18:51               ` Mathias Burén
  1 sibling, 1 reply; 70+ messages in thread
From: Phillip Susi @ 2011-02-21 18:34 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: David Brown, linux-raid

On 2/21/2011 1:25 PM, Roberto Spadim wrote:
> TRIM is a new feature for many hard disk/ssd
> it´s more to get a bigger life o disk, allow a dynamic badblock
> reallocation (filesystem must tell where is empty)

Ummm... thanks????

I know quite well what TRIM is, which is why I was discussing how mdadm
could support it.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 18:34               ` Phillip Susi
@ 2011-02-21 18:48                 ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 18:48 UTC (permalink / raw)
  To: Phillip Susi; +Cc: David Brown, linux-raid

yeah, for RAID1 just send the TRIM to each device (if no special layout is in use).
for striped levels the command must be rewritten, and we have to check whether TRIM can be used at all.
for internal RAID metadata we shouldn't use it.

2011/2/21 Phillip Susi <psusi@cfl.rr.com>:
> On 2/21/2011 1:25 PM, Roberto Spadim wrote:
>> TRIM is a new feature for many hard disk/ssd
>> it´s more to get a bigger life o disk, allow a dynamic badblock
>> reallocation (filesystem must tell where is empty)
>
> Ummm... thanks????
>
> I know quite well what TRIM is, which is why I was discussing how mdadm
> could support it.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 18:25             ` Roberto Spadim
  2011-02-21 18:34               ` Phillip Susi
@ 2011-02-21 18:51               ` Mathias Burén
  2011-02-21 19:32                 ` Roberto Spadim
  1 sibling, 1 reply; 70+ messages in thread
From: Mathias Burén @ 2011-02-21 18:51 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid

(please don't top post)

On 21 February 2011 18:25, Roberto Spadim <roberto@spadim.com.br> wrote:
> TRIM is a new feature for many hard disk/ssd
> it´s more to get a bigger life o disk, allow a dynamic badblock
> reallocation (filesystem must tell where is empty)
>
>
> 2011/2/21 Phillip Susi <psusi@cfl.rr.com>:
>> On 2/9/2011 10:49 AM, David Brown wrote:
>>> I've been reading a little more about this.  It seems that the days of
>>> TRIM may well be numbered - the latest generation of high-end SSDs have
>>> more powerful garbage collection algorithms, together with more spare
>>> blocks, making TRIM pretty much redundant.  This is, of course, the most
>>> convenient solution for everyone (as long as it doesn't cost too much!).
>>>
>>> The point of the TRIM command is to tell the SSD that a particular block
>>> is no longer being used, so that the SSD can erase it in the background
>>> - that way when you want to write more data, there are more free blocks
>>> ready and waiting.  But if you've got plenty of spare blocks, it's easy
>>> to have them erased in advance and you don't need TRIM.
>>
>> It is not just about having free blocks ready and waiting.  When doing
>> wear leveling, you might find an erase block that has not been written
>> to in a long time, so you want to move that data to a more worn block,
>> and use the less worn block for more frequently written to sectors.  If
>> you know that sectors are unused because they have been TRIMed, then you
>> don't have to waste time and wear copying the junk there to the new
>> flash block.
>>
>> TRIM is also quite useful for thin provisioned storage, which seems to
>> be getting popular.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

TRIM is not a feature for HDDs, as they don't have the problem that
SSDs have. Where did you hear this?

// Mathias
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 18:51               ` Mathias Burén
@ 2011-02-21 19:32                 ` Roberto Spadim
  2011-02-21 19:38                   ` Mathias Burén
  2011-02-21 19:39                   ` Roberto Spadim
  0 siblings, 2 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 19:32 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid

TRIM isn't a problem, it's a solution to optimize dynamic allocation
and the lifetime of devices (SSD or hard disk).
i don't see any problem in implementing the TRIM command on hard disks (not
in Linux, but at the hard disk firmware level).

hard disks have the same problem as SSDs, allocation of bad blocks; any
hard disk could implement TRIM and use it to reallocate bad blocks...

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 19:32                 ` Roberto Spadim
@ 2011-02-21 19:38                   ` Mathias Burén
  2011-02-21 19:39                     ` Mathias Burén
  2011-02-21 19:39                   ` Roberto Spadim
  1 sibling, 1 reply; 70+ messages in thread
From: Mathias Burén @ 2011-02-21 19:38 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid

On 21 February 2011 19:32, Roberto Spadim <roberto@spadim.com.br> wrote:
> TRIM isn´t a problem, it´s a solution to optimize dynamic allocation,
> and life time of devices (SSD or harddisk)
> i don´t see any problem to implement trim command on hard disks (not
> in linux, but at harddisk firmware level)
>
> hard disk have the same problem of ssd, allocation of badblocks, any
> harddisk could implement trim and use it to realloc badblocks...
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>

I don't think you understand TRIM. It wouldn't work, and there is no
need for it, on an HDD. AFAIK an HDD does not have the same penalty as an
SSD does when it needs to write to a (previously) used area. An SSD
cannot do this without erasing the whole erase block, usually 512KB
in size (it varies between manufacturers), and the data that's
already on there still needs to be moved elsewhere first, the block erased,
and the data moved back at the same time the new data is written with it.
AFAIK it works something like this anyway. The only benefit TRIM will
give you would be potentially faster writes, right?

// M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 19:38                   ` Mathias Burén
@ 2011-02-21 19:39                     ` Mathias Burén
  2011-02-21 19:43                       ` Roberto Spadim
  2011-02-21 20:45                       ` Phillip Susi
  0 siblings, 2 replies; 70+ messages in thread
From: Mathias Burén @ 2011-02-21 19:39 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid

On 21 February 2011 19:38, Mathias Burén <mathias.buren@gmail.com> wrote:
> On 21 February 2011 19:32, Roberto Spadim <roberto@spadim.com.br> wrote:
>> TRIM isn´t a problem, it´s a solution to optimize dynamic allocation,
>> and life time of devices (SSD or harddisk)
>> i don´t see any problem to implement trim command on hard disks (not
>> in linux, but at harddisk firmware level)
>>
>> hard disk have the same problem of ssd, allocation of badblocks, any
>> harddisk could implement trim and use it to realloc badblocks...
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>>
>
> I don't think you understand TRIM. It wouldn't work, and there is no
> need for it, on a HDD. AFAIK a HDD does not have the same penalty as a
> SSD does when it needs to write to a (previously) used area. An SSD
> cannot do this without erasing the whole (block? page?), usually 512KB
> in size (varies between different manufacturers), but the data that's
> on there still needs to be moved elsewhere first, block erased, data
> moved back the same time the new data is written together with it.
> AFAIK it works something like this anyway. The only benefit TRIM will
> give you would be potentially faster writes, right.
>
> // M
>

Plus, support is needed from the kernel (done) and the filesystem (ext4 has
it). The filesystem sees the MD device, not the actual SSDs behind
it, so it would probably be quite complicated to implement passthrough
of the TRIM command in this case.

// M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 19:32                 ` Roberto Spadim
  2011-02-21 19:38                   ` Mathias Burén
@ 2011-02-21 19:39                   ` Roberto Spadim
  2011-02-21 19:51                     ` Doug Dumitru
  2011-02-21 20:47                     ` Phillip Susi
  1 sibling, 2 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 19:39 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid

sorry, but i sent that email with one piece of information missing:
TRIM is an 'ATA specification' command

http://en.wikipedia.org/wiki/TRIM_command

any disk that speaks the ATA command set could support TRIM: hard disk, SSD, or
any other type of physical storage


2011/2/21 Roberto Spadim <roberto@spadim.com.br>:
> TRIM isn´t a problem, it´s a solution to optimize dynamic allocation,
> and life time of devices (SSD or harddisk)
> i don´t see any problem to implement trim command on hard disks (not
> in linux, but at harddisk firmware level)
>
> hard disk have the same problem of ssd, allocation of badblocks, any
> harddisk could implement trim and use it to realloc badblocks...
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 19:39                     ` Mathias Burén
@ 2011-02-21 19:43                       ` Roberto Spadim
  2011-02-21 20:45                       ` Phillip Susi
  1 sibling, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 19:43 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid

yeah, the idea of implementing TRIM in MD is to forward to the member devices
the TRIM that MD receives from the filesystem level.
RAID1 + no special layout + all mirrors with TRIM support => i think it's easy
to implement... just send the command to all mirrors (SSD or HD, as long as they
support it).
for striped devices?! maybe it could be supported, but it's more difficult.
for linear and RAID0 it could be easy too.
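
A very rough sketch of the RAID1 case (kernel-flavoured pseudo-code, not the
actual md/raid1 implementation; it assumes the roughly-2011 block-layer helpers
bio_clone() and generic_make_request(), and a simplified mirror list instead of
md's conf/rdev structures; error handling and reference counting are omitted):

/* Forward a DISCARD request to every mirror of a RAID1 set. */
static void raid1_forward_discard(struct bio *master_bio,
				  struct block_device **mirrors,
				  int nr_mirrors)
{
	int i;

	for (i = 0; i < nr_mirrors; i++) {
		struct bio *clone = bio_clone(master_bio, GFP_NOIO);

		clone->bi_bdev = mirrors[i];	/* retarget at this member disk */
		/* the clone inherits REQ_DISCARD from master_bio->bi_rw */
		generic_make_request(clone);	/* the member's driver turns it
						   into TRIM if the device
						   supports it */
	}
}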

2011/2/21 Mathias Burén <mathias.buren@gmail.com>:
> On 21 February 2011 19:38, Mathias Burén <mathias.buren@gmail.com> wrote:
>> On 21 February 2011 19:32, Roberto Spadim <roberto@spadim.com.br> wrote:
>>> TRIM isn´t a problem, it´s a solution to optimize dynamic allocation,
>>> and life time of devices (SSD or harddisk)
>>> i don´t see any problem to implement trim command on hard disks (not
>>> in linux, but at harddisk firmware level)
>>>
>>> hard disk have the same problem of ssd, allocation of badblocks, any
>>> harddisk could implement trim and use it to realloc badblocks...
>>>
>>> --
>>> Roberto Spadim
>>> Spadim Technology / SPAEmpresarial
>>>
>>
>> I don't think you understand TRIM. It wouldn't work, and there is no
>> need for it, on a HDD. AFAIK a HDD does not have the same penalty as a
>> SSD does when it needs to write to a (previously) used area. An SSD
>> cannot do this without erasing the whole (block? page?), usually 512KB
>> in size (varies between different manufacturers), but the data that's
>> on there still needs to be moved elsewhere first, block erased, data
>> moved back the same time the new data is written together with it.
>> AFAIK it works something like this anyway. The only benefit TRIM will
>> give you would be potentially faster writes, right.
>>
>> // M
>>
>
> Plus support is needed from the kernel (done) filesystem (ext4 has
> it). The filesystem seese the MD device, not the actual SSDs behind
> it, so it would probably be quite complicated to implement passthrough
> of the trim command in this case.
>
> // M
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 19:39                   ` Roberto Spadim
@ 2011-02-21 19:51                     ` Doug Dumitru
  2011-02-21 19:57                       ` Roberto Spadim
  2011-02-21 20:47                     ` Phillip Susi
  1 sibling, 1 reply; 70+ messages in thread
From: Doug Dumitru @ 2011-02-21 19:51 UTC (permalink / raw)
  To: linux-raid

To be technically accurate, trim is a hint to a storage device that
has a "block translation layer" that can take advantage of knowing
that a block contains no meaningful data.

Flash needs trim only if the flash has an FTL (Flash Translation Layer)
that remaps blocks in such a way that free blocks help make
this process more efficient.  Older SSDs did not support
trim and had no real need for it.  If you look at the FTL used with
simple flash (think CF cards, SD cards, and USB sticks), trim does not
help them.  Trim and wear leveling are unrelated and don't really
impact each other.

On the linux side trim is "discard".  This is actually a much better
abstraction as it does not imply SSDs.

Any type of block device that does dynamic block remapping will likely
be helped (at least somewhat) by discard.  The only examples of this I
can think of off-hand are 1) my Flash SuperCharger code, and 2)
block-level de-dupe engines.  I am sure other examples will be created
over time.

Hopefully, discard can be driven down the stack.  I would personally
prefer the linux community declare that discard and zero writes are
identical.  If an SSD supports trim and linux wants to translate a
discard into a trim at the device driver layer, and the SSD is
non-deterministic, then that SSD is broken.  Then again, my attitude
about this is very arrogant and I think the trim spec was broken from
the beginning.
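
For what it's worth, the discard abstraction is already reachable from user
space too; a minimal (and destructive!) sketch using the BLKDISCARD ioctl,
which the block layer maps to TRIM, UNMAP or whatever the underlying device
understands -- /dev/sdX, the offset and the length here are placeholders:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>		/* BLKDISCARD */

int main(void)
{
	/* WARNING: this destroys the data in the given range. */
	uint64_t range[2] = { 0, 1024 * 1024 };	/* byte offset, byte length */
	int fd = open("/dev/sdX", O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (ioctl(fd, BLKDISCARD, &range) < 0)
		perror("BLKDISCARD");	/* e.g. EOPNOTSUPP if not supported */
	close(fd);
	return 0;
}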

--
Doug Dumitru
EasyCo LLC

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 19:51                     ` Doug Dumitru
@ 2011-02-21 19:57                       ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 19:57 UTC (permalink / raw)
  To: doug; +Cc: linux-raid

>Then again, my attitude
> about this is very arrogant and I think the trim spec was broken from
> the beginning.


maybe... we could put all the hard disk firmware into Linux code... why do we
need reallocation in hard disks? we need it when the filesystem doesn't do it.
the question is: can we implement TRIM at the MD device?




2011/2/21 Doug Dumitru <doug@easyco.com>:
> To be technically accurate, trim is a hint to a storage device that
> has a "block translation layer" that can take advantage of knowing
> that a block contains no meaningful data.
>
> Flash needs trim only if flash has an FTL (Flash Translation Layer)
> that is re-mapping blocks in such a manner as free blocks are helpful
> in making this process more efficient.  Older SSDs did not support
> trim and had no real need for it.  If you look at the FTL used with
> simple Flash (think CF cards, SD cards, and USB sticks) trim does not
> help them.  Trim and wear leveling are un-related and don't really
> impact each other.
>
> On the linux side trim is "discard".  This is actually a much better
> abstraction as it does not imply SSDs.
>
> Any type of block device that does dynamic block remapping will likely
> be helped (at least somewhat) by discard.  The only examples of this I
> can think of off-hand are 1) my Flash SuperCharger code, and 2)
> block-level de-dupe engines.  I am sure other examples will be created
> over time.
>
> Hopefully, discard can be driven down the stack.  I would personally
> prefer the linux community declare that discard and zero writes are
> identical.  If an SSD supports trim and linux wants to translate a
> discard into a trim at the device driver layer, and the SSD is
> non-deterministic, then that SSD is broken.  Then again, my attitude
> about this is very arrogant and I think the trim spec was broken from
> the beginning.
>
> --
> Doug Dumitru
> EasyCo LLC
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 19:39                     ` Mathias Burén
  2011-02-21 19:43                       ` Roberto Spadim
@ 2011-02-21 20:45                       ` Phillip Susi
  1 sibling, 0 replies; 70+ messages in thread
From: Phillip Susi @ 2011-02-21 20:45 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Roberto Spadim, David Brown, linux-raid

On 2/21/2011 2:39 PM, Mathias Burén wrote:
> Plus support is needed from the kernel (done) filesystem (ext4 has
> it). The filesystem seese the MD device, not the actual SSDs behind
> it, so it would probably be quite complicated to implement passthrough
> of the trim command in this case.

It has been mentioned at least twice now how to implement it.  The
device-mapper driver already has implemented TRIM passthrough for its
linear, stripe, and mirror targets.  The trick is handling it with raid[56].
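
For the striped case the awkward part is largely arithmetic: a discard that
covers whole stripes splits cleanly into one discard per member, while partial
chunks at the edges have to be skipped or turned into ordinary writes. A
standalone sketch of the clean split (hypothetical helper, not dm or md code;
RAID0-style layout, parity ignored):

#include <stdint.h>
#include <stdio.h>

/* Split a discard of whole stripes across the data devices of a striped
 * array.  'start' and 'len' are in array sectors and are assumed to be
 * aligned to a full stripe (chunk_sectors * ndevs). */
static void split_stripe_discard(uint64_t start, uint64_t len,
				 uint64_t chunk_sectors, int ndevs)
{
	uint64_t stripe = chunk_sectors * ndevs;		/* sectors per stripe */
	uint64_t dev_start = (start / stripe) * chunk_sectors;	/* same on every member */
	uint64_t dev_len = (len / stripe) * chunk_sectors;

	for (int d = 0; d < ndevs; d++)
		printf("device %d: discard %llu sectors at sector %llu\n",
		       d, (unsigned long long)dev_len,
		       (unsigned long long)dev_start);
}

For RAID5/6 the same split works for the data chunks, but the parity chunk of
each stripe needs the deterministic-read-zero treatment discussed earlier in
the thread (or has to be left alone).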
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 19:39                   ` Roberto Spadim
  2011-02-21 19:51                     ` Doug Dumitru
@ 2011-02-21 20:47                     ` Phillip Susi
  2011-02-21 21:02                       ` Mathias Burén
  1 sibling, 1 reply; 70+ messages in thread
From: Phillip Susi @ 2011-02-21 20:47 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Mathias Burén, David Brown, linux-raid

On 2/21/2011 2:39 PM, Roberto Spadim wrote:
> sorry, but i sent email without a information:
> TRIM is a 'ATA Specification' command
> 
> http://en.wikipedia.org/wiki/TRIM_command
> 
> any disk with ATA command could suport TRIM, hard disk or ssd or
> anyother type of phisical allocation

Sure, but hard disks have no reason to, which is why they don't and
won't support it.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 20:47                     ` Phillip Susi
@ 2011-02-21 21:02                       ` Mathias Burén
  2011-02-21 22:52                         ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Mathias Burén @ 2011-02-21 21:02 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Roberto Spadim, David Brown, linux-raid

On 21 February 2011 20:47, Phillip Susi <psusi@cfl.rr.com> wrote:
> On 2/21/2011 2:39 PM, Roberto Spadim wrote:
>> sorry, but i sent email without a information:
>> TRIM is a 'ATA Specification' command
>>
>> http://en.wikipedia.org/wiki/TRIM_command
>>
>> any disk with ATA command could suport TRIM, hard disk or ssd or
>> anyother type of phisical allocation
>
> Sure, but hard disks have no reason to, which is why they don't and
> won't support it.
>

My point exactly.

// M

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 21:02                       ` Mathias Burén
@ 2011-02-21 22:52                         ` Roberto Spadim
  2011-02-21 23:41                           ` Mathias Burén
  2011-02-22  0:32                           ` Eric D. Mudama
  0 siblings, 2 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 22:52 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid

i don't think so; since it's an ATA command, any ATA-compatible device can use
it. it could be used by an HD with bad blocks and dynamic reallocation
without problems, and the hard disk wouldn't need a dedicated spare area for
bad blocks. for the md software we must know whether the devices support TRIM or not.

the next question: is md ATA compatible? no!? it's a Linux block device,
not an ATA device. which commands do Linux block devices allow? could md allow
TRIM?
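
On the 'does the device support it' side, a rough kernel-flavoured sketch (not
actual md code; it assumes the roughly-2011 helpers bdev_get_queue(),
blk_queue_discard() and the queue_flag_set/clear_unlocked() macros, and a
simplified member list instead of md's rdev list): md would only advertise
discard on its own queue if every member supports it, and otherwise either
drop or emulate the request.

/* Advertise DISCARD on the md request queue only when every member
 * device supports it; locking and hot add/remove are ignored. */
static void md_update_discard_support(struct request_queue *md_q,
				      struct block_device **members, int nr)
{
	int i, all_support = 1;

	for (i = 0; i < nr; i++)
		if (!blk_queue_discard(bdev_get_queue(members[i])))
			all_support = 0;

	if (all_support)
		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, md_q);
	else
		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, md_q);
}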

2011/2/21 Mathias Burén <mathias.buren@gmail.com>:
> On 21 February 2011 20:47, Phillip Susi <psusi@cfl.rr.com> wrote:
>> On 2/21/2011 2:39 PM, Roberto Spadim wrote:
>>> sorry, but i sent email without a information:
>>> TRIM is a 'ATA Specification' command
>>>
>>> http://en.wikipedia.org/wiki/TRIM_command
>>>
>>> any disk with ATA command could suport TRIM, hard disk or ssd or
>>> anyother type of phisical allocation
>>
>> Sure, but hard disks have no reason to, which is why they don't and
>> won't support it.
>>
>
> My point exactly.
>
> // M
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 22:52                         ` Roberto Spadim
@ 2011-02-21 23:41                           ` Mathias Burén
  2011-02-21 23:42                             ` Mathias Burén
  2011-02-22  0:32                           ` Eric D. Mudama
  1 sibling, 1 reply; 70+ messages in thread
From: Mathias Burén @ 2011-02-21 23:41 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid

On 21 February 2011 22:52, Roberto Spadim <roberto@spadim.com.br> wrote:
> i don´t think so, since it´s ATA command, any ATA compatible can use
> it, it could be used for HD with badblocks and dynamic reallocation
> without problems, the harddisk don´t need a dedicated space for
> badblock. for md software we must know if devices support or not TRIM.
>
> the next question, md is ATA compatible? no!?, it´s a linux device,
> not a ATA device. what commands linux devices allow? could md allow
> TRIM?
>
> 2011/2/21 Mathias Burén <mathias.buren@gmail.com>:
>> On 21 February 2011 20:47, Phillip Susi <psusi@cfl.rr.com> wrote:
>>> On 2/21/2011 2:39 PM, Roberto Spadim wrote:
>>>> sorry, but i sent email without a information:
>>>> TRIM is a 'ATA Specification' command
>>>>
>>>> http://en.wikipedia.org/wiki/TRIM_command
>>>>
>>>> any disk with ATA command could suport TRIM, hard disk or ssd or
>>>> anyother type of phisical allocation
>>>
>>> Sure, but hard disks have no reason to, which is why they don't and
>>> won't support it.
>>>
>>
>> My point exactly.
>>
>> // M
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>

Please don't top post.
http://www.splitbrain.org/blog/2011-02/15-top_posting_like_dont_i_why

Harddrives already have an allocated area with spare sectors, which
they use whenever they need to. You can find out how many sectors have
been reallocated by the HDD by looking at the SMART data, like so:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
[...]
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

// M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 23:41                           ` Mathias Burén
@ 2011-02-21 23:42                             ` Mathias Burén
  2011-02-21 23:52                               ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Mathias Burén @ 2011-02-21 23:42 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid

On 21 February 2011 23:41, Mathias Burén <mathias.buren@gmail.com> wrote:
> On 21 February 2011 22:52, Roberto Spadim <roberto@spadim.com.br> wrote:
>> i don´t think so, since it´s ATA command, any ATA compatible can use
>> it, it could be used for HD with badblocks and dynamic reallocation
>> without problems, the harddisk don´t need a dedicated space for
>> badblock. for md software we must know if devices support or not TRIM.
>>
>> the next question, md is ATA compatible? no!?, it´s a linux device,
>> not a ATA device. what commands linux devices allow? could md allow
>> TRIM?
>>
>> 2011/2/21 Mathias Burén <mathias.buren@gmail.com>:
>>> On 21 February 2011 20:47, Phillip Susi <psusi@cfl.rr.com> wrote:
>>>> On 2/21/2011 2:39 PM, Roberto Spadim wrote:
>>>>> sorry, but i sent email without a information:
>>>>> TRIM is a 'ATA Specification' command
>>>>>
>>>>> http://en.wikipedia.org/wiki/TRIM_command
>>>>>
>>>>> any disk with ATA command could suport TRIM, hard disk or ssd or
>>>>> anyother type of phisical allocation
>>>>
>>>> Sure, but hard disks have no reason to, which is why they don't and
>>>> won't support it.
>>>>
>>>
>>> My point exactly.
>>>
>>> // M
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>
>>
>>
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>>
>
> Please don't top post.
> http://www.splitbrain.org/blog/2011-02/15-top_posting_like_dont_i_why
>
> Harddrives already have an allocated area with spare sectors, which
> they use whenever they need to. You can find out how many sectors have
> been reallocated by the HDD by looking at the SMART data, like so:
>
> SMART Attributes Data Structure revision number: 16
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE
> UPDATED  WHEN_FAILED RAW_VALUE
> [...]
>  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail
> Always       -       0
>
> // M
>

I forgot to write that the TRIM command has nothing to do with bad
blocks or sectors; it's just a way of "resetting" blocks so that they can
be written to without having to erase them first. (IIRC)

There is no such issue with HDDs, so there is no benefit at all
in using the TRIM command with them.

// M
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 23:42                             ` Mathias Burén
@ 2011-02-21 23:52                               ` Roberto Spadim
  2011-02-22  0:25                                 ` Mathias Burén
                                                   ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-21 23:52 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid

TRIM tells the hard disk that those blocks are not in use.

blocks that are not in use can be used by the hard disk's reallocation algorithm, like
spare sectors.

hard disks could use the TRIM command to 'create' 'good' blocks, like extra spare sectors.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 23:52                               ` Roberto Spadim
@ 2011-02-22  0:25                                 ` Mathias Burén
  2011-02-22  0:30                                 ` Brendan Conoboy
  2011-02-22  0:36                                 ` Eric D. Mudama
  2 siblings, 0 replies; 70+ messages in thread
From: Mathias Burén @ 2011-02-22  0:25 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid

On 21 February 2011 23:52, Roberto Spadim <roberto@spadim.com.br> wrote:
> trim tell harddisk that those block are not in use
>
> not in use block can be used by harddisk reallocation algorithm, like
> spare sectors
>
> hard disks can use TRIM command to 'create' 'good' blocks like spare sectors
>

Do you mean online defragmentation...? If so, that's for the
filesystem to do. Or do you mean that it could be used to tell the HDD
that it has extra sectors it can use to reallocate bad sectors?...

// M

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 23:52                               ` Roberto Spadim
  2011-02-22  0:25                                 ` Mathias Burén
@ 2011-02-22  0:30                                 ` Brendan Conoboy
  2011-02-22  0:36                                 ` Eric D. Mudama
  2 siblings, 0 replies; 70+ messages in thread
From: Brendan Conoboy @ 2011-02-22  0:30 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Mathias Burén, Phillip Susi, David Brown, linux-raid

On 02/21/2011 03:52 PM, Roberto Spadim wrote:
> trim tell harddisk that those block are not in use
>
> not in use block can be used by harddisk reallocation algorithm, like
> spare sectors
>
> hard disks can use TRIM command to 'create' 'good' blocks like spare sectors

I'm trying really hard to follow what this means but just can't grasp 
what you're getting at.  What scenario is there in which trim actually 
does anything for you on an HD?  I can't think of any situation where 
this makes any sense for HDs with current firmware functionality.  If a 
sector is unused, but bad, you won't know until you write to it.  If 
it's bad and you write to it, the write gets reallocated to a good spare 
sector.  Are you proposing to notify the drive what sectors are unused 
so it can check for and reallocate bad blocks before they're used again? 
  Something else?

-- 
Brendan Conoboy / Red Hat, Inc. / blc@redhat.com

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 22:52                         ` Roberto Spadim
  2011-02-21 23:41                           ` Mathias Burén
@ 2011-02-22  0:32                           ` Eric D. Mudama
  1 sibling, 0 replies; 70+ messages in thread
From: Eric D. Mudama @ 2011-02-22  0:32 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Mathias Burén, Phillip Susi, David Brown, linux-raid

On Mon, Feb 21 at 19:52, Roberto Spadim wrote:
>i don´t think so, since it´s ATA command, any ATA compatible can use
>it, it could be used for HD with badblocks and dynamic reallocation
>without problems, the harddisk don´t need a dedicated space for
>badblock. for md software we must know if devices support or not TRIM.

It's been 15 or more years since hard drives exposed their bad blocks
to the host, I don't think it'd be a good idea to revisit that
decision.

-- 
Eric D. Mudama
edmudama@bounceswoosh.org

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-21 23:52                               ` Roberto Spadim
  2011-02-22  0:25                                 ` Mathias Burén
  2011-02-22  0:30                                 ` Brendan Conoboy
@ 2011-02-22  0:36                                 ` Eric D. Mudama
  2011-02-22  1:46                                   ` Roberto Spadim
  2 siblings, 1 reply; 70+ messages in thread
From: Eric D. Mudama @ 2011-02-22  0:36 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Mathias Burén, Phillip Susi, David Brown, linux-raid

On Mon, Feb 21 at 20:52, Roberto Spadim wrote:
>trim tell harddisk that those block are not in use

yes

>not in use block can be used by harddisk reallocation algorithm, like
>spare sectors

no, because the host may immediately write to a trim'd sector

The spares in an HDD can never be accessed outside of special tools,
they're swap-in replacements for regions of the media that have
developed defects.

>hard disks can use TRIM command to 'create' 'good' blocks like spare sectors

this doesn't make sense to me


-- 
Eric D. Mudama
edmudama@bounceswoosh.org


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  0:36                                 ` Eric D. Mudama
@ 2011-02-22  1:46                                   ` Roberto Spadim
  2011-02-22  1:52                                     ` Mathias Burén
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-22  1:46 UTC (permalink / raw)
  To: Eric D. Mudama; +Cc: Mathias Burén, Phillip Susi, David Brown, linux-raid

if it makes sense on an SSD, it makes sense on a hard disk too; it's a block device
like an SSD. the differences between SSD and hard disk? access time,
bytes (bits) per block, lifetime.
bad blocks exist on SSDs and hard disks; SSDs can reallocate online, and so can some hard disks.


> no, because the host may immediately write to a trim'd sector
yes, but the filesystem knows where the unused sectors are.
if the device (hard disk/SSD) knows too and has a reallocation algorithm, it
can reallocate without telling the filesystem to do it (that's why TRIM is
interesting).
since today's SSDs use NAND (not NOR), the erase unit isn't a single bit like a
hard disk head writes; TRIM for a hard disk would only make sense for bad-block
reallocation.

--------------------------
getting back to the first question: can MD support TRIM? yes / no / not
now / some levels and layouts only?


2011/2/21 Eric D. Mudama <edmudama@bounceswoosh.org>:
> On Mon, Feb 21 at 20:52, Roberto Spadim wrote:
>>
>> trim tell harddisk that those block are not in use
>
> yes
>
>> not in use block can be used by harddisk reallocation algorithm, like
>> spare sectors
>
> no, because the host may immediately write to a trim'd sector
>
> The spares in an HDD can never be accessed outside of special tools,
> they're swap-in replacements for regions of the media that have
> developed defects.
>
>> hard disks can use TRIM command to 'create' 'good' blocks like spare
>> sectors
>
> this doesn't make sense to me
>
>
> --
> Eric D. Mudama
> edmudama@bounceswoosh.org
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  1:46                                   ` Roberto Spadim
@ 2011-02-22  1:52                                     ` Mathias Burén
  2011-02-22  1:55                                       ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Mathias Burén @ 2011-02-22  1:52 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Eric D. Mudama, Phillip Susi, David Brown, linux-raid

On 22 February 2011 01:46, Roberto Spadim <roberto@spadim.com.br> wrote:
> if it make sense on ssd, harddisk make sense too, it's a block device
> like ssd, the diference of ssd/harddisk? access time,
> bytes(bits)/block, life time
> bad block exist in ssd and harddisk, ssd can realloc online, some harddisks too
>
>> no, because the host may immediately write to a trim'd sector
> yes, filesystem know where exists a unused sector
> if device (harddisk/ssd) know and have a reallocation algorithm, it
> can realloc without telling filesystem to do it (that's why TRIM is
> interesting)
> since today ssd use NAND (not NOR) the block size isn't 1 bit like a
> harddisk head. trim for harddisk only make sense for badblock
> reallocation
> --------------------------
> getting back to the first question, can MD support trim? yes/no/not
> now/some levels and layouts only?
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial

This explains a bit why TRIM is good for SSDs and has nothing to do
with hard drives at all, since they use spinning platters and not
chips. http://www.anandtech.com/show/2738/10

// Mathias

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  1:52                                     ` Mathias Burén
@ 2011-02-22  1:55                                       ` Roberto Spadim
  2011-02-22  2:01                                         ` Eric D. Mudama
                                                           ` (2 more replies)
  0 siblings, 3 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-22  1:55 UTC (permalink / raw)
  To: Mathias Burén; +Cc: Eric D. Mudama, Phillip Susi, David Brown, linux-raid

it could be used for bad-block reallocation if the hard disk implements it.
a hard disk is close to a NOR SSD with variable access time: if the head is
near the sector to be read/written the access time is small; if the sector is far
from the head, the access time increases (normally <= 1 disk revolution if the head
control system is good; for 7200 rpm one revolution is about 8.33 ms).
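
(For reference, the 8.33 ms figure is just one full revolution: 60 s / 7200 rev
= 8.33 ms per revolution, so the average rotational latency, about half a
revolution, is roughly 4.17 ms.)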

2011/2/21 Mathias Burén <mathias.buren@gmail.com>:
> On 22 February 2011 01:46, Roberto Spadim <roberto@spadim.com.br> wrote:
>> if it make sense on ssd, harddisk make sense too, it's a block device
>> like ssd, the diference of ssd/harddisk? access time,
>> bytes(bits)/block, life time
>> bad block exist in ssd and harddisk, ssd can realloc online, some harddisks too
>>
>>> no, because the host may immediately write to a trim'd sector
>> yes, filesystem know where exists a unused sector
>> if device (harddisk/ssd) know and have a reallocation algorithm, it
>> can realloc without telling filesystem to do it (that's why TRIM is
>> interesting)
>> since today ssd use NAND (not NOR) the block size isn't 1 bit like a
>> harddisk head. trim for harddisk only make sense for badblock
>> reallocation
>> --------------------------
>> getting back to the first question, can MD support trim? yes/no/not
>> now/some levels and layouts only?
>> --
>> Roberto Spadim
>> Spadim Technology / SPAEmpresarial
>
> This explains a bit why trim is good for SSDs and has nothing to do
> with harddrives at all, since they use spinning platters and not
> chips. http://www.anandtech.com/show/2738/10
>
> // Mathias
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  1:55                                       ` Roberto Spadim
@ 2011-02-22  2:01                                         ` Eric D. Mudama
  2011-02-22  2:02                                         ` Mikael Abrahamsson
  2011-02-22  2:38                                         ` Phillip Susi
  2 siblings, 0 replies; 70+ messages in thread
From: Eric D. Mudama @ 2011-02-22  2:01 UTC (permalink / raw)
  To: Roberto Spadim
  Cc: Mathias Burén, Eric D. Mudama, Phillip Susi, David Brown,
	linux-raid

On Mon, Feb 21 at 22:55, Roberto Spadim wrote:
>it can be used for badblock reallocation if harddisk have it
>a harddisk is near to NOR ssd with variable accesstime, if head is
>near sector to be read/write accesstime is small, if sector is far
>from head, access time increase (normaly <=1 disk revolution if head
>control system is good, for 7200rpm 1revolution is near to 8.33ms)

Hard disks do not expose their defect information/remappings.  They
present a defect-free logical region to the host.

Optimizing for a few hundred thousand remapped sectors across the LBA
range of ~6 billion LBAs on a 3TB drive isn't worth the effort or code
complexity in most cases.

I still don't see how TRIM helps a rotating drive.

-- 
Eric D. Mudama
edmudama@bounceswoosh.org


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  1:55                                       ` Roberto Spadim
  2011-02-22  2:01                                         ` Eric D. Mudama
@ 2011-02-22  2:02                                         ` Mikael Abrahamsson
  2011-02-22  2:22                                           ` Guy Watkins
  2011-02-22  2:38                                         ` Phillip Susi
  2 siblings, 1 reply; 70+ messages in thread
From: Mikael Abrahamsson @ 2011-02-22  2:02 UTC (permalink / raw)
  To: linux-raid

On Mon, 21 Feb 2011, Roberto Spadim wrote:

> it can be used for badblock reallocation if harddisk have it a harddisk 
> is near to NOR ssd with variable accesstime, if head is near sector to 
> be read/write accesstime is small, if sector is far from head, access 
> time increase (normaly <=1 disk revolution if head control system is 
> good, for 7200rpm 1revolution is near to 8.33ms)

Could we please stop this discussion? If you think HDDs should have this 
kind of bad-sector reallocation scheme, please go to the HDD manufacturers 
and lobby them. It is not on-topic for the linux-raid ML.

-- 
Mikael Abrahamsson    email: swmike@swm.pp.se

^ permalink raw reply	[flat|nested] 70+ messages in thread

* RE: SSD - TRIM command
  2011-02-22  2:02                                         ` Mikael Abrahamsson
@ 2011-02-22  2:22                                           ` Guy Watkins
  2011-02-22  2:27                                             ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Guy Watkins @ 2011-02-22  2:22 UTC (permalink / raw)
  To: 'Mikael Abrahamsson', linux-raid

} -----Original Message-----
} From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
} owner@vger.kernel.org] On Behalf Of Mikael Abrahamsson
} Sent: Monday, February 21, 2011 9:02 PM
} To: linux-raid@vger.kernel.org
} Subject: Re: SSD - TRIM command
} 
} On Mon, 21 Feb 2011, Roberto Spadim wrote:
} 
} > it can be used for bad-block reallocation if the hard disk has it; a hard
} > disk is close to a NOR SSD with variable access time: if the head is near
} > the sector to be read/written, the access time is small; if the sector is
} > far from the head, the access time increases (normally <=1 disk revolution
} > if the head control system is good; for 7200 rpm, one revolution is about
} > 8.33 ms)
}
} Could we please stop this discussion? If you think HDDs should have this
} kind of bad-sector reallocation scheme, please go to the HDD manufacturers
} and lobby them. It is not on-topic for the linux-raid ML.
} 
} --
} Mikael Abrahamsson    email: swmike@swm.pp.se

What about tape drives?  :)


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  2:22                                           ` Guy Watkins
@ 2011-02-22  2:27                                             ` Roberto Spadim
  2011-02-22  3:45                                               ` NeilBrown
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-22  2:27 UTC (permalink / raw)
  To: Guy Watkins; +Cc: Mikael Abrahamsson, linux-raid

a tape drive = a hard disk with only one head; the head can't move, only
the tape (the disk/platter, or whatever other name you prefer)

could we get back to and answer the main question?
--------------------------
getting back to the first question, can MD support TRIM? yes/no/not
now/some levels and layouts only?


2011/2/21 Guy Watkins <linux-raid@watkins-home.com>:
> } -----Original Message-----
> } From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> } owner@vger.kernel.org] On Behalf Of Mikael Abrahamsson
> } Sent: Monday, February 21, 2011 9:02 PM
> } To: linux-raid@vger.kernel.org
> } Subject: Re: SSD - TRIM command
> }
> } On Mon, 21 Feb 2011, Roberto Spadim wrote:
> }
> } > it can be used for bad-block reallocation if the hard disk has it; a hard
> } > disk is close to a NOR SSD with variable access time: if the head is near
> } > the sector to be read/written, the access time is small; if the sector is
> } > far from the head, the access time increases (normally <=1 disk revolution
> } > if the head control system is good; for 7200 rpm, one revolution is about
> } > 8.33 ms)
> }
> } Could we please stop this discussion? If you think HDDs should have this
> } kind of bad-sector reallocation scheme, please go to the HDD manufacturers
> } and lobby them. It is not on-topic for the linux-raid ML.
> }
> } --
> } Mikael Abrahamsson    email: swmike@swm.pp.se
>
> What about tape drives?  :)
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  1:55                                       ` Roberto Spadim
  2011-02-22  2:01                                         ` Eric D. Mudama
  2011-02-22  2:02                                         ` Mikael Abrahamsson
@ 2011-02-22  2:38                                         ` Phillip Susi
  2011-02-22  3:29                                           ` Roberto Spadim
  2 siblings, 1 reply; 70+ messages in thread
From: Phillip Susi @ 2011-02-22  2:38 UTC (permalink / raw)
  To: Roberto Spadim
  Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid

On 02/21/2011 08:55 PM, Roberto Spadim wrote:
> it can be used for bad-block reallocation if the hard disk has it
> a hard disk is close to a NOR SSD with variable access time: if the head is
> near the sector to be read/written, the access time is small; if the sector
> is far from the head, the access time increases (normally <=1 disk
> revolution if the head control system is good; for 7200 rpm, one revolution
> is about 8.33 ms)

Bad blocks are only reallocated when you write to them.  Since they are 
bad, you can't read the previous contents anyway, so it does not matter 
whether the OS cared about it before or not.

You seem to not understand the fundamental purpose of TRIM.  Hard disks 
only reallocate blocks when they go bad.  SSDs move blocks around all 
the time.  That process can be optimized if the drive knows that the OS 
does not care about certain blocks.  Hard drives don't do this, so they 
have no reason to support TRIM.
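
For concreteness, this is roughly what a single discard looks like from the
host side.  A minimal userspace sketch (the device node /dev/sdX and the
1 MiB range are placeholders; a drive whose queue does not advertise discard
support will simply reject the ioctl):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKDISCARD */

int main(void)
{
        /* byte offset and length of the range the OS no longer cares about */
        uint64_t range[2] = { 0, 1024 * 1024 };
        int fd = open("/dev/sdX", O_WRONLY);    /* placeholder device node */

        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* tell the device it may forget the contents of this range;
           an SSD can feed this to its FTL, a plain HDD has no use for it */
        if (ioctl(fd, BLKDISCARD, range) < 0)
                perror("BLKDISCARD");
        close(fd);
        return 0;
}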

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  2:38                                         ` Phillip Susi
@ 2011-02-22  3:29                                           ` Roberto Spadim
  2011-02-22  3:42                                             ` Roberto Spadim
  2011-02-22  4:04                                             ` Phillip Susi
  0 siblings, 2 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-22  3:29 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid

getting off topic...
----
they do have one - reallocation
> Bad blocks are only reallocated when you write to them.  Since they are bad,
> you can't read the previous contents anyway, so it does not matter whether
> the OS cared about it before or not.

when you write and the block is bad, the drive marks the block as bad. how?
internal disk memory and spare blocks. it's a device-level problem; only if
the device can't correct it should the problem move up to the filesystem
level.

what could the device level do? use a 'good block' (if one exists) => dynamic
reallocation
'good block' = a block not in use by the filesystem and not marked as bad,
so it can be used for reallocation

with TRIM you can inform the device firmware which blocks are not in use by
the filesystem; if a hard disk had such reallocation it could use these
'good blocks' to store the blocks that were reallocated on bad-block errors.

why implement it? if you have 11111 filesystems mounted with bad blocks at
the same time, you will need >=11111 IOPS to repair the error at the
filesystem level. if the device can correct it, you don't need to waste CPU
and memory at the filesystem level.

------
any layer between ATA and [platter, NAND flash, NOR flash] can be
implemented by the hard disk/SSD firmware
some layers that can be implemented: online reallocation, queueing, online
encryption/decryption, online compression/decompression and others; some
SSDs have optimizations to get better lifetime and write/read performance
how do you 'tune' these algorithms? ATA commands, SCSI, or any other
protocol that supports tuning

why TRIM? to inform the hard disk/SSD which blocks aren't in use

what could a hard disk/SSD do with the TRIM information?
dynamic reallocation (bad blocks), or any other operation that needs
not-in-use blocks (some algorithms use it to get better read/write
performance)
on devices with byte/block-level writes (NAND flash) we could write to a
trimmed block without reading it back and rewriting it; NOR flash and hard
disks don't need this, since they work with bits rather than bytes/blocks
why send an error to the filesystem if it can be corrected at the device
level? only send the error when it can't be corrected.


2011/2/21 Phillip Susi <psusi@cfl.rr.com>:
> On 02/21/2011 08:55 PM, Roberto Spadim wrote:
>>
>> it can be used for bad-block reallocation if the hard disk has it
>> a hard disk is close to a NOR SSD with variable access time: if the head is
>> near the sector to be read/written, the access time is small; if the sector
>> is far from the head, the access time increases (normally <=1 disk
>> revolution if the head control system is good; for 7200 rpm, one revolution
>> is about 8.33 ms)
>
> Bad blocks are only reallocated when you write to them.  Since they are bad,
> you can't read the previous contents anyway, so it does not matter whether
> the OS cared about it before or not.
>
> You seem to not understand the fundamental purpose of TRIM.  Hard disks only
> reallocate blocks when they go bad.  SSDs move blocks around all the time.
>  That process can be optimized if the drive knows that the OS does not care
> about certain blocks.  Hard drives don't do this, so they have no reason to
> support TRIM.



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  3:29                                           ` Roberto Spadim
@ 2011-02-22  3:42                                             ` Roberto Spadim
  2011-02-22  4:04                                             ` Phillip Susi
  1 sibling, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-22  3:42 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid

off topic again...

continuing with the idea of optimizations, the ultimate optimization we
could have is to implement a filesystem in the hard disk itself
it could implement all the filesystem functions, not just device functions;
it could have much more information about the data, not only block 'in
use'/'not in use'. it could understand: a file starting at block x and
ending at block y, with metadata w, access time z, etc. it could be more
intelligent than a raw device. in other words, it's a fileserver...

why implement algorithms at the device level? today's hard disk processors
(FPGAs, ARM processors, others) have a lot of unused CPU power, so why not
use it? that's why we send TRIM to the device; whether it's a hard disk, an
SSD, or any other pseudo/real device, no problem, we send the TRIM command
to optimize it

----------------
getting back out of off-topic,

please stop replying 'i think it's not a performance feature, it doesn't
need to be implemented at the device level'; let's implement all the
functions that the device level allows (ATA/SCSI specifications or any
other) and optimize where possible
checking neil's md roadmap, the bad-block work will be very good for md
devices; it's a good optimization for raid1 since a mirror will only fail
when many blocks fail



can we implement TRIM at the MD level? is it a good feature to implement?
will it be a lot of work to implement?
my opinion:
we can, on some RAID levels
it's a good feature
it will be a lot of work to implement and test


any answer from raid developers?


-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  2:27                                             ` Roberto Spadim
@ 2011-02-22  3:45                                               ` NeilBrown
  2011-02-22  4:37                                                 ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: NeilBrown @ 2011-02-22  3:45 UTC (permalink / raw)
  To: Roberto Spadim; +Cc: Guy Watkins, Mikael Abrahamsson, linux-raid

On Mon, 21 Feb 2011 23:27:26 -0300 Roberto Spadim <roberto@spadim.com.br>
wrote:

> a tape drive = a hard disk with only one head; the head can't move, only
> the tape (the disk/platter, or whatever other name you prefer)
> 
> could we get back to and answer the main question?
> --------------------------
> getting back to the first question, can MD support TRIM? yes/no/not
> now/some levels and layouts only?
> 

MD currently doesn't accept 'discard' requests.

RAID0 and LINEAR could be made to accept 'discard' if any
member device accepted 'discard'.  Patches welcome.

Other levels need md to know not to try to resync/recover regions that
have been discarded.  See "non-sync bitmap" section of the recent
md roadmap.

NeilBrown

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  3:29                                           ` Roberto Spadim
  2011-02-22  3:42                                             ` Roberto Spadim
@ 2011-02-22  4:04                                             ` Phillip Susi
  2011-02-22  4:30                                               ` Roberto Spadim
  1 sibling, 1 reply; 70+ messages in thread
From: Phillip Susi @ 2011-02-22  4:04 UTC (permalink / raw)
  To: Roberto Spadim
  Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid

On 02/21/2011 10:29 PM, Roberto Spadim wrote:
> what could the device level do? use a 'good block' (if one exists) => dynamic
> reallocation
> 'good block' = a block not in use by the filesystem and not marked as bad,
> so it can be used for reallocation

No.  It can only use blocks reserved for spares at manufacture time.  It 
can not use any old block that the fs is not using at the time, because 
the fs may choose to use it in the future.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  4:04                                             ` Phillip Susi
@ 2011-02-22  4:30                                               ` Roberto Spadim
  2011-02-22 14:45                                                 ` Phillip Susi
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-22  4:30 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid

it can't, because today's filesystems (apart from ext4 and swap) don't use
the TRIM command to tell the device which blocks aren't in use
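
(for reference, ext4's online discard is just the 'discard' mount option;
the batched alternative is the FITRIM ioctl on a mounted filesystem, which
is what the fstrim tool wraps.  a minimal sketch, assuming a 2.6.37+ kernel
whose headers define FITRIM and a filesystem that implements it:)

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* FITRIM, struct fstrim_range */

int main(int argc, char **argv)
{
        struct fstrim_range range;
        int fd;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <mountpoint>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDONLY);           /* e.g. "/" or "/mnt/ssd" */
        if (fd < 0) {
                perror("open");
                return 1;
        }
        memset(&range, 0, sizeof(range));
        range.len = (unsigned long long)-1;     /* ask the fs to trim every free extent */
        if (ioctl(fd, FITRIM, &range) < 0) {    /* fs tells the device which blocks are free */
                perror("FITRIM");
                close(fd);
                return 1;
        }
        printf("trimmed %llu bytes\n", (unsigned long long)range.len);
        close(fd);
        return 0;
}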

2011/2/22 Phillip Susi <psusi@cfl.rr.com>:
> On 02/21/2011 10:29 PM, Roberto Spadim wrote:
>>
>> what could the device level do? use a 'good block' (if one exists) => dynamic
>> reallocation
>> 'good block' = a block not in use by the filesystem and not marked as bad,
>> so it can be used for reallocation
>
> No.  It can only use blocks reserved for spares at manufacture time.  It can
> not use any old block that the fs is not using at the time, because the fs
> may choose to use it in the future.



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  3:45                                               ` NeilBrown
@ 2011-02-22  4:37                                                 ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-22  4:37 UTC (permalink / raw)
  To: NeilBrown; +Cc: Guy Watkins, Mikael Abrahamsson, linux-raid

thanks neil, i will try to read up and put together a patch; my focus is SSD
optimization, since at the hardware level (hw raid) i didn't see any good
improvement
good = a good read balance (based on queue depth and disk read rate), plus
TRIM support

----
good read balance = round robin or another time-based algorithm (can be cpu
intensive); i haven't yet found how to get at the queue of linux bios (for
the mirrors)
trim support - nothing to report; it's a 'feature request' for the long
term (after badblocks and the other features) - a rough starting point is
sketched below
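
(illustrative sketch only, not real md code - the member iteration is
simplified into a plain array here, and a real patch would also have to
split the discard bio across the members the same way ordinary writes are
split.  it shows one possible policy: advertise 'discard' on the array's
queue only when every member device advertises it)

#include <linux/blkdev.h>
#include <linux/kernel.h>

/*
 * Enable discard on the array's queue when every member's queue
 * supports it.  "members"/"nmembers" stand in for md's real rdev
 * list, which is simplified away in this sketch.
 */
static void array_enable_discard(struct request_queue *array_q,
				 struct block_device **members, int nmembers)
{
	int i;

	for (i = 0; i < nmembers; i++) {
		struct request_queue *q = bdev_get_queue(members[i]);

		if (!blk_queue_discard(q))
			return;		/* one member can't discard: don't advertise it */
	}

	/* every member handles discard, so the array can pass it through */
	blk_queue_max_discard_sectors(array_q, UINT_MAX);
	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, array_q);
}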

-----

ps...
neil, what are you thinking about bad blocks and layout?
for example... will reading from a bad block be internally (in the md source
code) remapped to a good block, or will md just try to read/write from
another device?

in other words, will we have a 'dynamic' layout?


2011/2/22 NeilBrown <neilb@suse.de>:
> On Mon, 21 Feb 2011 23:27:26 -0300 Roberto Spadim <roberto@spadim.com.br>
> wrote:
>
>> a tape drive = a hard disk with only one head; the head can't move, only
>> the tape (the disk/platter, or whatever other name you prefer)
>>
>> could we get back to and answer the main question?
>> --------------------------
>> getting back to the first question, can MD support TRIM? yes/no/not
>> now/some levels and layouts only?
>>
>
> MD currently doesn't accept 'discard' requests.
>
> RAID0 and LINEAR could be made to accept 'discard' if any
> member device accepted 'discard'.  Patches welcome.
>
> Other levels need md to know not to try to resync/recover regions that
> have been discarded.  See "non-sync bitmap" section of the recent
> md roadmap.
>
> NeilBrown



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22  4:30                                               ` Roberto Spadim
@ 2011-02-22 14:45                                                 ` Phillip Susi
  2011-02-22 17:15                                                   ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: Phillip Susi @ 2011-02-22 14:45 UTC (permalink / raw)
  To: Roberto Spadim
  Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid

On 2/21/2011 11:30 PM, Roberto Spadim wrote:
> it can't, because today's filesystems (apart from ext4 and swap) don't use
> the TRIM command to tell the device which blocks aren't in use

You aren't getting it.  The fs can tell the drive all it wants: the
drive does not care.  It has nothing useful it can do with that information.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: SSD - TRIM command
  2011-02-22 14:45                                                 ` Phillip Susi
@ 2011-02-22 17:15                                                   ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-22 17:15 UTC (permalink / raw)
  To: Phillip Susi; +Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid

i think it does. SSDs have one, so why not HDs? an HD could implement an
intelligent layer to speed up writes/reads without telling the fs

2011/2/22 Phillip Susi <psusi@cfl.rr.com>:
> On 2/21/2011 11:30 PM, Roberto Spadim wrote:
>> it can't, because today's filesystems (apart from ext4 and swap) don't use
>> the TRIM command to tell the device which blocks aren't in use
>
> You aren't getting it.  The fs can tell the drive all it wants: the
> drive does not care.  It has nothing useful it can do with that information.



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2011-02-22 17:15 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-02-07 20:07 SSD - TRIM command Roberto Spadim
2011-02-08 17:37 ` maurice
2011-02-08 18:31   ` Roberto Spadim
     [not found]     ` <AANLkTik5SumqyTN5LZVntna8nunvPe7v38TSFf9eCfcU@mail.gmail.com>
2011-02-08 20:50       ` Roberto Spadim
2011-02-08 21:18         ` maurice
2011-02-08 21:33           ` Roberto Spadim
2011-02-09  7:44   ` Stan Hoeppner
2011-02-09  9:05     ` Eric D. Mudama
2011-02-09 15:45       ` Chris Worley
2011-02-09 13:29     ` David Brown
2011-02-09 14:39       ` Roberto Spadim
2011-02-09 15:00         ` Scott E. Armitage
2011-02-09 15:52           ` Chris Worley
2011-02-09 19:15             ` Doug Dumitru
2011-02-09 19:22               ` Roberto Spadim
2011-02-09 16:19           ` Eric D. Mudama
2011-02-09 16:28             ` Scott E. Armitage
2011-02-09 17:17               ` Eric D. Mudama
2011-02-09 18:18                 ` Roberto Spadim
2011-02-09 18:24                   ` Piergiorgio Sartor
2011-02-09 18:30                     ` Roberto Spadim
2011-02-09 18:38                       ` Piergiorgio Sartor
2011-02-09 18:46                         ` Roberto Spadim
2011-02-09 18:52                           ` Roberto Spadim
2011-02-09 19:13                           ` Piergiorgio Sartor
2011-02-09 19:16                             ` Roberto Spadim
2011-02-09 19:21                               ` Piergiorgio Sartor
2011-02-09 19:27                                 ` Roberto Spadim
2011-02-21 18:24             ` Phillip Susi
2011-02-21 18:30               ` Roberto Spadim
2011-02-09 15:49         ` David Brown
2011-02-21 18:20           ` Phillip Susi
2011-02-21 18:25             ` Roberto Spadim
2011-02-21 18:34               ` Phillip Susi
2011-02-21 18:48                 ` Roberto Spadim
2011-02-21 18:51               ` Mathias Burén
2011-02-21 19:32                 ` Roberto Spadim
2011-02-21 19:38                   ` Mathias Burén
2011-02-21 19:39                     ` Mathias Burén
2011-02-21 19:43                       ` Roberto Spadim
2011-02-21 20:45                       ` Phillip Susi
2011-02-21 19:39                   ` Roberto Spadim
2011-02-21 19:51                     ` Doug Dumitru
2011-02-21 19:57                       ` Roberto Spadim
2011-02-21 20:47                     ` Phillip Susi
2011-02-21 21:02                       ` Mathias Burén
2011-02-21 22:52                         ` Roberto Spadim
2011-02-21 23:41                           ` Mathias Burén
2011-02-21 23:42                             ` Mathias Burén
2011-02-21 23:52                               ` Roberto Spadim
2011-02-22  0:25                                 ` Mathias Burén
2011-02-22  0:30                                 ` Brendan Conoboy
2011-02-22  0:36                                 ` Eric D. Mudama
2011-02-22  1:46                                   ` Roberto Spadim
2011-02-22  1:52                                     ` Mathias Burén
2011-02-22  1:55                                       ` Roberto Spadim
2011-02-22  2:01                                         ` Eric D. Mudama
2011-02-22  2:02                                         ` Mikael Abrahamsson
2011-02-22  2:22                                           ` Guy Watkins
2011-02-22  2:27                                             ` Roberto Spadim
2011-02-22  3:45                                               ` NeilBrown
2011-02-22  4:37                                                 ` Roberto Spadim
2011-02-22  2:38                                         ` Phillip Susi
2011-02-22  3:29                                           ` Roberto Spadim
2011-02-22  3:42                                             ` Roberto Spadim
2011-02-22  4:04                                             ` Phillip Susi
2011-02-22  4:30                                               ` Roberto Spadim
2011-02-22 14:45                                                 ` Phillip Susi
2011-02-22 17:15                                                   ` Roberto Spadim
2011-02-22  0:32                           ` Eric D. Mudama
