* SSD - TRIM command
@ 2011-02-07 20:07 Roberto Spadim
  2011-02-08 17:37 ` maurice
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-07 20:07 UTC (permalink / raw)
To: Linux-RAID

hi guys, could md send the TRIM command to an ssd, using the ext4
discard mount option? if i mix ssd and hd, could this TRIM be
rewritten for non-TRIM-compatible disks?

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply [flat|nested] 70+ messages in thread
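The "ext4 discard mount option" the question refers to is set per
mount. A hedged sketch of both ways to issue discards from the
filesystem; the device and mount point names are illustrative, and
whether the discards reach the drive depends on every layer below the
filesystem supporting them:

```shell
# Online discard: ext4 sends a discard for each freed extent.
# /dev/md0 and /mnt/data are illustrative names.
mount -o discard /dev/md0 /mnt/data

# Or persistently, via an /etc/fstab entry:
#   /dev/md0  /mnt/data  ext4  defaults,discard  0  2

# Batched alternative (fstrim from util-linux): trim all free space
# in one pass, run periodically instead of on every delete.
fstrim -v /mnt/data
```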
* Re: SSD - TRIM command
  2011-02-07 20:07 SSD - TRIM command Roberto Spadim
@ 2011-02-08 17:37 ` maurice
  2011-02-08 18:31   ` Roberto Spadim
  2011-02-09  7:44   ` Stan Hoeppner
  0 siblings, 2 replies; 70+ messages in thread
From: maurice @ 2011-02-08 17:37 UTC (permalink / raw)
To: Roberto Spadim; +Cc: linux-raid

On 2/7/2011 1:07 PM, Roberto Spadim wrote:
> hi guys, could md send the TRIM command to an ssd, using the ext4
> discard mount option? if i mix ssd and hd, could this TRIM be
> rewritten for non-TRIM-compatible disks?
>
I have read that using md with SSDs is not a great idea.
From the Fedora 14 documentation:

"Take note as well that software RAID levels 1, 4, 5, and 6 are not
recommended for use on SSDs. During the initialization stage of these
RAID levels, some RAID management utilities (such as mdadm) write to
all of the blocks on the storage device to ensure that checksums
operate properly. This will cause the performance of the SSD to
degrade quickly."

https://docs.fedoraproject.org/en-US/Fedora/14/html/Storage_Administration_Guide/newmds-ssdtuning.html

-- 
Cheers, Maurice Hilarius
eMail: mhilarius@gmail.com
* Re: SSD - TRIM command
  2011-02-08 17:37 ` maurice
@ 2011-02-08 18:31   ` Roberto Spadim
  [not found]         ` <AANLkTik5SumqyTN5LZVntna8nunvPe7v38TSFf9eCfcU@mail.gmail.com>
  2011-02-09  7:44   ` Stan Hoeppner
  1 sibling, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-08 18:31 UTC (permalink / raw)
To: maurice; +Cc: linux-raid

is that the resync running? i don't think it's a problem... any device
will die some day... ssd is faster than hd, so why not use it?
i'm using an hp smart array p212 with 3.0 firmware, and it writes to
all blocks too.
maybe a command-line option to start the array without a sync could
help... i don't know if resync is write-intensive or just writes to
the blocks that differ; if it's just the diff, it's not a problem for
an ssd...

again... i know that 'translating' the trim command for non-compatible
devices is a problem for the device layer, not the md layer, but can
md send the trim command to all mirrors/disks?

2011/2/8 maurice <mhilarius@gmail.com>:
> On 2/7/2011 1:07 PM, Roberto Spadim wrote:
<snip>
> I have read that using md with SSDs is not a great idea.
> From the Fedora 14 documentation:
<snip>

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
[parent not found: <AANLkTik5SumqyTN5LZVntna8nunvPe7v38TSFf9eCfcU@mail.gmail.com>]
* Re: SSD - TRIM command
  [not found] ` <AANLkTik5SumqyTN5LZVntna8nunvPe7v38TSFf9eCfcU@mail.gmail.com>
@ 2011-02-08 20:50   ` Roberto Spadim
  2011-02-08 21:18     ` maurice
  0 siblings, 1 reply; 70+ messages in thread
From: Roberto Spadim @ 2011-02-08 20:50 UTC (permalink / raw)
To: Scott E. Armitage; +Cc: maurice, linux-raid

=] now the right answer :)
question: maybe in the future... could we make trim compatible with md?
obs: i understand that trim just lets an ssd mark sectors clean
without writing zeros across the entire sector (an ssd optimization).
if we translate trim for disks that don't support it at the device
level, could we send TRIM to all disks in an md device? just an option
like mdadm --assemble --allow-trim, passing down the trim received
from the filesystem.

2011/2/8 Scott E. Armitage <launchpad@scott.armitage.name>:
> The problem as I understand it is that md treats the entire device (or
> partition) as "in use" -- even if the filesystem isn't using a particular
> set of blocks, those blocks must still be consistent across the array. The
> SSD TRIM command is used to tell the physical drive which blocks are no
> longer in use by the filesystem, so that it can optimize write operations.
> Running under md, all blocks would be "used", so there would be nothing to
> send with the TRIM command.
> -Scott
>
> On Tue, Feb 8, 2011 at 1:31 PM, Roberto Spadim <roberto@spadim.com.br> wrote:
<snip>

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: SSD - TRIM command
  2011-02-08 20:50 ` Roberto Spadim
@ 2011-02-08 21:18   ` maurice
  2011-02-08 21:33     ` Roberto Spadim
  0 siblings, 1 reply; 70+ messages in thread
From: maurice @ 2011-02-08 21:18 UTC (permalink / raw)
To: Roberto Spadim; +Cc: linux-raid

On 2/8/2011 1:50 PM, Roberto Spadim wrote:
> =] now the right answer :)
> question: maybe in the future... could we make trim compatible with md?
>
I hope that future is "real soon now".
MLC SSD is now starting to appear in the "Enterprise" space; companies
like Pliant have released products for it. Typical SAN RAID
controllers have specific performance limits which can be saturated by
a not very large number of SSDs. To get higher IOPS we need a more
powerful RAID engine.
A typical 48-core, 128GB-RAM box using AMD CPUs and 4 SAS HBAs out to
JBOD disk cases can be a ridiculously powerful RAID engine for a
reasonable cost (at least reasonable compared to NetApp, EMC, Hitachi
SANs, etc.) with a large number of devices.

BUT: to use SSDs in that design we need mdadm to be more SSD-friendly.

-- 
Cheers, Maurice Hilarius
eMail: mhilarius@gmail.com
* Re: SSD - TRIM command
  2011-02-08 21:18 ` maurice
@ 2011-02-08 21:33   ` Roberto Spadim
  0 siblings, 0 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-08 21:33 UTC (permalink / raw)
To: maurice; +Cc: linux-raid

yeah, we will make it :)
maurice, i was working on some new raid1 read-balance modes; could you
help me benchmark them? it's based on kernel 2.6.37, here is the code:
www.spadim.com.br/raid1/
there's raid1.new.c / raid1.new.h and raid1.old.c / raid1.old.h, the
new and old kernel source code.

from user space we now have:
/sys/block/mdXXX/md/read_balance_mode
/sys/block/mdXXX/md/read_balance_stripe_shift
/sys/block/mdXXX/md/read_balance_config

read_balance_mode currently accepts 4 modes:

near_head (default, working without problems; very good for hd-only
arrays, ssd should use another mode)

round_robin (normal round robin with a per-mirror counter that moves
on after some reads; very good for ssd-only arrays)

stripe (like raid0: read_balance_stripe_shift shifts the sector number
right (">>") and the disk is then selected with "% raid_disks"; good
for hd or ssd. a good shift value is >= 5, but not too big, since a
large shift can make the formula use only the first disk)

time_based (based on head-positioning time + read time + i/o queue
time, selecting the best disk to read; works very well with ssd and
hd. the current implementation doesn't include i/o queue time yet, but
i will study it and add it too)

all configuration for round_robin and time_based is sent to the kernel
via read_balance_config: cat /sys/block/mdXXX/md/read_balance_config
and then send the parameters per disk. the first line of the cat
output is the parameter list; the variables after "|" are read-only,
you can't change them, just read. use
echo "0 0 0 0 0 0 0 0 0 0" > read_balance_config to change values.
thanks =]

2011/2/8 maurice <mhilarius@gmail.com>:
> On 2/8/2011 1:50 PM, Roberto Spadim wrote:
<snip>

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
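The stripe mode described above selects a mirror leg from the sector
number alone: shift right, then take the remainder. A minimal sketch
of that selection arithmetic; the sector, shift, and disk-count values
below are illustrative, not taken from the patch:

```shell
# Stripe read-balance selection, as described in the post:
# shift the sector number right, then pick a mirror leg with modulo.
sector=123456
shift=5            # ">= 5" is the value range suggested above
raid_disks=2

leg=$(( (sector >> shift) % raid_disks ))
echo "sector $sector reads from mirror leg $leg"
```

With a shift of 5, consecutive 32-sector runs land on the same leg, so
adjacent reads stay sequential on one device, which is the raid0-like
behaviour the post is after.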
* Re: SSD - TRIM command
  2011-02-08 17:37 ` maurice
  2011-02-08 18:31   ` Roberto Spadim
@ 2011-02-09  7:44   ` Stan Hoeppner
  2011-02-09  9:05     ` Eric D. Mudama
  2011-02-09 13:29     ` David Brown
  1 sibling, 2 replies; 70+ messages in thread
From: Stan Hoeppner @ 2011-02-09 7:44 UTC (permalink / raw)
To: maurice; +Cc: Roberto Spadim, linux-raid

maurice put forth on 2/8/2011 11:37 AM:
> On 2/7/2011 1:07 PM, Roberto Spadim wrote:
<snip>
> I have read that using md with SSDs is not a great idea.
> From the Fedora 14 documentation:

Using any RAID level but pure striping with SSDs is a bad idea, for
the exact reason in that documentation: excessive writes.

SSD - Solid State Drive

Note the first two words. Solid state device = integrated circuit.
ICs, including those comprised of flash memory transistors, have
totally different failure modes than spinning rust disks, SRDs, or
"plain old mechanical hard drives".

RAID'ing SSDs with any data-duplicative RAID level (any mirroring or
parity level) _decreases_ the life of all SSDs in the array. This is
the opposite of what you want: reliability and lifespan.

People have a misconception that SSDs are like hard disks. The only
things they have in common are that both store data and can have a
similar interface (SATA). The similarities end there.

RAID is not a proper method of extending the life of SSD storage nor
of protecting the data on SSD devices. If you want to pool the
capacity of multiple SSDs into a single logical device, use RAID 0 or
spanning, _not_ a mirror or parity RAID level. If you want to protect
the data, snap it to a single large SATA drive, or a D2D backup array,
and then to tape.

-- 
Stan
* Re: SSD - TRIM command
  2011-02-09  7:44 ` Stan Hoeppner
@ 2011-02-09  9:05   ` Eric D. Mudama
  2011-02-09 15:45     ` Chris Worley
  2011-02-09 13:29   ` David Brown
  1 sibling, 1 reply; 70+ messages in thread
From: Eric D. Mudama @ 2011-02-09 9:05 UTC (permalink / raw)
To: Stan Hoeppner; +Cc: maurice, Roberto Spadim, linux-raid

On Wed, Feb 9 at 1:44, Stan Hoeppner wrote:
>maurice put forth on 2/8/2011 11:37 AM:
<snip>
>Using any RAID level but pure striping with SSDs is a bad idea, for
>the exact reason in that documentation: excessive writes.

If I mirror two SSDs and write 1 unit of data to the mirror, each
element of the mirror sees 1 unit of writes. How does this perform
excessive writes, compared to the same workload applied to a single
SSD?

I agree that in aggregate we've now done 2 units worth of writes;
however, in the mirror case we're protecting against both whole-device
failure and single-sector failure modes, so it hardly seems like a bad
idea in all applications.

--
Eric D. Mudama
edmudama@bounceswoosh.org
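The accounting in the message above can be written out directly: a
mirror multiplies the aggregate writes across the array, but not the
writes any one device sees. The unit counts here are illustrative:

```shell
# Write accounting for a 2-way mirror, per the argument above.
host_writes=100      # units written by the filesystem (illustrative)
mirror_legs=2

per_device=$host_writes                     # each SSD sees 100 units,
                                            # same as a lone SSD would
aggregate=$(( host_writes * mirror_legs ))  # 200 units array-wide
echo "per-device: $per_device units, aggregate: $aggregate units"
```

So the per-device wear (what actually ages each SSD) is unchanged by
mirroring; only the array-wide total doubles.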
* Re: SSD - TRIM command
  2011-02-09  9:05 ` Eric D. Mudama
@ 2011-02-09 15:45   ` Chris Worley
  0 siblings, 0 replies; 70+ messages in thread
From: Chris Worley @ 2011-02-09 15:45 UTC (permalink / raw)
To: Eric D. Mudama; +Cc: Stan Hoeppner, maurice, Roberto Spadim, linux-raid

On Wed, Feb 9, 2011 at 2:05 AM, Eric D. Mudama
<edmudama@bounceswoosh.org> wrote:
<snip>
> I agree that in aggregate we've now done 2 units worth of writes,
> however, in a mirror case, we're protecting against both whole-device
> failure and single-sector failure modes, so hardly seems like a bad
> idea in all applications.

Yes, just pass the discards through, and let us mirror. Syncing a new
drive by writing only what actually needs to be written is really
trivial (it needs no extra saved metadata/LBA bitmaps, nor the ability
to query the device for active sectors), if folks would give it some
thought and quit saying it shouldn't be done.
* Re: SSD - TRIM command
  2011-02-09  7:44 ` Stan Hoeppner
  2011-02-09  9:05   ` Eric D. Mudama
@ 2011-02-09 13:29   ` David Brown
  2011-02-09 14:39     ` Roberto Spadim
  1 sibling, 1 reply; 70+ messages in thread
From: David Brown @ 2011-02-09 13:29 UTC (permalink / raw)
To: linux-raid

On 09/02/2011 08:44, Stan Hoeppner wrote:
> maurice put forth on 2/8/2011 11:37 AM:
<snip>
> RAID is not a proper method of extending the life of SSD storage nor
> of protecting the data on SSD devices. If you want to pool the
> capacity of multiple SSDs into a single logical device, use RAID 0
> or spanning, _not_ a mirror or parity RAID level. If you want to
> protect the data, snap it to a single large SATA drive, or a D2D
> backup array, and then to tape.
>

First off, let me agree with you that backup is important no matter
what you use as your primary storage.

But beyond that, you've got a basic assumption wrong here.

Good quality, modern SSDs do not have write-endurance issues. It's a
thing of the past. Internally, of course, the flash /does/ have
endurance limits. But these are high (especially with SLC devices
rather than MLC devices), and the combination of ECC, wear-levelling
and redundant blocks means that you can write to these devices
continuously at high speed for /years/ before endurance issues become
visible to the host. An additional effect of the extensive ECC is that
undetected read errors are much less likely than with hard disks; when
a failure /does/ occur, you know it has occurred.

Many SSD models suffer from a certain amount of performance
degradation when they have been used for a while. Intel's devices were
notorious for this, though apparently they are better now. But that's
a speed issue, not a reliability or lifetime issue.

SSDs (again, I refer to good quality modern devices; earlier models
had more problems) are inherently more reliable than HDs, and have
longer expected lifetimes. This means that it is often fine to put
your SSDs in a RAID0 combination: you still have greater reliability
than you would with a single HDD.

However, SSDs are not infallible, so using redundant RAID with SSDs is
a perfectly valid setup. Obviously you will have a whole disk's worth
of extra writes when you set up the RAID, and redundant writes mean
more writes, but the SSDs will handle those writes perfectly well.

There is plenty of scope for md / SSD optimisation, however. Good TRIM
support is just one aspect. Other points include matching stripe sizes
to the geometry of the SSD, and taking advantage of the seek speeds of
SSDs (this is particularly important if you are mirroring an SSD and
an HD).
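"Matching stripe sizes to the geometry of the SSD" above amounts to
keeping the md chunk size an even divisor of the drive's erase block,
so a chunk never straddles an erase boundary. A sketch of that check;
both sizes are assumptions for illustration, since real erase-block
sizes vary by drive and are often not published:

```shell
# Does an md chunk size pack evenly into an SSD erase block?
# Values are illustrative assumptions, not measured geometry.
chunk_kb=64
erase_block_kb=512

if [ $(( erase_block_kb % chunk_kb )) -eq 0 ]; then
  echo "chunk of ${chunk_kb}K packs evenly into a ${erase_block_kb}K erase block"
else
  echo "chunk of ${chunk_kb}K straddles erase-block boundaries"
fi
```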
* Re: SSD - TRIM command
  2011-02-09 13:29 ` David Brown
@ 2011-02-09 14:39   ` Roberto Spadim
  2011-02-09 15:00     ` Scott E. Armitage
  2011-02-09 15:49     ` David Brown
  0 siblings, 2 replies; 70+ messages in thread
From: Roberto Spadim @ 2011-02-09 14:39 UTC (permalink / raw)
To: David Brown; +Cc: linux-raid

guys... if my ssd fails, i buy another...
let's get the software right; the hardware is another problem.
raid1 should work with floppy disks, hard disks, ssd, nbd... that's
the point: make solutions for a hardware mix.
the question is simple: could we send the TRIM command to all mirrors
(for stripe, just the disks that should receive it)? if a device
doesn't support TRIM we should translate it into a similar command
with the same READ effect (no problem if it's not atomic).

on the point of good reads: i sent an email to maurice, and many other
emails to this raid list; there's a new read-balance mode for kernel
2.6.37. if you want to benchmark it, please test it:
www.spadim.com.br/raid1
for me it works very well with a mixed hd and ssd array; i need more
tests and benchmarks for neil to accept it as a default feature of md.
the sysfs interface is still poor and should change in the future.
the time-based mode works, but it should get some features implemented
in the future (queue-time estimation).

2011/2/9 David Brown <david@westcontrol.com>:
> On 09/02/2011 08:44, Stan Hoeppner wrote:
<snip>

-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial
* Re: SSD - TRIM command
  2011-02-09 14:39 ` Roberto Spadim
@ 2011-02-09 15:00   ` Scott E. Armitage
  2011-02-09 15:52     ` Chris Worley
  2011-02-09 16:19     ` Eric D. Mudama
  1 sibling, 2 replies; 70+ messages in thread
From: Scott E. Armitage @ 2011-02-09 15:00 UTC (permalink / raw)
To: Roberto Spadim; +Cc: David Brown, linux-raid

I reiterate my previous reply that under the current md architecture,
where the complete device is considered to be in use, sending TRIM
commands makes little sense. AFAICT, reading back a trimmed page is
not defined, since the whole idea is that the host doesn't care what
is on that page any more. The next time md compares corresponding
trimmed pages on two SSDs, their contents may differ, and all of a
sudden our array is no longer consistent.

On Wed, Feb 9, 2011 at 9:39 AM, Roberto Spadim <roberto@spadim.com.br> wrote:
> guys... if my ssd fails, i buy another...
<snip>

-- 
Scott Armitage, B.A.Sc., M.A.Sc. candidate
Space Flight Laboratory
University of Toronto Institute for Aerospace Studies
4925 Dufferin Street, Toronto, Ontario, Canada, M3H 5T6
* Re: SSD - TRIM command 2011-02-09 15:00 ` Scott E. Armitage @ 2011-02-09 15:52 ` Chris Worley 2011-02-09 19:15 ` Doug Dumitru 2011-02-09 16:19 ` Eric D. Mudama 1 sibling, 1 reply; 70+ messages in thread From: Chris Worley @ 2011-02-09 15:52 UTC (permalink / raw) To: Scott E. Armitage; +Cc: Roberto Spadim, David Brown, linux-raid On Wed, Feb 9, 2011 at 8:00 AM, Scott E. Armitage <launchpad@scott.armitage.name> wrote: <snip> >AFAICT, reading back a trimmed page is > not defined ... and so it should be assumed that reading a trimmed/nonexistent LBA off of two of the same vendor's SSDs could return different results? ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 15:52 ` Chris Worley @ 2011-02-09 19:15 ` Doug Dumitru 2011-02-09 19:22 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Doug Dumitru @ 2011-02-09 19:15 UTC (permalink / raw) To: Chris Worley; +Cc: Scott E. Armitage, Roberto Spadim, David Brown, linux-raid I work with SSD arrays all the time, so I have a couple of thoughts about trim and md. 'trim' is still necessary. SandForce controllers are "better" at this, but still need free space to do their work. I had a set of SF drives drop to 22 MB/sec writes because they were full and scrambled. It takes a lot of effort to get them that messed up, but it can still happen. Trim brings them back. The bottom line is that SSDs do block re-organization on the fly, and free space makes the re-org more efficient. More efficient means faster and, as importantly, less wear amplification. Most SSDs (and I think the latest trim spec) are deterministic on trim'd sectors. If you trim a sector, they read that sector as zeros. This makes raid much "safer". raid/0,1,10 should be fine to echo discard commands down to the downstream drives in the bio request. It is then up to the physical device driver to turn the discard bio request into an ATA (or SCSI) trim. Most block devices don't seem to understand discard requests yet, but this will get better over time. raid/4,5,6 is a lot more complicated. With raid/4,5 with an even number of drives, you can trim whole stripes safely. Pieces of stripes get interesting because you have to treat a trim as a write of zeros and re-calc parity. raid/6 will always have parity issues regardless of how many drives there are. Even worse is that raid/4,5,6 parity read/modify/write operations tend to chatter the FTL (Flash Translation Layer) logic and make matters worse (often much worse). If you are not streaming long linear writes, raid/4,5,6 in a heavy write environment is probably a very bad idea for most SSDs. 
Another issue with trim is how "async" it behaves. You can trim a lot of data to a drive, but it is hard to tell when the drive is actually ready afterwards. Some drives also choke on trim requests that come at them too fast or requests that are too long. The behavior can be quite random. So then comes the issue of how many "user knobs" to supply to tune what trims where. Again, raid/0,1,10 are pretty easy. Raid/4,5,6 really requires that you know the precise geometry and control the IO. Way beyond what ext4 understands at this point. Trim can also be "faked" with some drives. Again, looking at the SandForce based drives, these drives internally de-dupe, so you can fake a trim with writes and help the drives get free space. Do this by filling the drive with zeros (ie, dd if=/dev/zero of=big.file bs=1M), do a sync, and then delete the big.file. This works through md, across SANs, from XEN virtuals, or wherever. With SandForce drives, this is not as effective as a trim, but better than nothing. Unfortunately, only SandForce drives and Flash SuperCharger understand zeros this way. A filesystem option that "zeros discarded sectors" would actually make as much sense in some deployment settings as the discard option (not sure, but ext# might already have this). NTFS has actually supported this since XP as a security enhancement. Doug Dumitru EasyCo LLC ps: My background with this has been the development of Flash SuperCharger. I am not trying to run an advert here, but the care and feeding of SSDs can be interesting. Flash SuperCharger breaks most of these rules, but it does know the exact geometry of what it is driving and plays excessive games to drive SSDs at their exact "sweet spot". One of our licensees just sent me some benchmarks at > 500,000 4K random writes/sec for a moderate sized array running raid/5. pps: Failures of SSDs are different from HDDs. SSDs can and do fail and need raid for many applications. 
If you need high write IOPS, it pretty much has to be raid/1,10 (unless you run our Flash SuperCharger layer). ppps: I have seen SSDs silently return corrupted data. Disks do this as well. A paper from 2 years ago quoted disk silent error rates as high as 1 bad block every 73TB read. Very scary stuff, but probably beyond the scope of what md can address. ^ permalink raw reply [flat|nested] 70+ messages in thread
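[Editor's note] Doug's point that a partial-stripe trim has to be treated as "a write of zeros and re-calc parity" can be sketched with XOR parity. This is a toy model, not md's code; `parity` and `trim_data_chunk` are hypothetical names, and it assumes the deterministic-read-zeros trim behaviour he describes.

```python
def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def parity(data_chunks):
    """RAID-5 style parity: XOR of all data chunks in the stripe."""
    p = bytes(len(data_chunks[0]))
    for c in data_chunks:
        p = xor(p, c)
    return p

def trim_data_chunk(data_chunks, p, i):
    """Treat a TRIM of data chunk i as a write of zeros and re-calc parity:
    P' = P xor D_i xor 0, i.e. the old parity with D_i's contribution removed."""
    old = data_chunks[i]
    data_chunks[i] = bytes(len(old))  # the trimmed chunk now reads as zeros
    return xor(p, old)

# trim one chunk out of a three-chunk stripe; parity stays consistent
chunks = [bytes([0x0F, 0x0F]), bytes([0xF0, 0xF0]), bytes([0xFF, 0x00])]
p = parity(chunks)
p = trim_data_chunk(chunks, p, 2)
assert p == parity(chunks)
```

Note the cost this illustrates: a sub-stripe trim still forces a read of the old data and a parity write, which is exactly the FTL-chattering read/modify/write traffic Doug warns about.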
* Re: SSD - TRIM command 2011-02-09 19:15 ` Doug Dumitru @ 2011-02-09 19:22 ` Roberto Spadim 0 siblings, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-09 19:22 UTC (permalink / raw) To: doug; +Cc: Chris Worley, Scott E. Armitage, David Brown, linux-raid I agree with the ppps; that's why ECC, checksums and parity are useful (RAID 5/6, and RAID 1 if you read from all mirrors, compare the copies and select the 'right' disk). 2011/2/9 Doug Dumitru <doug@easyco.com>: > I work with SSD arrays all the time, so I have a couple of thoughts > about trim and md. <snip> -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
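[Editor's note] The zero-fill "fake trim" Doug describes (dd from /dev/zero, sync, delete the file) can be sketched in Python. A hedged illustration only: `zero_fill_free_space` is a hypothetical helper, and whether the drive actually reclaims the space depends entirely on the controller (per Doug, only de-duping SandForce-style drives benefit).

```python
import os

def zero_fill_free_space(mount_point, chunk_mib=1, keep_free_mib=64):
    """Fill free space on `mount_point` with zero blocks, sync, then delete.
    Rough equivalent of: dd if=/dev/zero of=big.file bs=1M; sync; rm big.file.
    Returns the number of bytes written."""
    path = os.path.join(mount_point, "zero-fill.tmp")
    block = bytes(chunk_mib * 1024 * 1024)  # a buffer of zeros
    written = 0
    try:
        with open(path, "wb") as f:
            while True:
                st = os.statvfs(mount_point)
                free_mib = st.f_bavail * st.f_frsize // (1024 * 1024)
                if free_mib <= keep_free_mib:  # leave some headroom
                    break
                f.write(block)
                written += len(block)
            f.flush()
            os.fsync(f.fileno())  # the "sync" step: push the zeros to the device
    finally:
        if os.path.exists(path):
            os.remove(path)  # the "rm big.file" step frees the space again
    return written
```

Completely filling a live filesystem is risky (other writers can hit ENOSPC), hence the `keep_free_mib` headroom in this sketch.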
* Re: SSD - TRIM command 2011-02-09 15:00 ` Scott E. Armitage 2011-02-09 15:52 ` Chris Worley @ 2011-02-09 16:19 ` Eric D. Mudama 2011-02-09 16:28 ` Scott E. Armitage 2011-02-21 18:24 ` Phillip Susi 1 sibling, 2 replies; 70+ messages in thread From: Eric D. Mudama @ 2011-02-09 16:19 UTC (permalink / raw) To: Scott E. Armitage; +Cc: Roberto Spadim, David Brown, linux-raid On Wed, Feb 9 at 10:00, Scott E. Armitage wrote: >I reiterate my previous reply that under the current md architecture, >where the complete device is considered to be in use, sending TRIM >commands makes little sense. AFAICT, reading back a trimmed page is >not defined, since the whole idea is that the host doesn't care about >what is on that page any more. > >The next time md comes around to corresponding trimmed pages on two >SSDs, their contents may differ, and all of a sudden our array is no >longer consistent. For SATA devices, ATA8-ACS2 addresses this through Deterministic Read After Trim in the DATA SET MANAGEMENT command. Devices can be indeterminate, determinate with a non-zero pattern (often all-ones) or determinate all-zero for sectors read after being trimmed. --eric -- Eric D. Mudama edmudama@bounceswoosh.org ^ permalink raw reply [flat|nested] 70+ messages in thread
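[Editor's note] Eric's three ATA8-ACS2 read-after-trim behaviours can be mocked up to show why determinate all-zero reads keep a mirror consistent. This is a toy model of a drive, not a driver; `ToySSD` and its methods are hypothetical.

```python
import random

class ToySSD:
    """Toy model of ATA8-ACS2 read-after-TRIM behaviour (illustration only)."""
    SECTOR = 512

    def __init__(self, drat_zero=True):
        self.mapped = {}            # lba -> sector contents
        self.drat_zero = drat_zero  # Deterministic Read After Trim, all-zero

    def write(self, lba, data):
        self.mapped[lba] = data

    def trim(self, lba):
        # Drop the mapping; the FTL can erase the flash block at leisure.
        self.mapped.pop(lba, None)

    def read(self, lba):
        if lba in self.mapped:
            return self.mapped[lba]
        if self.drat_zero:
            return bytes(self.SECTOR)  # determinate: trimmed LBAs read as zeros
        # indeterminate: the device may return anything at all
        return bytes(random.getrandbits(8) for _ in range(self.SECTOR))

# Two determinate drives in a mirror stay consistent after the same TRIM:
a, b = ToySSD(), ToySSD()
for d in (a, b):
    d.write(7, b"x" * ToySSD.SECTOR)
    d.trim(7)
consistent = a.read(7) == b.read(7)
```

With `drat_zero=False` the two `read(7)` results need not match, which is exactly the RAID-1 consistency problem Scott raises.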
* Re: SSD - TRIM command 2011-02-09 16:19 ` Eric D. Mudama @ 2011-02-09 16:28 ` Scott E. Armitage 2011-02-09 17:17 ` Eric D. Mudama 2011-02-21 18:24 ` Phillip Susi 1 sibling, 1 reply; 70+ messages in thread From: Scott E. Armitage @ 2011-02-09 16:28 UTC (permalink / raw) To: Eric D. Mudama; +Cc: Roberto Spadim, David Brown, linux-raid Who sends this command? If md can assume that determinate mode is always set, then RAID 1 at least would remain consistent. For RAID 5, consistency of the parity information depends on the determinate pattern used and the number of disks. If you used determinate all-zero, then parity information would always be consistent, but this is probably not preferable since every TRIM command would incur an extra write for each bit in each page of the block. -S On Wed, Feb 9, 2011 at 11:19 AM, Eric D. Mudama <edmudama@bounceswoosh.org> wrote: > On Wed, Feb 9 at 10:00, Scott E. Armitage wrote: >> >> I reiterate my previous reply that under the current md architecture, >> where the complete device is considered to be in use, sending TRIM >> commands makes little sense. AFAICT, reading back a trimmed page is >> not defined, since the whole idea is that the host doesn't care about >> what is on that page any more. >> >> The next time md comes around to corresponding trimmed pages on two >> SSDs, their contents may differ, and all of a sudden our array is no >> longer consistent. > > For SATA devices, ATA8-ACS2 addresses this through Deterministic Read > After Trim in the DATA SET MANAGEMENT command. Devices can be > indeterminate, determinate with a non-zero pattern (often all-ones) or > determinate all-zero for sectors read after being trimmed. > > --eric > > -- > Eric D. Mudama > edmudama@bounceswoosh.org > > -- Scott Armitage, B.A.Sc., M.A.Sc. 
candidate Space Flight Laboratory University of Toronto Institute for Aerospace Studies 4925 Dufferin Street, Toronto, Ontario, Canada, M3H 5T6 -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 16:28 ` Scott E. Armitage @ 2011-02-09 17:17 ` Eric D. Mudama 2011-02-09 18:18 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Eric D. Mudama @ 2011-02-09 17:17 UTC (permalink / raw) To: Scott E. Armitage; +Cc: Eric D. Mudama, Roberto Spadim, David Brown, linux-raid On Wed, Feb 9 at 11:28, Scott E. Armitage wrote: >Who sends this command? If md can assume that determinate mode is >always set, then RAID 1 at least would remain consistent. For RAID 5, >consistency of the parity information depends on the determinate >pattern used and the number of disks. If you used determinate >all-zero, then parity information would always be consistent, but this >is probably not preferable since every TRIM command would incur an >extra write for each bit in each page of the block. True, and there are several solutions. Maybe track space used via some mechanism, such that when you trim you're only trimming the entire stripe width so no parity is required for the trimmed regions. Or, trust the drive's wear leveling and endurance rating, combined with SMART data, to indicate when you need to replace the device preemptive to eventual failure. It's not an unsolvable issue. If the RAID5 used distributed parity, you could expect wear leveling to wear all the devices evenly, since on average, the # of writes to all devices will be the same. Only a RAID4 setup would see a lopsided amount of writes to a single device. --eric -- Eric D. Mudama edmudama@bounceswoosh.org ^ permalink raw reply [flat|nested] 70+ messages in thread
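[Editor's note] Eric's first suggestion — only trim regions that cover whole stripes, so the trimmed area needs no parity update — boils down to rounding the discard range inward to stripe boundaries. A sketch under assumed geometry; `full_stripe_span`, `chunk_sectors` and `data_disks` are illustrative names, not md parameters.

```python
def full_stripe_span(start, length, chunk_sectors, data_disks):
    """Clamp a discard range (in sectors) to whole RAID-4/5 stripes, so the
    trimmed region needs no parity read-modify-write. Returns (start, length)
    of the largest whole-stripe region inside the request, or None."""
    stripe = chunk_sectors * data_disks
    first = -(-start // stripe) * stripe          # round the start up
    last = ((start + length) // stripe) * stripe  # round the end down
    if last <= first:
        return None  # the request does not cover even one whole stripe
    return first, last - first
```

Everything outside the returned span would either be dropped (conservative) or handled as the zeros-plus-parity-recalc case discussed earlier in the thread.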
* Re: SSD - TRIM command 2011-02-09 17:17 ` Eric D. Mudama @ 2011-02-09 18:18 ` Roberto Spadim 2011-02-09 18:24 ` Piergiorgio Sartor 0 siblings, 1 reply; 70+ messages in thread From: Roberto Spadim @ 2011-02-09 18:18 UTC (permalink / raw) To: Eric D. Mudama; +Cc: Scott E. Armitage, David Brown, linux-raid Who sends it? ext4 sends TRIM commands to the device (disk, md raid, nbd), and the kernel swap code sends them (when possible) too. For an internal raid5 parity disk this could be done by md; for the data disks it should be done by ext4. On the other question, about resyncing only what differs: that would be very good, since write and read speeds can differ on an SSD (HDs don't have this 'problem'). I'm sure that writing only the differences is better than writing everything: the SSD's life will be longer, and I think the HD's will be too. 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>: > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote: <snip> -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 18:18 ` Roberto Spadim @ 2011-02-09 18:24 ` Piergiorgio Sartor 2011-02-09 18:30 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Piergiorgio Sartor @ 2011-02-09 18:24 UTC (permalink / raw) To: Roberto Spadim; +Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid > ext4 send trim commands to device (disk/md raid/nbd) > kernel swap send this commands (when possible) to device too > for internal raid5 parity disk this could be done by md, for data > disks this should be done by ext4 That's an interesting point. On which basis should a parity "block" get a TRIM? If you ask me, I think the complete TRIM story is, at best, a temporary patch. IMHO the wear levelling should be handled by the filesystem and, with awareness of this, by the underlying device drivers. The reason is that the FS knows better what's going on with the blocks and what will happen. bye, pg <snip> -- piergiorgio -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 18:24 ` Piergiorgio Sartor @ 2011-02-09 18:30 ` Roberto Spadim 2011-02-09 18:38 ` Piergiorgio Sartor 0 siblings, 1 reply; 70+ messages in thread From: Roberto Spadim @ 2011-02-09 18:30 UTC (permalink / raw) To: Piergiorgio Sartor Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid nice =) But note that the parity block is RAID information, not filesystem information. For RAID we could issue TRIM ourselves where possible (like swap does), and also accept a TRIM that we receive from the filesystem and send it on to the disks (if it's a raid1 with mirrors, we should send it to all mirrors). I don't know exactly what TRIM does internally, but I think of it as a very big write of a known pattern. For example: set sector1='00000000000000000000000000000000000000000000000000' could be replaced by: trim sector1 It's less traffic on the SATA link, and useful information for the drive: it can record that the whole sector is zeros, serve reads from internal memory without touching the media, and only write real data when a new write arrives. But that's internal to the hard disk/SSD, not md raid's problem; md raid just needs to know how to take advantage of it =] 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: > That's an interesting point. > > On which basis should a parity "block" get a TRIM? <snip> -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
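[Editor's note] Roberto's idea that md should forward a filesystem TRIM to every mirror, translating it into writes for devices that cannot TRIM, can be sketched as follows. Purely illustrative: `Raid1` and `Leg` are hypothetical, and real md works on bios and block devices, not byte buffers.

```python
class Leg:
    """One mirror leg: a flat byte buffer standing in for a block device."""
    def __init__(self, size, supports_trim):
        self.buf = bytearray(b"\xaa" * size)  # pretend stale data everywhere
        self.supports_trim = supports_trim

    def trim(self, off, ln):
        self.buf[off:off + ln] = bytes(ln)  # model DRAT-zero behaviour

    def write(self, off, data):
        self.buf[off:off + len(data)] = data

    def read(self, off, ln):
        return bytes(self.buf[off:off + ln])

class Raid1:
    """Pass a filesystem discard straight through to every mirror; legs that
    cannot TRIM fall back to writing zeros so all mirrors stay identical."""
    def __init__(self, legs):
        self.legs = legs

    def discard(self, offset, length):
        for leg in self.legs:
            if leg.supports_trim:
                leg.trim(offset, length)
            else:
                leg.write(offset, bytes(length))  # emulate TRIM with zeros
```

The zero-write fallback only keeps mirrors consistent if the TRIM-capable legs read back zeros deterministically, which ties this back to the DRAT discussion above.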
* Re: SSD - TRIM command 2011-02-09 18:30 ` Roberto Spadim @ 2011-02-09 18:38 ` Piergiorgio Sartor 2011-02-09 18:46 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Piergiorgio Sartor @ 2011-02-09 18:38 UTC (permalink / raw) To: Roberto Spadim Cc: Piergiorgio Sartor, Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote: > nice =) > but check that parity block is a raid information, not a filesystem information > for raid we could implement trim when possible (like swap) > and implement a trim that we receive from filesystem, and send to all > disks (if it´s a raid1 with mirrors, we should sent to all mirrors) To all disks, also in the case of RAID-5? What if the TRIM belongs only to a single SSD block belonging to a single chunk of a stripe? That is, a *single* SSD of the RAID-5. Should md re-read the block and re-write (not TRIM) the parity? I think anything that has to do with checking & repairing must be carefully considered... bye, pg <snip> -- piergiorgio -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 18:38 ` Piergiorgio Sartor @ 2011-02-09 18:46 ` Roberto Spadim 2011-02-09 18:52 ` Roberto Spadim 2011-02-09 19:13 ` Piergiorgio Sartor 0 siblings, 2 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-09 18:46 UTC (permalink / raw) To: Piergiorgio Sartor Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid It's just a discussion, right? No implementation yet? What I think: if a device accepts TRIM, we can use TRIM. If not, we must translate TRIM into something similar (maybe many WRITEs?), so that when we READ from the disk we get back the same information. The translation could be done by the kernel (not md), maybe as options on libata or the nbd device; the other option is to do it in md, with an internal (md) TRIM-translation function. Who sends the trim? For internal md information, md can generate it (if necessary, maybe it's not) for parity disks (not data disks). From the filesystem, or another upper-layer program (a database with direct device access), we could accept TRIM and send it on to the disks/mirrors, translating it when necessary (with the internal or kernel translation function). 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote: >> nice =) >> but check that parity block is a raid information, not a filesystem information >> for raid we could implement trim when possible (like swap) >> and implement a trim that we receive from filesystem, and send to all >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors) > > To all disk also in case of RAID-5? > > What if the TRIM belongs only to a single SDD block > belonging to a single chunk of a stripe? > That is a *single* SSD of the RAID-5. > > Should md re-read the block and re-write (not TRIM) > the parity? > > I think anything that has to do with checking & > repairing must be carefully considered... 
> > bye, > > pg > >> i don´t know what trim do very well, but i think it´s a very big write >> with only some bits for example: >> set sector1='00000000000000000000000000000000000000000000000000' >> could be replace by: >> trim sector1 >> it´s faster for sata communication, and it´s a good information for >> hard disk (it can put a single '0' at the start of the sector and know >> that all sector is 0, if it try to read any information it can use >> internal memory (don´t read hard disk), if a write is done it should >> write 0000 to bits, and after after the write operation, but it´s >> internal function of hard disk/ssd, not a problem of md raid... md >> raid should need know how to optimize and use it =] ) >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> >> ext4 send trim commands to device (disk/md raid/nbd) >> >> kernel swap send this commands (when possible) to device too >> >> for internal raid5 parity disk this could be done by md, for data >> >> disks this should be done by ext4 >> > >> > That's an interesting point. >> > >> > On which basis should a parity "block" get a TRIM? >> > >> > If you ask me, I think the complete TRIM story is, at >> > best, a temporary patch. >> > >> > IMHO the wear levelling should be handled by the filesystem >> > and, with awarness of this, by the underlining device drivers. >> > Reason is that the FS knows better what's going on with the >> > blocks and what will happen. >> > >> > bye, >> > >> > pg >> > >> >> >> >> the other question... about resync with only write what is different >> >> this is very good since write and read speed can be different for ssd >> >> (hd don´t have this 'problem') >> >> but i´m sure that just write what is diff is better than write all >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too) >> >> >> >> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>: >> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote: >> >> >> >> >> >> Who sends this command? 
If md can assume that determinate mode is >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5, >> >> >> consistency of the parity information depends on the determinate >> >> >> pattern used and the number of disks. If you used determinate >> >> >> all-zero, then parity information would always be consistent, but this >> >> >> is probably not preferable since every TRIM command would incur an >> >> >> extra write for each bit in each page of the block. >> >> > >> >> > True, and there are several solutions. Maybe track space used via >> >> > some mechanism, such that when you trim you're only trimming the >> >> > entire stripe width so no parity is required for the trimmed regions. >> >> > Or, trust the drive's wear leveling and endurance rating, combined >> >> > with SMART data, to indicate when you need to replace the device >> >> > preemptive to eventual failure. >> >> > >> >> > It's not an unsolvable issue. If the RAID5 used distributed parity, >> >> > you could expect wear leveling to wear all the devices evenly, since >> >> > on average, the # of writes to all devices will be the same. Only a >> >> > RAID4 setup would see a lopsided amount of writes to a single device. >> >> > >> >> > --eric >> >> > >> >> > -- >> >> > Eric D. 
Mudama >> >> > edmudama@bounceswoosh.org >> >> > >> >> > -- >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> > the body of a message to majordomo@vger.kernel.org >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > >> >> >> >> >> >> >> >> -- >> >> Roberto Spadim >> >> Spadim Technology / SPAEmpresarial >> >> -- >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> the body of a message to majordomo@vger.kernel.org >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >> > -- >> > >> > piergiorgio >> > -- >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >> >> >> >> -- >> Roberto Spadim >> Spadim Technology / SPAEmpresarial >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > -- > > piergiorgio > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 18:46 ` Roberto Spadim @ 2011-02-09 18:52 ` Roberto Spadim 2011-02-09 19:13 ` Piergiorgio Sartor 1 sibling, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-09 18:52 UTC (permalink / raw) To: Piergiorgio Sartor Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid The other question: check and repair. I don't know today's resync implementation (I need to read the source code), but a resync that reads, compares, and writes only if a difference is found is better than one that writes without checking. Why better? For SSD it will give a longer life; for HDD I think it will give a longer life too (I THINK). The cost is more operations. Without check: READ from source, WRITE to mirror. With check: READ from source, READ from mirror, compare, WRITE to mirror only if they differ. Maybe an mdadm option could set the md device to RESYNC WITH CHECK or RESYNC WITHOUT CHECK. It's a user option, not an md decision, right? If the user wants a fast resync they can run without check; either way we give the user the choice, which is very nice (for the user). The default? I think WITHOUT CHECK should stay the default; like the default chunk size, it is just a default. 2011/2/9 Roberto Spadim <roberto@spadim.com.br>: > it´s just a discussion, right? no implementation yet, right? > > what i think.... > if device accept TRIM, we can use TRIM. > if not, we must translate TRIM to something similar (maybe many WRITES > ?), and when we READ from disk we get the same information > the translation coulbe be done by kernel (not md) maybe options on > libata, nbd device.... > other option is do it with md, internal (md) TRIM translate function > > who send trim? > internal md information: md can generate it (if necessary, maybe it´s > not...)
for parity disks (not data disks) > filesystem/or another upper layer program (database with direct device > access), we could accept TRIM from filesystem/database, and send it to > disks/mirrors, when necessary translate it (internal or kernel > translate function) > > > 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote: >>> nice =) >>> but check that parity block is a raid information, not a filesystem information >>> for raid we could implement trim when possible (like swap) >>> and implement a trim that we receive from filesystem, and send to all >>> disks (if it´s a raid1 with mirrors, we should sent to all mirrors) >> >> To all disk also in case of RAID-5? >> >> What if the TRIM belongs only to a single SDD block >> belonging to a single chunk of a stripe? >> That is a *single* SSD of the RAID-5. >> >> Should md re-read the block and re-write (not TRIM) >> the parity? >> >> I think anything that has to do with checking & >> repairing must be carefully considered... >> >> bye, >> >> pg >> >>> i don´t know what trim do very well, but i think it´s a very big write >>> with only some bits for example: >>> set sector1='00000000000000000000000000000000000000000000000000' >>> could be replace by: >>> trim sector1 >>> it´s faster for sata communication, and it´s a good information for >>> hard disk (it can put a single '0' at the start of the sector and know >>> that all sector is 0, if it try to read any information it can use >>> internal memory (don´t read hard disk), if a write is done it should >>> write 0000 to bits, and after after the write operation, but it´s >>> internal function of hard disk/ssd, not a problem of md raid... 
md >>> raid should need know how to optimize and use it =] ) >>> >>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >>> >> ext4 send trim commands to device (disk/md raid/nbd) >>> >> kernel swap send this commands (when possible) to device too >>> >> for internal raid5 parity disk this could be done by md, for data >>> >> disks this should be done by ext4 >>> > >>> > That's an interesting point. >>> > >>> > On which basis should a parity "block" get a TRIM? >>> > >>> > If you ask me, I think the complete TRIM story is, at >>> > best, a temporary patch. >>> > >>> > IMHO the wear levelling should be handled by the filesystem >>> > and, with awarness of this, by the underlining device drivers. >>> > Reason is that the FS knows better what's going on with the >>> > blocks and what will happen. >>> > >>> > bye, >>> > >>> > pg >>> > >>> >> >>> >> the other question... about resync with only write what is different >>> >> this is very good since write and read speed can be different for ssd >>> >> (hd don´t have this 'problem') >>> >> but i´m sure that just write what is diff is better than write all >>> >> (ssd life will be bigger, hd maybe... i think that will be bigger too) >>> >> >>> >> >>> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>: >>> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote: >>> >> >> >>> >> >> Who sends this command? If md can assume that determinate mode is >>> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5, >>> >> >> consistency of the parity information depends on the determinate >>> >> >> pattern used and the number of disks. If you used determinate >>> >> >> all-zero, then parity information would always be consistent, but this >>> >> >> is probably not preferable since every TRIM command would incur an >>> >> >> extra write for each bit in each page of the block. >>> >> > >>> >> > True, and there are several solutions. 
Maybe track space used via >>> >> > some mechanism, such that when you trim you're only trimming the >>> >> > entire stripe width so no parity is required for the trimmed regions. >>> >> > Or, trust the drive's wear leveling and endurance rating, combined >>> >> > with SMART data, to indicate when you need to replace the device >>> >> > preemptive to eventual failure. >>> >> > >>> >> > It's not an unsolvable issue. If the RAID5 used distributed parity, >>> >> > you could expect wear leveling to wear all the devices evenly, since >>> >> > on average, the # of writes to all devices will be the same. Only a >>> >> > RAID4 setup would see a lopsided amount of writes to a single device. >>> >> > >>> >> > --eric >>> >> > >>> >> > -- >>> >> > Eric D. Mudama >>> >> > edmudama@bounceswoosh.org >>> >> > >>> >> > -- >>> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> >> > the body of a message to majordomo@vger.kernel.org >>> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> > >>> >> >>> >> >>> >> >>> >> -- >>> >> Roberto Spadim >>> >> Spadim Technology / SPAEmpresarial >>> >> -- >>> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> >> the body of a message to majordomo@vger.kernel.org >>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> > >>> > -- >>> > >>> > piergiorgio >>> > -- >>> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> > the body of a message to majordomo@vger.kernel.org >>> > More majordomo info at http://vger.kernel.org/majordomo-info.html >>> > >>> >>> >>> >>> -- >>> Roberto Spadim >>> Spadim Technology / SPAEmpresarial >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> -- >> >> piergiorgio >> -- >> To unsubscribe from this list: send the line 
"unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
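The trade-off described in the message above can be made concrete with a small counting sketch (not md's actual resync code; the function names are hypothetical): blind resync costs one read and one write per block, while check-first resync costs two reads per block but writes only the blocks that differ.

```python
# Toy comparison of the two resync strategies discussed above
# (illustrative sketch; not how md actually implements resync).

def resync_without_check(source, mirror):
    reads = writes = 0
    for i, block in enumerate(source):
        reads += 1                 # READ from source
        mirror[i] = block          # WRITE to mirror unconditionally
        writes += 1
    return reads, writes

def resync_with_check(source, mirror):
    reads = writes = 0
    for i, block in enumerate(source):
        reads += 2                 # READ source + READ mirror
        if mirror[i] != block:     # compare
            mirror[i] = block      # WRITE only on a difference
            writes += 1
    return reads, writes

source = [b"A"] * 100
mirror = [b"A"] * 100
mirror[3] = b"X"                   # only one block out of sync

r1, w1 = resync_without_check(source, list(mirror))
r2, w2 = resync_with_check(source, list(mirror))
print(w1, w2)   # 100 1
```

With 100 blocks and one mismatch, check-first doubles the reads but cuts mirror writes from 100 to 1, which is the SSD-wear argument made in the message; whether the extra reads are worth it is exactly the user-option question raised there.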
* Re: SSD - TRIM command 2011-02-09 18:46 ` Roberto Spadim 2011-02-09 18:52 ` Roberto Spadim @ 2011-02-09 19:13 ` Piergiorgio Sartor 2011-02-09 19:16 ` Roberto Spadim 1 sibling, 1 reply; 70+ messages in thread From: Piergiorgio Sartor @ 2011-02-09 19:13 UTC (permalink / raw) To: Roberto Spadim Cc: Piergiorgio Sartor, Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid > it´s just a discussion, right? no implementation yet, right? Of course... > what i think.... > if device accept TRIM, we can use TRIM. > if not, we must translate TRIM to something similar (maybe many WRITES > ?), and when we READ from disk we get the same information TRIM is not about writing at all. TRIM tells the device that the addressed block is no longer used, so it (the SSD) can do whatever it wants with it. The only software layer with the same "knowledge" is the filesystem; the other layers have no decisional power over block allocation. Except for metadata, of course. So, IMHO, a software TRIM can only be in the FS. bye, pg > the translation coulbe be done by kernel (not md) maybe options on > libata, nbd device.... > other option is do it with md, internal (md) TRIM translate function > > who send trim? > internal md information: md can generate it (if necessary, maybe it´s > not...)
for parity disks (not data disks) > filesystem/or another upper layer program (database with direct device > access), we could accept TRIM from filesystem/database, and send it to > disks/mirrors, when necessary translate it (internal or kernel > translate function) > > > 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: > > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote: > >> nice =) > >> but check that parity block is a raid information, not a filesystem information > >> for raid we could implement trim when possible (like swap) > >> and implement a trim that we receive from filesystem, and send to all > >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors) > > > > To all disk also in case of RAID-5? > > > > What if the TRIM belongs only to a single SDD block > > belonging to a single chunk of a stripe? > > That is a *single* SSD of the RAID-5. > > > > Should md re-read the block and re-write (not TRIM) > > the parity? > > > > I think anything that has to do with checking & > > repairing must be carefully considered... > > > > bye, > > > > pg > > > >> i don´t know what trim do very well, but i think it´s a very big write > >> with only some bits for example: > >> set sector1='00000000000000000000000000000000000000000000000000' > >> could be replace by: > >> trim sector1 > >> it´s faster for sata communication, and it´s a good information for > >> hard disk (it can put a single '0' at the start of the sector and know > >> that all sector is 0, if it try to read any information it can use > >> internal memory (don´t read hard disk), if a write is done it should > >> write 0000 to bits, and after after the write operation, but it´s > >> internal function of hard disk/ssd, not a problem of md raid... 
md > >> raid should need know how to optimize and use it =] ) > >> > >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: > >> >> ext4 send trim commands to device (disk/md raid/nbd) > >> >> kernel swap send this commands (when possible) to device too > >> >> for internal raid5 parity disk this could be done by md, for data > >> >> disks this should be done by ext4 > >> > > >> > That's an interesting point. > >> > > >> > On which basis should a parity "block" get a TRIM? > >> > > >> > If you ask me, I think the complete TRIM story is, at > >> > best, a temporary patch. > >> > > >> > IMHO the wear levelling should be handled by the filesystem > >> > and, with awarness of this, by the underlining device drivers. > >> > Reason is that the FS knows better what's going on with the > >> > blocks and what will happen. > >> > > >> > bye, > >> > > >> > pg > >> > > >> >> > >> >> the other question... about resync with only write what is different > >> >> this is very good since write and read speed can be different for ssd > >> >> (hd don´t have this 'problem') > >> >> but i´m sure that just write what is diff is better than write all > >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too) > >> >> > >> >> > >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>: > >> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote: > >> >> >> > >> >> >> Who sends this command? If md can assume that determinate mode is > >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5, > >> >> >> consistency of the parity information depends on the determinate > >> >> >> pattern used and the number of disks. If you used determinate > >> >> >> all-zero, then parity information would always be consistent, but this > >> >> >> is probably not preferable since every TRIM command would incur an > >> >> >> extra write for each bit in each page of the block. > >> >> > > >> >> > True, and there are several solutions. 
Maybe track space used via > >> >> > some mechanism, such that when you trim you're only trimming the > >> >> > entire stripe width so no parity is required for the trimmed regions. > >> >> > Or, trust the drive's wear leveling and endurance rating, combined > >> >> > with SMART data, to indicate when you need to replace the device > >> >> > preemptive to eventual failure. > >> >> > > >> >> > It's not an unsolvable issue. If the RAID5 used distributed parity, > >> >> > you could expect wear leveling to wear all the devices evenly, since > >> >> > on average, the # of writes to all devices will be the same. Only a > >> >> > RAID4 setup would see a lopsided amount of writes to a single device. > >> >> > > >> >> > --eric > >> >> > > >> >> > -- > >> >> > Eric D. Mudama > >> >> > edmudama@bounceswoosh.org > >> >> > > >> >> > -- > >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> >> > the body of a message to majordomo@vger.kernel.org > >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html > >> >> > > >> >> > >> >> > >> >> > >> >> -- > >> >> Roberto Spadim > >> >> Spadim Technology / SPAEmpresarial > >> >> -- > >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> >> the body of a message to majordomo@vger.kernel.org > >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > >> > -- > >> > > >> > piergiorgio > >> > -- > >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> > the body of a message to majordomo@vger.kernel.org > >> > More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > >> > >> > >> > >> -- > >> Roberto Spadim > >> Spadim Technology / SPAEmpresarial > >> -- > >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> the body of a message to majordomo@vger.kernel.org > >> More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > -- > > > > piergiorgio > > -- > > 
To unsubscribe from this list: send the line "unsubscribe linux-raid" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial -- piergiorgio -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
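Piergiorgio's argument above, that only the filesystem knows when a block stops being used, can be sketched with a toy allocator (hypothetical names, not ext4's code): the discard hint originates at the layer that frees the block, and everything below it can only forward the hint.

```python
# Toy illustration of "a software TRIM can only be in the FS":
# only the allocator knows when a block stops being used, so only it can
# safely issue the discard hint. Hypothetical sketch, not ext4.

class ToyDevice:
    def __init__(self):
        self.discards = []

    def discard(self, lba):
        # a layer below the FS has no allocation knowledge; it can only
        # record/forward the hint it was given
        self.discards.append(lba)

class ToyFS:
    def __init__(self, device):
        self.device = device
        self.used = set()

    def allocate(self):
        lba = 0
        while lba in self.used:    # first-fit block allocation
            lba += 1
        self.used.add(lba)
        return lba

    def free(self, lba):
        self.used.discard(lba)
        self.device.discard(lba)   # the FS, and only the FS, knows this is safe

dev = ToyDevice()
fs = ToyFS(dev)
a = fs.allocate()
b = fs.allocate()
fs.free(a)
print(dev.discards)   # [0]
```

The device sees a discard exactly for the block the filesystem freed; no lower layer could have decided that on its own, which is the point being made in the message.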
* Re: SSD - TRIM command 2011-02-09 19:13 ` Piergiorgio Sartor @ 2011-02-09 19:16 ` Roberto Spadim 2011-02-09 19:21 ` Piergiorgio Sartor 0 siblings, 1 reply; 70+ messages in thread From: Roberto Spadim @ 2011-02-09 19:16 UTC (permalink / raw) To: Piergiorgio Sartor Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid Yeah =) A question: if I send a TRIM to a sector and then read from it, what do I get? 0x00000000000000000000000000000000000? If yes, we could translate TRIM into a WRITE on devices without TRIM (hard disks), just to get the same READ information. 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> it´s just a discussion, right? no implementation yet, right? > > Of course... > >> what i think.... >> if device accept TRIM, we can use TRIM. >> if not, we must translate TRIM to something similar (maybe many WRITES >> ?), and when we READ from disk we get the same information > > TRIM is not about writing at all. TRIM tells the > device that the addressed block is not anymore used, > so it (the SSD) can do whatever it wants with it. > > The only software layer having the same "knowledge" > is the filesystem, the other layers, do not have > any decisional power about the block allocation. > Except for metadata, of course. > > So, IMHO, a software TRIM can only be in the FS. > > bye, > > pg > >> the translation coulbe be done by kernel (not md) maybe options on >> libata, nbd device.... >> other option is do it with md, internal (md) TRIM translate function >> >> who send trim? >> internal md information: md can generate it (if necessary, maybe it´s >> not...)
for parity disks (not data disks) >> filesystem/or another upper layer program (database with direct device >> access), we could accept TRIM from filesystem/database, and send it to >> disks/mirrors, when necessary translate it (internal or kernel >> translate function) >> >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote: >> >> nice =) >> >> but check that parity block is a raid information, not a filesystem information >> >> for raid we could implement trim when possible (like swap) >> >> and implement a trim that we receive from filesystem, and send to all >> >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors) >> > >> > To all disk also in case of RAID-5? >> > >> > What if the TRIM belongs only to a single SDD block >> > belonging to a single chunk of a stripe? >> > That is a *single* SSD of the RAID-5. >> > >> > Should md re-read the block and re-write (not TRIM) >> > the parity? >> > >> > I think anything that has to do with checking & >> > repairing must be carefully considered... >> > >> > bye, >> > >> > pg >> > >> >> i don´t know what trim do very well, but i think it´s a very big write >> >> with only some bits for example: >> >> set sector1='00000000000000000000000000000000000000000000000000' >> >> could be replace by: >> >> trim sector1 >> >> it´s faster for sata communication, and it´s a good information for >> >> hard disk (it can put a single '0' at the start of the sector and know >> >> that all sector is 0, if it try to read any information it can use >> >> internal memory (don´t read hard disk), if a write is done it should >> >> write 0000 to bits, and after after the write operation, but it´s >> >> internal function of hard disk/ssd, not a problem of md raid... 
md >> >> raid should need know how to optimize and use it =] ) >> >> >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> >> >> ext4 send trim commands to device (disk/md raid/nbd) >> >> >> kernel swap send this commands (when possible) to device too >> >> >> for internal raid5 parity disk this could be done by md, for data >> >> >> disks this should be done by ext4 >> >> > >> >> > That's an interesting point. >> >> > >> >> > On which basis should a parity "block" get a TRIM? >> >> > >> >> > If you ask me, I think the complete TRIM story is, at >> >> > best, a temporary patch. >> >> > >> >> > IMHO the wear levelling should be handled by the filesystem >> >> > and, with awarness of this, by the underlining device drivers. >> >> > Reason is that the FS knows better what's going on with the >> >> > blocks and what will happen. >> >> > >> >> > bye, >> >> > >> >> > pg >> >> > >> >> >> >> >> >> the other question... about resync with only write what is different >> >> >> this is very good since write and read speed can be different for ssd >> >> >> (hd don´t have this 'problem') >> >> >> but i´m sure that just write what is diff is better than write all >> >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too) >> >> >> >> >> >> >> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>: >> >> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote: >> >> >> >> >> >> >> >> Who sends this command? If md can assume that determinate mode is >> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5, >> >> >> >> consistency of the parity information depends on the determinate >> >> >> >> pattern used and the number of disks. If you used determinate >> >> >> >> all-zero, then parity information would always be consistent, but this >> >> >> >> is probably not preferable since every TRIM command would incur an >> >> >> >> extra write for each bit in each page of the block. 
>> >> >> > >> >> >> > True, and there are several solutions. Maybe track space used via >> >> >> > some mechanism, such that when you trim you're only trimming the >> >> >> > entire stripe width so no parity is required for the trimmed regions. >> >> >> > Or, trust the drive's wear leveling and endurance rating, combined >> >> >> > with SMART data, to indicate when you need to replace the device >> >> >> > preemptive to eventual failure. >> >> >> > >> >> >> > It's not an unsolvable issue. If the RAID5 used distributed parity, >> >> >> > you could expect wear leveling to wear all the devices evenly, since >> >> >> > on average, the # of writes to all devices will be the same. Only a >> >> >> > RAID4 setup would see a lopsided amount of writes to a single device. >> >> >> > >> >> >> > --eric >> >> >> > >> >> >> > -- >> >> >> > Eric D. Mudama >> >> >> > edmudama@bounceswoosh.org >> >> >> > >> >> >> > -- >> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> >> > the body of a message to majordomo@vger.kernel.org >> >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> Roberto Spadim >> >> >> Spadim Technology / SPAEmpresarial >> >> >> -- >> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> >> the body of a message to majordomo@vger.kernel.org >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > >> >> > -- >> >> > >> >> > piergiorgio >> >> > -- >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> > the body of a message to majordomo@vger.kernel.org >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > >> >> >> >> >> >> >> >> -- >> >> Roberto Spadim >> >> Spadim Technology / SPAEmpresarial >> >> -- >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> the body of a message to majordomo@vger.kernel.org 
>> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >> > -- >> > >> > piergiorgio >> > -- >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >> >> >> >> -- >> Roberto Spadim >> Spadim Technology / SPAEmpresarial > > -- > > piergiorgio > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
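The question asked above (what does a trimmed sector read back?) genuinely has no single answer: an ATA drive may advertise that reads after TRIM are deterministic, and separately that the deterministic value is all zeros (the capabilities commonly called DRAT and RZAT); without those flags the returned data is simply undefined. A toy model of that distinction (illustrative only; block size and names are invented for the sketch):

```python
# Toy model of read-after-TRIM behaviour. Real drives advertise this via
# IDENTIFY DEVICE capability bits; without a "read zeros after TRIM"
# promise, the data read back from a trimmed LBA is undefined.

import random

class ToySSD:
    def __init__(self, read_zero_after_trim):
        self.read_zero_after_trim = read_zero_after_trim
        self.blocks = {}
        self.trimmed = set()

    def write(self, lba, data):
        self.trimmed.discard(lba)
        self.blocks[lba] = data

    def trim(self, lba):
        self.blocks.pop(lba, None)
        self.trimmed.add(lba)

    def read(self, lba):
        if lba in self.trimmed:
            if self.read_zero_after_trim:
                return b"\x00" * 4          # drive promises zeros
            # otherwise: undefined data, modelled here as random bytes
            return bytes(random.randrange(256) for _ in range(4))
        return self.blocks.get(lba, b"\x00" * 4)

rzat = ToySSD(read_zero_after_trim=True)
rzat.write(0, b"data")
rzat.trim(0)
print(rzat.read(0))   # b'\x00\x00\x00\x00'
```

Only for a drive that makes the zeros promise would the "emulate TRIM with a zero WRITE" translation reproduce the same read behaviour, which is why Piergiorgio's reply argues against emulating it at all.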
* Re: SSD - TRIM command 2011-02-09 19:16 ` Roberto Spadim @ 2011-02-09 19:21 ` Piergiorgio Sartor 2011-02-09 19:27 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Piergiorgio Sartor @ 2011-02-09 19:21 UTC (permalink / raw) To: Roberto Spadim Cc: Piergiorgio Sartor, Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid > yeah =) > a question... > if i send a TRIM to a sector > if i read from it > what i have? > 0x00000000000000000000000000000000000 ? > if yes, we could translate TRIM to WRITE on devices without TRIM (hard disks) > just to have the same READ information It seems 0x0 is not a standard: the returned values appear to be undefined, even if 0x0 *might* be common. Second, why would you want to emulate the 0x0 behaviour? I do not see the point of writing zeros to a device which does not support TRIM. Just doing nothing seems a better choice, even in a mixed environment. bye, pg > 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: > >> it´s just a discussion, right? no implementation yet, right? > > > > Of course... > > > >> what i think.... > >> if device accept TRIM, we can use TRIM. > >> if not, we must translate TRIM to something similar (maybe many WRITES > >> ?), and when we READ from disk we get the same information > > > > TRIM is not about writing at all. TRIM tells the > > device that the addressed block is not anymore used, > > so it (the SSD) can do whatever it wants with it. > > > > The only software layer having the same "knowledge" > > is the filesystem, the other layers, do not have > > any decisional power about the block allocation. > > Except for metadata, of course. > > > > So, IMHO, a software TRIM can only be in the FS. > > > > bye, > > > > pg > > > >> the translation coulbe be done by kernel (not md) maybe options on > >> libata, nbd device.... > >> other option is do it with md, internal (md) TRIM translate function > >> > >> who send trim?
> >> internal md information: md can generate it (if necessary, maybe it´s > >> not...) for parity disks (not data disks) > >> filesystem/or another upper layer program (database with direct device > >> access), we could accept TRIM from filesystem/database, and send it to > >> disks/mirrors, when necessary translate it (internal or kernel > >> translate function) > >> > >> > >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: > >> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote: > >> >> nice =) > >> >> but check that parity block is a raid information, not a filesystem information > >> >> for raid we could implement trim when possible (like swap) > >> >> and implement a trim that we receive from filesystem, and send to all > >> >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors) > >> > > >> > To all disk also in case of RAID-5? > >> > > >> > What if the TRIM belongs only to a single SDD block > >> > belonging to a single chunk of a stripe? > >> > That is a *single* SSD of the RAID-5. > >> > > >> > Should md re-read the block and re-write (not TRIM) > >> > the parity? > >> > > >> > I think anything that has to do with checking & > >> > repairing must be carefully considered... > >> > > >> > bye, > >> > > >> > pg > >> > > >> >> i don´t know what trim do very well, but i think it´s a very big write > >> >> with only some bits for example: > >> >> set sector1='00000000000000000000000000000000000000000000000000' > >> >> could be replace by: > >> >> trim sector1 > >> >> it´s faster for sata communication, and it´s a good information for > >> >> hard disk (it can put a single '0' at the start of the sector and know > >> >> that all sector is 0, if it try to read any information it can use > >> >> internal memory (don´t read hard disk), if a write is done it should > >> >> write 0000 to bits, and after after the write operation, but it´s > >> >> internal function of hard disk/ssd, not a problem of md raid... 
md > >> >> raid should need know how to optimize and use it =] ) > >> >> > >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: > >> >> >> ext4 send trim commands to device (disk/md raid/nbd) > >> >> >> kernel swap send this commands (when possible) to device too > >> >> >> for internal raid5 parity disk this could be done by md, for data > >> >> >> disks this should be done by ext4 > >> >> > > >> >> > That's an interesting point. > >> >> > > >> >> > On which basis should a parity "block" get a TRIM? > >> >> > > >> >> > If you ask me, I think the complete TRIM story is, at > >> >> > best, a temporary patch. > >> >> > > >> >> > IMHO the wear levelling should be handled by the filesystem > >> >> > and, with awarness of this, by the underlining device drivers. > >> >> > Reason is that the FS knows better what's going on with the > >> >> > blocks and what will happen. > >> >> > > >> >> > bye, > >> >> > > >> >> > pg > >> >> > > >> >> >> > >> >> >> the other question... about resync with only write what is different > >> >> >> this is very good since write and read speed can be different for ssd > >> >> >> (hd don´t have this 'problem') > >> >> >> but i´m sure that just write what is diff is better than write all > >> >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too) > >> >> >> > >> >> >> > >> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>: > >> >> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote: > >> >> >> >> > >> >> >> >> Who sends this command? If md can assume that determinate mode is > >> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5, > >> >> >> >> consistency of the parity information depends on the determinate > >> >> >> >> pattern used and the number of disks. 
If you used determinate > >> >> >> >> all-zero, then parity information would always be consistent, but this > >> >> >> >> is probably not preferable since every TRIM command would incur an > >> >> >> >> extra write for each bit in each page of the block. > >> >> >> > > >> >> >> > True, and there are several solutions. Maybe track space used via > >> >> >> > some mechanism, such that when you trim you're only trimming the > >> >> >> > entire stripe width so no parity is required for the trimmed regions. > >> >> >> > Or, trust the drive's wear leveling and endurance rating, combined > >> >> >> > with SMART data, to indicate when you need to replace the device > >> >> >> > preemptive to eventual failure. > >> >> >> > > >> >> >> > It's not an unsolvable issue. If the RAID5 used distributed parity, > >> >> >> > you could expect wear leveling to wear all the devices evenly, since > >> >> >> > on average, the # of writes to all devices will be the same. Only a > >> >> >> > RAID4 setup would see a lopsided amount of writes to a single device. > >> >> >> > > >> >> >> > --eric > >> >> >> > > >> >> >> > -- > >> >> >> > Eric D. 
Mudama > >> >> >> > edmudama@bounceswoosh.org > >> >> >> > > >> >> >> > -- > >> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> >> >> > the body of a message to majordomo@vger.kernel.org > >> >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html > >> >> >> > > >> >> >> > >> >> >> > >> >> >> > >> >> >> -- > >> >> >> Roberto Spadim > >> >> >> Spadim Technology / SPAEmpresarial > >> >> >> -- > >> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> >> >> the body of a message to majordomo@vger.kernel.org > >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> >> > > >> >> > -- > >> >> > > >> >> > piergiorgio > >> >> > -- > >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> >> > the body of a message to majordomo@vger.kernel.org > >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html > >> >> > > >> >> > >> >> > >> >> > >> >> -- > >> >> Roberto Spadim > >> >> Spadim Technology / SPAEmpresarial > >> >> -- > >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> >> the body of a message to majordomo@vger.kernel.org > >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > >> > -- > >> > > >> > piergiorgio > >> > -- > >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > >> > the body of a message to majordomo@vger.kernel.org > >> > More majordomo info at http://vger.kernel.org/majordomo-info.html > >> > > >> > >> > >> > >> -- > >> Roberto Spadim > >> Spadim Technology / SPAEmpresarial > > > > -- > > > > piergiorgio > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > > the body of a message to majordomo@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial -- piergiorgio -- To unsubscribe from 
this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 19:21 ` Piergiorgio Sartor @ 2011-02-09 19:27 ` Roberto Spadim 0 siblings, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-09 19:27 UTC (permalink / raw) To: Piergiorgio Sartor Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid Just to make READ consistent across any drive mix: if the device has TRIM, use it; if not, use a WRITE of 0x000000... Afterwards, if we READ from /dev/md0 we get the same information (0x000000), no matter whether it's an SSD or HD, with or without the TRIM function. ext4 sends the TRIM command (but it's a user option, and should be used only with TRIM-supporting disks). swap sends it too (it's not a user option; the kernel checks whether the device can execute TRIM and, if not, doesn't send it; I don't know exactly what it does, but we could use the same code to 'emulate' the TRIM command, like swap does). Why emulate? Because we can use a mixed array (SSD/HD) and get more performance from the TRIM-enabled disks with ext4 (or any other filesystem that uses md as a device). The point is: add support for the TRIM command to MD devices. Today I don't know if it exists (I think not). If this support exists, how does it work? Could we mix TRIM-enabled and non-TRIM devices in a RAID array? The first option is: don't use TRIM. The second: use TRIM when possible, emulate TRIM when impossible. The third: only accept TRIM if all devices are TRIM-enabled (this should be a run-time option, since we can remove a mirror with TRIM support and add a mirror without TRIM support). 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> yeah =) >> a question... >> if i send a TRIM to a sector >> if i read from it >> what i have? >> 0x00000000000000000000000000000000000 ? >> if yes, we could translate TRIM to WRITE on devices without TRIM (hard disks) >> just to have the same READ information > > It seems the 0x0 is not a standard. Return values > seem to be quite undefined, even if 0x0 *might* > be common. > > Second, why do you want to emulate the 0x0 thing? 
> > I do not see the point of writing zero on a device > which do not support TRIM. Just do nothing seems a > better choice, even in mixed environment. > > bye, > > pg > >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> >> it´s just a discussion, right? no implementation yet, right? >> > >> > Of course... >> > >> >> what i think.... >> >> if device accept TRIM, we can use TRIM. >> >> if not, we must translate TRIM to something similar (maybe many WRITES >> >> ?), and when we READ from disk we get the same information >> > >> > TRIM is not about writing at all. TRIM tells the >> > device that the addressed block is not anymore used, >> > so it (the SSD) can do whatever it wants with it. >> > >> > The only software layer having the same "knowledge" >> > is the filesystem, the other layers, do not have >> > any decisional power about the block allocation. >> > Except for metadata, of course. >> > >> > So, IMHO, a software TRIM can only be in the FS. >> > >> > bye, >> > >> > pg >> > >> >> the translation coulbe be done by kernel (not md) maybe options on >> >> libata, nbd device.... >> >> other option is do it with md, internal (md) TRIM translate function >> >> >> >> who send trim? >> >> internal md information: md can generate it (if necessary, maybe it´s >> >> not...) 
for parity disks (not data disks) >> >> filesystem/or another upper layer program (database with direct device >> >> access), we could accept TRIM from filesystem/database, and send it to >> >> disks/mirrors, when necessary translate it (internal or kernel >> >> translate function) >> >> >> >> >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> >> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote: >> >> >> nice =) >> >> >> but check that parity block is a raid information, not a filesystem information >> >> >> for raid we could implement trim when possible (like swap) >> >> >> and implement a trim that we receive from filesystem, and send to all >> >> >> disks (if it´s a raid1 with mirrors, we should sent to all mirrors) >> >> > >> >> > To all disk also in case of RAID-5? >> >> > >> >> > What if the TRIM belongs only to a single SDD block >> >> > belonging to a single chunk of a stripe? >> >> > That is a *single* SSD of the RAID-5. >> >> > >> >> > Should md re-read the block and re-write (not TRIM) >> >> > the parity? >> >> > >> >> > I think anything that has to do with checking & >> >> > repairing must be carefully considered... >> >> > >> >> > bye, >> >> > >> >> > pg >> >> > >> >> >> i don´t know what trim do very well, but i think it´s a very big write >> >> >> with only some bits for example: >> >> >> set sector1='00000000000000000000000000000000000000000000000000' >> >> >> could be replace by: >> >> >> trim sector1 >> >> >> it´s faster for sata communication, and it´s a good information for >> >> >> hard disk (it can put a single '0' at the start of the sector and know >> >> >> that all sector is 0, if it try to read any information it can use >> >> >> internal memory (don´t read hard disk), if a write is done it should >> >> >> write 0000 to bits, and after after the write operation, but it´s >> >> >> internal function of hard disk/ssd, not a problem of md raid... 
md >> >> >> raid should need know how to optimize and use it =] ) >> >> >> >> >> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@nexgo.de>: >> >> >> >> ext4 send trim commands to device (disk/md raid/nbd) >> >> >> >> kernel swap send this commands (when possible) to device too >> >> >> >> for internal raid5 parity disk this could be done by md, for data >> >> >> >> disks this should be done by ext4 >> >> >> > >> >> >> > That's an interesting point. >> >> >> > >> >> >> > On which basis should a parity "block" get a TRIM? >> >> >> > >> >> >> > If you ask me, I think the complete TRIM story is, at >> >> >> > best, a temporary patch. >> >> >> > >> >> >> > IMHO the wear levelling should be handled by the filesystem >> >> >> > and, with awarness of this, by the underlining device drivers. >> >> >> > Reason is that the FS knows better what's going on with the >> >> >> > blocks and what will happen. >> >> >> > >> >> >> > bye, >> >> >> > >> >> >> > pg >> >> >> > >> >> >> >> >> >> >> >> the other question... about resync with only write what is different >> >> >> >> this is very good since write and read speed can be different for ssd >> >> >> >> (hd don´t have this 'problem') >> >> >> >> but i´m sure that just write what is diff is better than write all >> >> >> >> (ssd life will be bigger, hd maybe... i think that will be bigger too) >> >> >> >> >> >> >> >> >> >> >> >> 2011/2/9 Eric D. Mudama <edmudama@bounceswoosh.org>: >> >> >> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote: >> >> >> >> >> >> >> >> >> >> Who sends this command? If md can assume that determinate mode is >> >> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5, >> >> >> >> >> consistency of the parity information depends on the determinate >> >> >> >> >> pattern used and the number of disks. 
If you used determinate >> >> >> >> >> all-zero, then parity information would always be consistent, but this >> >> >> >> >> is probably not preferable since every TRIM command would incur an >> >> >> >> >> extra write for each bit in each page of the block. >> >> >> >> > >> >> >> >> > True, and there are several solutions. Maybe track space used via >> >> >> >> > some mechanism, such that when you trim you're only trimming the >> >> >> >> > entire stripe width so no parity is required for the trimmed regions. >> >> >> >> > Or, trust the drive's wear leveling and endurance rating, combined >> >> >> >> > with SMART data, to indicate when you need to replace the device >> >> >> >> > preemptive to eventual failure. >> >> >> >> > >> >> >> >> > It's not an unsolvable issue. If the RAID5 used distributed parity, >> >> >> >> > you could expect wear leveling to wear all the devices evenly, since >> >> >> >> > on average, the # of writes to all devices will be the same. Only a >> >> >> >> > RAID4 setup would see a lopsided amount of writes to a single device. >> >> >> >> > >> >> >> >> > --eric >> >> >> >> > >> >> >> >> > -- >> >> >> >> > Eric D. 
Mudama >> >> >> >> > edmudama@bounceswoosh.org >> >> >> >> > >> >> >> >> > -- >> >> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> >> >> > the body of a message to majordomo@vger.kernel.org >> >> >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> >> Roberto Spadim >> >> >> >> Spadim Technology / SPAEmpresarial >> >> >> >> -- >> >> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> >> >> the body of a message to majordomo@vger.kernel.org >> >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> > >> >> >> > -- >> >> >> > >> >> >> > piergiorgio >> >> >> > -- >> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> >> > the body of a message to majordomo@vger.kernel.org >> >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> >> > >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> Roberto Spadim >> >> >> Spadim Technology / SPAEmpresarial >> >> >> -- >> >> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> >> the body of a message to majordomo@vger.kernel.org >> >> >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > >> >> > -- >> >> > >> >> > piergiorgio >> >> > -- >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> >> > the body of a message to majordomo@vger.kernel.org >> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> >> > >> >> >> >> >> >> >> >> -- >> >> Roberto Spadim >> >> Spadim Technology / SPAEmpresarial >> > >> > -- >> > >> > piergiorgio >> > -- >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> > the body of a message to majordomo@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html >> > >> >> >> >> -- >> Roberto Spadim >> Spadim 
Technology / SPAEmpresarial > > -- > > piergiorgio > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
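The three options Roberto lists for a mixed array can be sketched as a tiny dispatch model. This is purely illustrative Python, not md code; the `Disk` class, the `supports_trim` flag, and the policy numbering are assumptions invented for the example:

```python
# Hypothetical sketch of the three policies for a RAID-1 mirror set mixing
# TRIM-capable and non-TRIM devices. This is NOT md's implementation, just a
# model of the decision logic being discussed.

SECTOR = 512

class Disk:
    def __init__(self, name, supports_trim):
        self.name = name
        self.supports_trim = supports_trim
        self.data = {}        # sector number -> payload bytes
        self.trimmed = set()  # sectors the device knows are unused

    def trim(self, sector):
        self.trimmed.add(sector)
        self.data.pop(sector, None)

    def write(self, sector, payload):
        self.data[sector] = payload

def handle_discard(mirrors, sector, policy):
    """Policy 1: TRIM only the capable disks, drop it elsewhere.
    Policy 2: TRIM capable disks, emulate with a zero write on the rest.
    Policy 3: reject the discard unless every disk is TRIM-capable."""
    if policy == 3 and not all(d.supports_trim for d in mirrors):
        return False  # array-wide TRIM refused
    for d in mirrors:
        if d.supports_trim:
            d.trim(sector)
        elif policy == 2:
            d.write(sector, b"\x00" * SECTOR)  # keep mirror reads consistent
        # policy 1: silently drop the TRIM on non-capable disks
    return True
```

Policy 2 keeps reads identical across all mirrors (everything returns zeros after a discard), at the cost of a real write on each non-TRIM disk, which is exactly the trade-off Piergiorgio questions.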
* Re: SSD - TRIM command 2011-02-09 16:19 ` Eric D. Mudama 2011-02-09 16:28 ` Scott E. Armitage @ 2011-02-21 18:24 ` Phillip Susi 2011-02-21 18:30 ` Roberto Spadim 1 sibling, 1 reply; 70+ messages in thread From: Phillip Susi @ 2011-02-21 18:24 UTC (permalink / raw) To: Eric D. Mudama; +Cc: Scott E. Armitage, Roberto Spadim, David Brown, linux-raid On 2/9/2011 11:19 AM, Eric D. Mudama wrote: > For SATA devices, ATA8-ACS2 addresses this through Deterministic Read > After Trim in the DATA SET MANAGEMENT command. Devices can be > indeterminate, determinate with a non-zero pattern (often all-ones) or > determinate all-zero for sectors read after being trimmed. IIRC, it was a word in the IDENTIFY response, not the DATA SET MANAGEMENT command. On 2/9/2011 11:28 AM, Scott E. Armitage wrote: > Who sends this command? If md can assume that determinate mode is > always set, then RAID 1 at least would remain consistent. For RAID 5, > consistency of the parity information depends on the determinate > pattern used and the number of disks. If you used determinate > all-zero, then parity information would always be consistent, but this > is probably not preferable since every TRIM command would incur an > extra write for each bit in each page of the block. The drive tells YOU how its trim behaves; you don't command it. If the drive is deterministic and always returns zeros after TRIM, then mdadm could pass the TRIM down and process it like a write of all zeros, and recompute the parity. If it isn't deterministic, then I don't think there's anything you can do to handle TRIM requests. ^ permalink raw reply [flat|nested] 70+ messages in thread
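Phillip's last paragraph, treating a TRIM on a deterministic-zero drive as a write of zeros and recomputing parity accordingly, can be illustrated with a toy XOR parity calculation (a sketch only; the 4-byte "chunks" are invented for the demo, and this is not how md actually buffers stripes):

```python
def xor_blocks(a, b):
    # Bytewise XOR of two equal-length blocks.
    return bytes(x ^ y for x, y in zip(a, b))

def parity(chunks):
    # RAID-5 parity is the XOR of all data chunks in the stripe.
    p = chunks[0]
    for c in chunks[1:]:
        p = xor_blocks(p, c)
    return p

# Three data chunks in one stripe (tiny 4-byte "chunks" for the demo).
d0, d1, d2 = b"\x11\x22\x33\x44", b"\xaa\xbb\xcc\xdd", b"\x01\x02\x03\x04"
p_old = parity([d0, d1, d2])

# TRIM arrives for the chunk held in d1, and the drive reports
# deterministic zeros after TRIM. md could then update parity exactly as
# for a write of zeros:  p_new = p_old XOR d1_old XOR zeros
zeros = b"\x00" * 4
p_new = xor_blocks(xor_blocks(p_old, d1), zeros)

# The stripe stays consistent with the trimmed chunk now reading as zeros.
assert p_new == parity([d0, zeros, d2])
```

On an indeterminate drive the post-TRIM contents are unknown, so no such parity update is possible, matching the conclusion above.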
* Re: SSD - TRIM command 2011-02-21 18:24 ` Phillip Susi @ 2011-02-21 18:30 ` Roberto Spadim 0 siblings, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 18:30 UTC (permalink / raw) To: Phillip Susi; +Cc: Eric D. Mudama, Scott E. Armitage, David Brown, linux-raid Just some ideas... hmmm, thinking about TRIM in a mixed supported/non-supported raid1 array... when would a filesystem read a block that is trimmed? Since a filesystem writes first and reads afterwards, trimmed blocks may never be read: they are unused blocks (the filesystem knows where they are). Maybe with a (read/compare/write-if-different) resync function we will have problems with non-trimmed disks (that support TRIM) being added to the raid1. Maybe... I think that sending TRIM to the devices isn't a problem; it's a disk optimization that must be driven by the filesystem, and raid1 should only forward this command to the disks. The problem is: if a disk doesn't have TRIM, must we implement a TRIM-compatible command? (Or not... the filesystem knows about free blocks.) 2011/2/21 Phillip Susi <psusi@cfl.rr.com>: > On 2/9/2011 11:19 AM, Eric D. Mudama wrote: >> For SATA devices, ATA8-ACS2 addresses this through Deterministic Read >> After Trim in the DATA SET MANAGEMENT command. Devices can be >> indeterminate, determinate with a non-zero pattern (often all-ones) or >> determinate all-zero for sectors read after being trimmed. > > IIRC, it was a word in the IDENTIFY response, not the DATA SET > MANAGEMENT command. > > On 2/9/2011 11:28 AM, Scott E. Armitage wrote: >> Who sends this command? If md can assume that determinate mode is >> always set, then RAID 1 at least would remain consistent. For RAID 5, >> consistency of the parity information depends on the determinate >> pattern used and the number of disks. If you used determinate >> all-zero, then parity information would always be consistent, but this >> is probably not preferable since every TRIM command would incur an >> extra write for each bit in each page of the block. 
> > The drive tells YOU how its trim behaves; you don't command it. > > If the drive is deterministic and always returns zeros after TRIM, then > mdadm could pass the TRIM down and process it like a write of all zeros, > and recompute the parity. If it isn't deterministic, then I don't think > there's anything you can do to handle TRIM requests. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 14:39 ` Roberto Spadim 2011-02-09 15:00 ` Scott E. Armitage @ 2011-02-09 15:49 ` David Brown 2011-02-21 18:20 ` Phillip Susi 1 sibling, 1 reply; 70+ messages in thread From: David Brown @ 2011-02-09 15:49 UTC (permalink / raw) To: linux-raid On 09/02/2011 15:39, Roberto Spadim wrote: > guys... > if my ssd fail, i buy another... > let's make software ok, the hardware is another problem > raid1 should work with floppy disks, hard disks, ssd, nbd... that's the point > make solutions for hardware mix > the question is simple, could we send TRIM command to all mirrors (for > stripe just disks that should receive it)? if device don't have TRIM > we should translate it for a similar command, with the same READ > effect (no problem if it's not atomic) > I've been reading a little more about this. It seems that the days of TRIM may well be numbered - the latest generation of high-end SSDs have more powerful garbage collection algorithms, together with more spare blocks, making TRIM pretty much redundant. This is, of course, the most convenient solution for everyone (as long as it doesn't cost too much!). The point of the TRIM command is to tell the SSD that a particular block is no longer being used, so that the SSD can erase it in the background - that way when you want to write more data, there are more free blocks ready and waiting. But if you've got plenty of spare blocks, it's easy to have them erased in advance and you don't need TRIM. ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-09 15:49 ` David Brown @ 2011-02-21 18:20 ` Phillip Susi 2011-02-21 18:25 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Phillip Susi @ 2011-02-21 18:20 UTC (permalink / raw) To: David Brown; +Cc: linux-raid On 2/9/2011 10:49 AM, David Brown wrote: > I've been reading a little more about this. It seems that the days of > TRIM may well be numbered - the latest generation of high-end SSDs have > more powerful garbage collection algorithms, together with more spare > blocks, making TRIM pretty much redundant. This is, of course, the most > convenient solution for everyone (as long as it doesn't cost too much!). > > The point of the TRIM command is to tell the SSD that a particular block > is no longer being used, so that the SSD can erase it in the background > - that way when you want to write more data, there are more free blocks > ready and waiting. But if you've got plenty of spare blocks, it's easy > to have them erased in advance and you don't need TRIM. It is not just about having free blocks ready and waiting. When doing wear leveling, you might find an erase block that has not been written to in a long time, so you want to move that data to a more worn block, and use the less worn block for more frequently written to sectors. If you know that sectors are unused because they have been TRIMed, then you don't have to waste time and wear copying the junk there to the new flash block. TRIM is also quite useful for thin provisioned storage, which seems to be getting popular. ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 18:20 ` Phillip Susi @ 2011-02-21 18:25 ` Roberto Spadim 2011-02-21 18:34 ` Phillip Susi 2011-02-21 18:51 ` Mathias Burén 0 siblings, 2 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 18:25 UTC (permalink / raw) To: Phillip Susi; +Cc: David Brown, linux-raid TRIM is a new feature for many hard disks/SSDs. It's mainly about getting a longer disk life and allowing dynamic bad-block reallocation (the filesystem must tell the disk which blocks are empty). 2011/2/21 Phillip Susi <psusi@cfl.rr.com>: > On 2/9/2011 10:49 AM, David Brown wrote: >> I've been reading a little more about this. It seems that the days of >> TRIM may well be numbered - the latest generation of high-end SSDs have >> more powerful garbage collection algorithms, together with more spare >> blocks, making TRIM pretty much redundant. This is, of course, the most >> convenient solution for everyone (as long as it doesn't cost too much!). >> >> The point of the TRIM command is to tell the SSD that a particular block >> is no longer being used, so that the SSD can erase it in the background >> - that way when you want to write more data, there are more free blocks >> ready and waiting. But if you've got plenty of spare blocks, it's easy >> to have them erased in advance and you don't need TRIM. > > It is not just about having free blocks ready and waiting. When doing > wear leveling, you might find an erase block that has not been written > to in a long time, so you want to move that data to a more worn block, > and use the less worn block for more frequently written to sectors. If > you know that sectors are unused because they have been TRIMed, then you > don't have to waste time and wear copying the junk there to the new > flash block. > > TRIM is also quite useful for thin provisioned storage, which seems to > be getting popular. 
> -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 18:25 ` Roberto Spadim @ 2011-02-21 18:34 ` Phillip Susi 2011-02-21 18:48 ` Roberto Spadim 2011-02-21 18:51 ` Mathias Burén 1 sibling, 1 reply; 70+ messages in thread From: Phillip Susi @ 2011-02-21 18:34 UTC (permalink / raw) To: Roberto Spadim; +Cc: David Brown, linux-raid On 2/21/2011 1:25 PM, Roberto Spadim wrote: > TRIM is a new feature for many hard disk/ssd > it´s more to get a bigger life o disk, allow a dynamic badblock > reallocation (filesystem must tell where is empty) Ummm... thanks???? I know quite well what TRIM is, which is why I was discussing how mdadm could support it. -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 18:34 ` Phillip Susi @ 2011-02-21 18:48 ` Roberto Spadim 0 siblings, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 18:48 UTC (permalink / raw) To: Phillip Susi; +Cc: David Brown, linux-raid Yeah: for raid1, just send TRIM to the device (if no layout is in use); for stripe, we must rewrite the command and check whether we can use TRIM; for internal raid information we shouldn't use it. 2011/2/21 Phillip Susi <psusi@cfl.rr.com>: > On 2/21/2011 1:25 PM, Roberto Spadim wrote: >> TRIM is a new feature for many hard disk/ssd >> it´s more to get a bigger life o disk, allow a dynamic badblock >> reallocation (filesystem must tell where is empty) > > Ummm... thanks???? > > I know quite well what TRIM is, which is why I was discussing how mdadm > could support it. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 18:25 ` Roberto Spadim 2011-02-21 18:34 ` Phillip Susi @ 2011-02-21 18:51 ` Mathias Burén 2011-02-21 19:32 ` Roberto Spadim 1 sibling, 1 reply; 70+ messages in thread From: Mathias Burén @ 2011-02-21 18:51 UTC (permalink / raw) To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid (please don't top post) On 21 February 2011 18:25, Roberto Spadim <roberto@spadim.com.br> wrote: > TRIM is a new feature for many hard disk/ssd > it´s more to get a bigger life o disk, allow a dynamic badblock > reallocation (filesystem must tell where is empty) > > > 2011/2/21 Phillip Susi <psusi@cfl.rr.com>: >> On 2/9/2011 10:49 AM, David Brown wrote: >>> I've been reading a little more about this. It seems that the days of >>> TRIM may well be numbered - the latest generation of high-end SSDs have >>> more powerful garbage collection algorithms, together with more spare >>> blocks, making TRIM pretty much redundant. This is, of course, the most >>> convenient solution for everyone (as long as it doesn't cost too much!). >>> >>> The point of the TRIM command is to tell the SSD that a particular block >>> is no longer being used, so that the SSD can erase it in the background >>> - that way when you want to write more data, there are more free blocks >>> ready and waiting. But if you've got plenty of spare blocks, it's easy >>> to have them erased in advance and you don't need TRIM. >> >> It is not just about having free blocks ready and waiting. When doing >> wear leveling, you might find an erase block that has not been written >> to in a long time, so you want to move that data to a more worn block, >> and use the less worn block for more frequently written to sectors. If >> you know that sectors are unused because they have been TRIMed, then you >> don't have to waste time and wear copying the junk there to the new >> flash block. >> >> TRIM is also quite useful for thin provisioned storage, which seems to >> be getting popular. 
>> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > TRIM is not a new feature for HDDs as they don't have the problem that SSDs have. Where did you hear this? // Mathias -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 18:51 ` Mathias Burén @ 2011-02-21 19:32 ` Roberto Spadim 2011-02-21 19:38 ` Mathias Burén 2011-02-21 19:39 ` Roberto Spadim 0 siblings, 2 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 19:32 UTC (permalink / raw) To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid TRIM isn't a problem, it's a solution to optimize dynamic allocation and the lifetime of devices (SSD or hard disk). I don't see any problem with implementing the TRIM command on hard disks (not in Linux, but at the hard-disk firmware level). Hard disks have the same problem as SSDs, allocation of bad blocks; any hard disk could implement TRIM and use it to reallocate bad blocks... -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 19:32 ` Roberto Spadim @ 2011-02-21 19:38 ` Mathias Burén 2011-02-21 19:39 ` Mathias Burén 2011-02-21 19:39 ` Roberto Spadim 1 sibling, 1 reply; 70+ messages in thread From: Mathias Burén @ 2011-02-21 19:38 UTC (permalink / raw) To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid On 21 February 2011 19:32, Roberto Spadim <roberto@spadim.com.br> wrote: > TRIM isn´t a problem, it´s a solution to optimize dynamic allocation, > and life time of devices (SSD or harddisk) > i don´t see any problem to implement trim command on hard disks (not > in linux, but at harddisk firmware level) > > hard disk have the same problem of ssd, allocation of badblocks, any > harddisk could implement trim and use it to realloc badblocks... > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > I don't think you understand TRIM. It wouldn't work, and there is no need for it, on a HDD. AFAIK a HDD does not have the same penalty as a SSD does when it needs to write to a (previously) used area. An SSD cannot do this without erasing the whole (block? page?), usually 512KB in size (varies between different manufacturers), but the data that's on there still needs to be moved elsewhere first, block erased, data moved back the same time the new data is written together with it. AFAIK it works something like this anyway. The only benefit TRIM will give you would be potentially faster writes, right. // M -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
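The relocate-then-erase cycle Mathias describes is also why TRIM matters for write amplification: pages the drive already knows are dead need not be copied out before the erase. A toy model (the 128-page block and the page counts are invented; real erase-block geometry varies by vendor):

```python
# Toy model of one flash erase block. Flash cannot be overwritten in place:
# the whole block must be erased, and any still-valid pages have to be
# relocated first. TRIMmed pages are known-dead, so the FTL can skip them.

PAGES = 128  # pages per erase block (invented for the demo)

def pages_copied_on_erase(valid, trimmed):
    """Number of pages the FTL must relocate before erasing the block."""
    return len(valid - trimmed)

full_block = set(range(PAGES))

# Without TRIM the drive has no idea what is garbage: every page moves.
no_trim = pages_copied_on_erase(full_block, set())

# With 100 of the 128 pages TRIMmed, only the 28 live pages are copied,
# saving both time and flash wear.
with_trim = pages_copied_on_erase(full_block, set(range(100)))
```

A hard disk rewrites a sector in place, so none of this relocation cost exists there, which is the distinction Mathias draws above.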
* Re: SSD - TRIM command 2011-02-21 19:38 ` Mathias Burén @ 2011-02-21 19:39 ` Mathias Burén 2011-02-21 19:43 ` Roberto Spadim 2011-02-21 20:45 ` Phillip Susi 0 siblings, 2 replies; 70+ messages in thread From: Mathias Burén @ 2011-02-21 19:39 UTC (permalink / raw) To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid On 21 February 2011 19:38, Mathias Burén <mathias.buren@gmail.com> wrote: > On 21 February 2011 19:32, Roberto Spadim <roberto@spadim.com.br> wrote: >> TRIM isn't a problem; it's a solution to optimize dynamic allocation >> and the lifetime of devices (SSD or hard disk). >> I don't see any problem with implementing the TRIM command on hard disks (not >> in Linux, but at the hard disk firmware level). >> >> Hard disks have the same problem as SSDs, allocation of bad blocks; any >> hard disk could implement TRIM and use it to reallocate bad blocks... >> >> -- >> Roberto Spadim >> Spadim Technology / SPAEmpresarial >> > > I don't think you understand TRIM. It wouldn't work, and there is no > need for it, on a HDD. AFAIK a HDD does not have the same penalty as an > SSD does when it needs to write to a (previously) used area. An SSD > cannot do this without erasing a whole block (or page; usually 512KB > in size, varying between manufacturers): the valid data in that block > first has to be moved elsewhere, the block erased, and the data > written back together with the new data. > AFAIK it works something like this anyway. The only benefit TRIM will > give you would be potentially faster writes. > > // M > Plus support is needed from the kernel (done) and the filesystem (ext4 has it). The filesystem sees the MD device, not the actual SSDs behind it, so it would probably be quite complicated to implement passthrough of the TRIM command in this case. 
// M ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 19:39 ` Mathias Burén @ 2011-02-21 19:43 ` Roberto Spadim 2011-02-21 20:45 ` Phillip Susi 1 sibling, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 19:43 UTC (permalink / raw) To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid yeah, the idea of implementing TRIM at MD is to send the TRIM that MD received at the filesystem level on to the devices. raid1 + no layout + all mirrors with TRIM support => I think it's easy to implement... just send the command to the mirrors (SSD or HD, as long as they support it). for striped devices?! maybe it could be supported; it's more difficult. for linear raid0 it could be easy too 2011/2/21 Mathias Burén <mathias.buren@gmail.com>: > On 21 February 2011 19:38, Mathias Burén <mathias.buren@gmail.com> wrote: >> On 21 February 2011 19:32, Roberto Spadim <roberto@spadim.com.br> wrote: >>> TRIM isn't a problem; it's a solution to optimize dynamic allocation >>> and the lifetime of devices (SSD or hard disk). >>> I don't see any problem with implementing the TRIM command on hard disks (not >>> in Linux, but at the hard disk firmware level). >>> >>> Hard disks have the same problem as SSDs, allocation of bad blocks; any >>> hard disk could implement TRIM and use it to reallocate bad blocks... >>> >>> -- >>> Roberto Spadim >>> Spadim Technology / SPAEmpresarial >>> >> >> I don't think you understand TRIM. It wouldn't work, and there is no >> need for it, on a HDD. AFAIK a HDD does not have the same penalty as an >> SSD does when it needs to write to a (previously) used area. An SSD >> cannot do this without erasing a whole block (or page; usually 512KB >> in size, varying between manufacturers): the valid data in that block >> first has to be moved elsewhere, the block erased, and the data >> written back together with the new data. >> AFAIK it works something like this anyway. The only benefit TRIM will >> give you would be potentially faster writes. 
>> >> // M >> > > Plus support is needed from the kernel (done) filesystem (ext4 has > it). The filesystem seese the MD device, not the actual SSDs behind > it, so it would probably be quite complicated to implement passthrough > of the trim command in this case. > > // M > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
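The per-layout split discussed above (duplicate the request to every mirror for raid1, walk the chunks for striping) can be illustrated for the striped case. The chunk size and disk count below are hypothetical, and this is only a sketch of the address arithmetic, not the md implementation:

```python
# Hypothetical sketch: splitting one discard request on a striped (RAID0)
# array into per-member requests.  RAID1 would simply duplicate the request
# to every mirror; striping has to walk chunk by chunk.

CHUNK = 64 * 1024   # 64 KiB chunk size (assumption)
DISKS = 2           # number of member devices (assumption)

def split_discard(start, length):
    """Return (disk, disk_offset, length) pieces for a discard on the array."""
    out = []
    pos = start
    end = start + length
    while pos < end:
        chunk_idx = pos // CHUNK          # which array chunk this byte is in
        within = pos % CHUNK              # offset inside that chunk
        disk = chunk_idx % DISKS          # chunks rotate round-robin over disks
        disk_off = (chunk_idx // DISKS) * CHUNK + within
        piece = min(CHUNK - within, end - pos)
        out.append((disk, disk_off, piece))
        pos += piece
    return out

# A 128 KiB discard starting at offset 0 becomes one 64 KiB piece per disk:
print(split_discard(0, 128 * 1024))
```

For a linear array the same idea degenerates to a single offset shift per member, which is why the message above calls that case easy too.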
* Re: SSD - TRIM command 2011-02-21 19:39 ` Mathias Burén 2011-02-21 19:43 ` Roberto Spadim @ 2011-02-21 20:45 ` Phillip Susi 1 sibling, 0 replies; 70+ messages in thread From: Phillip Susi @ 2011-02-21 20:45 UTC (permalink / raw) To: Mathias Burén; +Cc: Roberto Spadim, David Brown, linux-raid On 2/21/2011 2:39 PM, Mathias Burén wrote: > Plus support is needed from the kernel (done) and the filesystem (ext4 has > it). The filesystem sees the MD device, not the actual SSDs behind > it, so it would probably be quite complicated to implement passthrough > of the TRIM command in this case. It has been mentioned at least twice now how to implement it. The device-mapper driver already has implemented TRIM passthrough for its linear, stripe, and mirror targets. The trick is handling it with raid[56]. ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 19:32 ` Roberto Spadim 2011-02-21 19:38 ` Mathias Burén @ 2011-02-21 19:39 ` Roberto Spadim 2011-02-21 19:51 ` Doug Dumitru 2011-02-21 20:47 ` Phillip Susi 1 sibling, 2 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 19:39 UTC (permalink / raw) To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid sorry, but I sent that email with a piece of information missing: TRIM is an 'ATA Specification' command http://en.wikipedia.org/wiki/TRIM_command any disk that implements ATA commands could support TRIM: hard disk, SSD, or any other type of physical allocation 2011/2/21 Roberto Spadim <roberto@spadim.com.br>: > TRIM isn't a problem; it's a solution to optimize dynamic allocation > and the lifetime of devices (SSD or hard disk). > I don't see any problem with implementing the TRIM command on hard disks (not > in Linux, but at the hard disk firmware level). > > Hard disks have the same problem as SSDs, allocation of bad blocks; any > hard disk could implement TRIM and use it to reallocate bad blocks... > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > -- Roberto Spadim Spadim Technology / SPAEmpresarial ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 19:39 ` Roberto Spadim @ 2011-02-21 19:51 ` Doug Dumitru 2011-02-21 19:57 ` Roberto Spadim 2011-02-21 20:47 ` Phillip Susi 1 sibling, 1 reply; 70+ messages in thread From: Doug Dumitru @ 2011-02-21 19:51 UTC (permalink / raw) To: linux-raid To be technically accurate, trim is a hint to a storage device that has a "block translation layer" that can take advantage of knowing that a block contains no meaningful data. Flash needs trim only if flash has an FTL (Flash Translation Layer) that is re-mapping blocks in such a manner as free blocks are helpful in making this process more efficient. Older SSDs did not support trim and had no real need for it. If you look at the FTL used with simple Flash (think CF cards, SD cards, and USB sticks) trim does not help them. Trim and wear leveling are un-related and don't really impact each other. On the linux side trim is "discard". This is actually a much better abstraction as it does not imply SSDs. Any type of block device that does dynamic block remapping will likely be helped (at least somewhat) by discard. The only examples of this I can think of off-hand are 1) my Flash SuperCharger code, and 2) block-level de-dupe engines. I am sure other examples will be created over time. Hopefully, discard can be driven down the stack. I would personally prefer the linux community declare that discard and zero writes are identical. If an SSD supports trim and linux wants to translate a discard into a trim at the device driver layer, and the SSD is non-deterministic, then that SSD is broken. Then again, my attitude about this is very arrogant and I think the trim spec was broken from the beginning. -- Doug Dumitru EasyCo LLC ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 19:51 ` Doug Dumitru @ 2011-02-21 19:57 ` Roberto Spadim 0 siblings, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 19:57 UTC (permalink / raw) To: doug; +Cc: linux-raid >Then again, my attitude > about this is very arrogant and I think the trim spec was broken from > the beginning. maybe... we could put all hard disk firmware into Linux code... why do we need reallocation on hard disks? we need it when the filesystem doesn't do it. the question is: can we implement TRIM at the MD device? 2011/2/21 Doug Dumitru <doug@easyco.com>: > To be technically accurate, trim is a hint to a storage device that > has a "block translation layer" that can take advantage of knowing > that a block contains no meaningful data. > > Flash needs trim only if flash has an FTL (Flash Translation Layer) > that is re-mapping blocks in such a manner as free blocks are helpful > in making this process more efficient. Older SSDs did not support > trim and had no real need for it. If you look at the FTL used with > simple Flash (think CF cards, SD cards, and USB sticks) trim does not > help them. Trim and wear leveling are un-related and don't really > impact each other. > > On the linux side trim is "discard". This is actually a much better > abstraction as it does not imply SSDs. > > Any type of block device that does dynamic block remapping will likely > be helped (at least somewhat) by discard. The only examples of this I > can think of off-hand are 1) my Flash SuperCharger code, and 2) > block-level de-dupe engines. I am sure other examples will be created > over time. > > Hopefully, discard can be driven down the stack. I would personally > prefer the linux community declare that discard and zero writes are > identical. If an SSD supports trim and linux wants to translate a > discard into a trim at the device driver layer, and the SSD is > non-deterministic, then that SSD is broken. 
Then again, my attitude > about this is very arrogant and I think the trim spec was broken from > the beginning. > > -- > Doug Dumitru > EasyCo LLC > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 19:39 ` Roberto Spadim 2011-02-21 19:51 ` Doug Dumitru @ 2011-02-21 20:47 ` Phillip Susi 2011-02-21 21:02 ` Mathias Burén 1 sibling, 1 reply; 70+ messages in thread From: Phillip Susi @ 2011-02-21 20:47 UTC (permalink / raw) To: Roberto Spadim; +Cc: Mathias Burén, David Brown, linux-raid On 2/21/2011 2:39 PM, Roberto Spadim wrote: > sorry, but i sent email without a information: > TRIM is a 'ATA Specification' command > > http://en.wikipedia.org/wiki/TRIM_command > > any disk with ATA command could suport TRIM, hard disk or ssd or > anyother type of phisical allocation Sure, but hard disks have no reason to, which is why they don't and won't support it. ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 20:47 ` Phillip Susi @ 2011-02-21 21:02 ` Mathias Burén 2011-02-21 22:52 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Mathias Burén @ 2011-02-21 21:02 UTC (permalink / raw) To: Phillip Susi; +Cc: Roberto Spadim, David Brown, linux-raid On 21 February 2011 20:47, Phillip Susi <psusi@cfl.rr.com> wrote: > On 2/21/2011 2:39 PM, Roberto Spadim wrote: >> sorry, but i sent email without a information: >> TRIM is a 'ATA Specification' command >> >> http://en.wikipedia.org/wiki/TRIM_command >> >> any disk with ATA command could suport TRIM, hard disk or ssd or >> anyother type of phisical allocation > > Sure, but hard disks have no reason to, which is why they don't and > won't support it. > My point exactly. // M ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 21:02 ` Mathias Burén @ 2011-02-21 22:52 ` Roberto Spadim 2011-02-21 23:41 ` Mathias Burén 2011-02-22 0:32 ` Eric D. Mudama 0 siblings, 2 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 22:52 UTC (permalink / raw) To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid I don't think so; since it's an ATA command, anything ATA-compatible can use it. it could be used on HDs for bad blocks and dynamic reallocation without problems; the hard disk wouldn't need dedicated space for bad blocks. for md software we must know whether devices support TRIM or not. the next question: is md ATA-compatible? no!? it's a Linux device, not an ATA device. what commands do Linux devices allow? could md allow TRIM? 2011/2/21 Mathias Burén <mathias.buren@gmail.com>: > On 21 February 2011 20:47, Phillip Susi <psusi@cfl.rr.com> wrote: >> On 2/21/2011 2:39 PM, Roberto Spadim wrote: >>> sorry, but I sent that email with a piece of information missing: >>> TRIM is an 'ATA Specification' command >>> >>> http://en.wikipedia.org/wiki/TRIM_command >>> >>> any disk that implements ATA commands could support TRIM: hard disk, SSD, >>> or any other type of physical allocation >> >> Sure, but hard disks have no reason to, which is why they don't and >> won't support it. >> > > My point exactly. > > // M > -- Roberto Spadim Spadim Technology / SPAEmpresarial ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 22:52 ` Roberto Spadim @ 2011-02-21 23:41 ` Mathias Burén 2011-02-21 23:42 ` Mathias Burén 2011-02-22 0:32 ` Eric D. Mudama 1 sibling, 1 reply; 70+ messages in thread From: Mathias Burén @ 2011-02-21 23:41 UTC (permalink / raw) To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid On 21 February 2011 22:52, Roberto Spadim <roberto@spadim.com.br> wrote: > i don´t think so, since it´s ATA command, any ATA compatible can use > it, it could be used for HD with badblocks and dynamic reallocation > without problems, the harddisk don´t need a dedicated space for > badblock. for md software we must know if devices support or not TRIM. > > the next question, md is ATA compatible? no!?, it´s a linux device, > not a ATA device. what commands linux devices allow? could md allow > TRIM? > > 2011/2/21 Mathias Burén <mathias.buren@gmail.com>: >> On 21 February 2011 20:47, Phillip Susi <psusi@cfl.rr.com> wrote: >>> On 2/21/2011 2:39 PM, Roberto Spadim wrote: >>>> sorry, but i sent email without a information: >>>> TRIM is a 'ATA Specification' command >>>> >>>> http://en.wikipedia.org/wiki/TRIM_command >>>> >>>> any disk with ATA command could suport TRIM, hard disk or ssd or >>>> anyother type of phisical allocation >>> >>> Sure, but hard disks have no reason to, which is why they don't and >>> won't support it. >>> >> >> My point exactly. >> >> // M >> -- >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >> the body of a message to majordomo@vger.kernel.org >> More majordomo info at http://vger.kernel.org/majordomo-info.html >> > > > > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial > Please don't top post. http://www.splitbrain.org/blog/2011-02/15-top_posting_like_dont_i_why Harddrives already have an allocated area with spare sectors, which they use whenever they need to. 
You can find out how many sectors have been reallocated by the HDD by looking at the SMART data, like so:

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
[...]
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

// M ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 23:41 ` Mathias Burén @ 2011-02-21 23:42 ` Mathias Burén 2011-02-21 23:52 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Mathias Burén @ 2011-02-21 23:42 UTC (permalink / raw) To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid On 21 February 2011 23:41, Mathias Burén <mathias.buren@gmail.com> wrote: > On 21 February 2011 22:52, Roberto Spadim <roberto@spadim.com.br> wrote: >> i don´t think so, since it´s ATA command, any ATA compatible can use >> it, it could be used for HD with badblocks and dynamic reallocation >> without problems, the harddisk don´t need a dedicated space for >> badblock. for md software we must know if devices support or not TRIM. >> >> the next question, md is ATA compatible? no!?, it´s a linux device, >> not a ATA device. what commands linux devices allow? could md allow >> TRIM? >> >> 2011/2/21 Mathias Burén <mathias.buren@gmail.com>: >>> On 21 February 2011 20:47, Phillip Susi <psusi@cfl.rr.com> wrote: >>>> On 2/21/2011 2:39 PM, Roberto Spadim wrote: >>>>> sorry, but i sent email without a information: >>>>> TRIM is a 'ATA Specification' command >>>>> >>>>> http://en.wikipedia.org/wiki/TRIM_command >>>>> >>>>> any disk with ATA command could suport TRIM, hard disk or ssd or >>>>> anyother type of phisical allocation >>>> >>>> Sure, but hard disks have no reason to, which is why they don't and >>>> won't support it. >>>> >>> >>> My point exactly. >>> >>> // M >>> -- >>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in >>> the body of a message to majordomo@vger.kernel.org >>> More majordomo info at http://vger.kernel.org/majordomo-info.html >>> >> >> >> >> -- >> Roberto Spadim >> Spadim Technology / SPAEmpresarial >> > > Please don't top post. > http://www.splitbrain.org/blog/2011-02/15-top_posting_like_dont_i_why > > Harddrives already have an allocated area with spare sectors, which > they use whenever they need to. 
> You can find out how many sectors have been reallocated by the HDD by > looking at the SMART data, like so: > > SMART Attributes Data Structure revision number: 16 > Vendor Specific SMART Attributes with Thresholds: > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE > [...] > 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 > > // M > I forgot to write that the TRIM command has nothing to do with bad blocks or sectors; it's just a way of "resetting" blocks so that they can be written to without having to erase them first. (IIRC) There is no such issue with HDDs, so there is no benefit at all in using the TRIM command with them. // M ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 23:42 ` Mathias Burén @ 2011-02-21 23:52 ` Roberto Spadim 2011-02-22 0:25 ` Mathias Burén ` (2 more replies) 0 siblings, 3 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-21 23:52 UTC (permalink / raw) To: Mathias Burén; +Cc: Phillip Susi, David Brown, linux-raid TRIM tells the hard disk that those blocks are not in use. blocks that are not in use could be used by the hard disk's reallocation algorithm, like spare sectors. hard disks could use the TRIM command to 'create' 'good' blocks, like spare sectors ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 23:52 ` Roberto Spadim @ 2011-02-22 0:25 ` Mathias Burén 2011-02-22 0:30 ` Brendan Conoboy 2011-02-22 0:36 ` Eric D. Mudama 2 siblings, 0 replies; 70+ messages in thread From: Mathias Burén @ 2011-02-22 0:25 UTC (permalink / raw) To: Roberto Spadim; +Cc: Phillip Susi, David Brown, linux-raid On 21 February 2011 23:52, Roberto Spadim <roberto@spadim.com.br> wrote: > trim tell harddisk that those block are not in use > > not in use block can be used by harddisk reallocation algorithm, like > spare sectors > > hard disks can use TRIM command to 'create' 'good' blocks like spare sectors > Do you mean online defragmentation...? If so, that's for the filesystem to do. Or do you mean that it could be used to tell the HDD that it has extra sectors it can use to reallocate bad sectors?... // M ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 23:52 ` Roberto Spadim 2011-02-22 0:25 ` Mathias Burén @ 2011-02-22 0:30 ` Brendan Conoboy 2011-02-22 0:36 ` Eric D. Mudama 2 siblings, 0 replies; 70+ messages in thread From: Brendan Conoboy @ 2011-02-22 0:30 UTC (permalink / raw) To: Roberto Spadim; +Cc: Mathias Burén, Phillip Susi, David Brown, linux-raid On 02/21/2011 03:52 PM, Roberto Spadim wrote: > trim tell harddisk that those block are not in use > > not in use block can be used by harddisk reallocation algorithm, like > spare sectors > > hard disks can use TRIM command to 'create' 'good' blocks like spare sectors I'm trying really hard to follow what this means but just can't grasp what you're getting at. What scenario is there in which trim actually does anything for you on an HD? I can't think of any situation where this makes any sense for HDs with current firmware functionality. If a sector is unused, but bad, you won't know until you write to it. If it's bad and you write to it, the write gets reallocated to a good spare sector. Are you proposing to notify the drive what sectors are unused so it can check for and reallocate bad blocks before they're used again? Something else? -- Brendan Conoboy / Red Hat, Inc. / blc@redhat.com ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 23:52 ` Roberto Spadim 2011-02-22 0:25 ` Mathias Burén 2011-02-22 0:30 ` Brendan Conoboy @ 2011-02-22 0:36 ` Eric D. Mudama 2011-02-22 1:46 ` Roberto Spadim 2 siblings, 1 reply; 70+ messages in thread From: Eric D. Mudama @ 2011-02-22 0:36 UTC (permalink / raw) To: Roberto Spadim; +Cc: Mathias Burén, Phillip Susi, David Brown, linux-raid On Mon, Feb 21 at 20:52, Roberto Spadim wrote: >trim tell harddisk that those block are not in use yes >not in use block can be used by harddisk reallocation algorithm, like >spare sectors no, because the host may immediately write to a trim'd sector The spares in an HDD can never be accessed outside of special tools, they're swap-in replacements for regions of the media that have developed defects. >hard disks can use TRIM command to 'create' 'good' blocks like spare sectors this doesn't make sense to me -- Eric D. Mudama edmudama@bounceswoosh.org ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 0:36 ` Eric D. Mudama @ 2011-02-22 1:46 ` Roberto Spadim 2011-02-22 1:52 ` Mathias Burén 0 siblings, 1 reply; 70+ messages in thread From: Roberto Spadim @ 2011-02-22 1:46 UTC (permalink / raw) To: Eric D. Mudama; +Cc: Mathias Burén, Phillip Susi, David Brown, linux-raid if it makes sense on an SSD, it makes sense on a hard disk too; it's a block device like an SSD. the difference between SSD and hard disk? access time, bytes(bits)/block, lifetime. bad blocks exist on SSDs and hard disks; SSDs can reallocate online, some hard disks too > no, because the host may immediately write to a trim'd sector yes, but the filesystem knows where an unused sector exists. if the device (hard disk/SSD) knows too and has a reallocation algorithm, it can reallocate without telling the filesystem to do it (that's why TRIM is interesting). since today's SSDs use NAND (not NOR), the block size isn't 1 bit like a hard disk head's. TRIM for hard disks only makes sense for bad block reallocation -------------------------- getting back to the first question, can MD support trim? yes/no/not now/some levels and layouts only? 2011/2/21 Eric D. Mudama <edmudama@bounceswoosh.org>: > On Mon, Feb 21 at 20:52, Roberto Spadim wrote: >> >> TRIM tells the hard disk that those blocks are not in use > yes > >> blocks that are not in use could be used by the hard disk's reallocation algorithm, like >> spare sectors > > no, because the host may immediately write to a trim'd sector > > The spares in an HDD can never be accessed outside of special tools, > they're swap-in replacements for regions of the media that have > developed defects. > >> hard disks could use the TRIM command to 'create' 'good' blocks, like spare >> sectors > > this doesn't make sense to me > > > -- > Eric D. 
Mudama > edmudama@bounceswoosh.org > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 1:46 ` Roberto Spadim @ 2011-02-22 1:52 ` Mathias Burén 2011-02-22 1:55 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Mathias Burén @ 2011-02-22 1:52 UTC (permalink / raw) To: Roberto Spadim; +Cc: Eric D. Mudama, Phillip Susi, David Brown, linux-raid On 22 February 2011 01:46, Roberto Spadim <roberto@spadim.com.br> wrote: > if it make sense on ssd, harddisk make sense too, it's a block device > like ssd, the diference of ssd/harddisk? access time, > bytes(bits)/block, life time > bad block exist in ssd and harddisk, ssd can realloc online, some harddisks too > >> no, because the host may immediately write to a trim'd sector > yes, filesystem know where exists a unused sector > if device (harddisk/ssd) know and have a reallocation algorithm, it > can realloc without telling filesystem to do it (that's why TRIM is > interesting) > since today ssd use NAND (not NOR) the block size isn't 1 bit like a > harddisk head. trim for harddisk only make sense for badblock > reallocation > -------------------------- > getting back to the first question, can MD support trim? yes/no/not > now/some levels and layouts only? > -- > Roberto Spadim > Spadim Technology / SPAEmpresarial This explains a bit why trim is good for SSDs and has nothing to do with harddrives at all, since they use spinning platters and not chips. http://www.anandtech.com/show/2738/10 // Mathias ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 1:52 ` Mathias Burén @ 2011-02-22 1:55 ` Roberto Spadim 2011-02-22 2:01 ` Eric D. Mudama ` (2 more replies) 0 siblings, 3 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-22 1:55 UTC (permalink / raw) To: Mathias Burén; +Cc: Eric D. Mudama, Phillip Susi, David Brown, linux-raid it could be used for bad block reallocation if the hard disk implements it. a hard disk is close to a NOR SSD with variable access time: if the head is near the sector to be read/written the access time is small; if the sector is far from the head, the access time increases (normally <= 1 disk revolution if the head control system is good; for 7200rpm one revolution is about 8.33ms) 2011/2/21 Mathias Burén <mathias.buren@gmail.com>: > On 22 February 2011 01:46, Roberto Spadim <roberto@spadim.com.br> wrote: >> if it makes sense on an SSD, it makes sense on a hard disk too; it's a block device >> like an SSD. the difference between SSD and hard disk? access time, >> bytes(bits)/block, lifetime >> bad blocks exist on SSDs and hard disks; SSDs can reallocate online, some hard disks too >> >>> no, because the host may immediately write to a trim'd sector >> yes, but the filesystem knows where an unused sector exists >> if the device (hard disk/SSD) knows too and has a reallocation algorithm, it >> can reallocate without telling the filesystem to do it (that's why TRIM is >> interesting) >> since today's SSDs use NAND (not NOR), the block size isn't 1 bit like a >> hard disk head's. TRIM for hard disks only makes sense for bad block >> reallocation >> -------------------------- >> getting back to the first question, can MD support trim? yes/no/not >> now/some levels and layouts only? >> -- >> Roberto Spadim >> Spadim Technology / SPAEmpresarial > > This explains a bit why trim is good for SSDs and has nothing to do > with harddrives at all, since they use spinning platters and not > chips. 
http://www.anandtech.com/show/2738/10 > > // Mathias > -- Roberto Spadim Spadim Technology / SPAEmpresarial ^ permalink raw reply [flat|nested] 70+ messages in thread
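The rotational-latency figure quoted above (one revolution at 7200rpm is about 8.33ms) is easy to verify; the average rotational latency is half a revolution, since on average the target sector is half a turn away:

```python
# Verifying the quoted rotational-latency figure for a 7200 rpm drive.
rpm = 7200
ms_per_rev = 60_000 / rpm        # 60,000 ms per minute / revolutions per minute
avg_latency_ms = ms_per_rev / 2  # on average the target sector is half a turn away
print(round(ms_per_rev, 2), round(avg_latency_ms, 2))   # 8.33 4.17
```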
* Re: SSD - TRIM command 2011-02-22 1:55 ` Roberto Spadim @ 2011-02-22 2:01 ` Eric D. Mudama 2011-02-22 2:02 ` Mikael Abrahamsson 2011-02-22 2:38 ` Phillip Susi 2 siblings, 0 replies; 70+ messages in thread From: Eric D. Mudama @ 2011-02-22 2:01 UTC (permalink / raw) To: Roberto Spadim Cc: Mathias Burén, Eric D. Mudama, Phillip Susi, David Brown, linux-raid On Mon, Feb 21 at 22:55, Roberto Spadim wrote: >it can be used for badblock reallocation if harddisk have it >a harddisk is near to NOR ssd with variable accesstime, if head is >near sector to be read/write accesstime is small, if sector is far >from head, access time increase (normaly <=1 disk revolution if head >control system is good, for 7200rpm 1revolution is near to 8.33ms) Hard disks do not expose their defect information/remappings. They present a defect-free logical region to the host. Optimizing for a few hundred thousand remapped sectors across the LBA range of ~6 billion LBAs on a 3TB drive isn't worth the effort or code complexity in most cases. I still don't see how TRIM helps a rotating drive. -- Eric D. Mudama edmudama@bounceswoosh.org ^ permalink raw reply [flat|nested] 70+ messages in thread
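Eric's back-of-envelope numbers above check out. Taking "a few hundred thousand" remapped sectors as 300,000 (an assumed figure) against the LBA count of a 3 TB drive with 512-byte sectors:

```python
# Scale of HDD defect remapping versus the drive's logical address space.
sector_size = 512
lbas = 3 * 10**12 // sector_size   # LBAs on a 3 TB drive, roughly 5.9 billion
remapped = 300_000                 # assumed "few hundred thousand" remapped sectors
fraction = remapped / lbas         # well under 0.01% of the address space
print(lbas, fraction)
```

At that ratio there is almost nothing for a TRIM-style hint to optimize, which is the point of the message above.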
* Re: SSD - TRIM command 2011-02-22 1:55 ` Roberto Spadim 2011-02-22 2:01 ` Eric D. Mudama @ 2011-02-22 2:02 ` Mikael Abrahamsson 2011-02-22 2:22 ` Guy Watkins 2011-02-22 2:38 ` Phillip Susi 2 siblings, 1 reply; 70+ messages in thread From: Mikael Abrahamsson @ 2011-02-22 2:02 UTC (permalink / raw) To: linux-raid On Mon, 21 Feb 2011, Roberto Spadim wrote: > it can be used for badblock reallocation if harddisk have it a harddisk > is near to NOR ssd with variable accesstime, if head is near sector to > be read/write accesstime is small, if sector is far from head, access > time increase (normaly <=1 disk revolution if head control system is > good, for 7200rpm 1revolution is near to 8.33ms) Could we please stop this discussion. If you think HDDs should have this kind of bad sector reallocation scheme, please go to the HDD manufacturers and lobby to them. It is not on-topic for linux-raid ml. -- Mikael Abrahamsson email: swmike@swm.pp.se ^ permalink raw reply [flat|nested] 70+ messages in thread
* RE: SSD - TRIM command 2011-02-22 2:02 ` Mikael Abrahamsson @ 2011-02-22 2:22 ` Guy Watkins 2011-02-22 2:27 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Guy Watkins @ 2011-02-22 2:22 UTC (permalink / raw) To: 'Mikael Abrahamsson', linux-raid } -----Original Message----- } From: linux-raid-owner@vger.kernel.org [mailto:linux-raid- } owner@vger.kernel.org] On Behalf Of Mikael Abrahamsson } Sent: Monday, February 21, 2011 9:02 PM } To: linux-raid@vger.kernel.org } Subject: Re: SSD - TRIM command } } On Mon, 21 Feb 2011, Roberto Spadim wrote: } } > it can be used for badblock reallocation if harddisk have it a harddisk } > is near to NOR ssd with variable accesstime, if head is near sector to } > be read/write accesstime is small, if sector is far from head, access } > time increase (normaly <=1 disk revolution if head control system is } > good, for 7200rpm 1revolution is near to 8.33ms) } } Could we please stop this discussion. If you think HDDs should have this } kind of bad sector reallocation scheme, please go to the HDD manufacturers } and lobby to them. It is not on-topic for linux-raid ml. } } -- } Mikael Abrahamsson email: swmike@swm.pp.se What about tape drives? :) ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 2:22 ` Guy Watkins @ 2011-02-22 2:27 ` Roberto Spadim 2011-02-22 3:45 ` NeilBrown 0 siblings, 1 reply; 70+ messages in thread From: Roberto Spadim @ 2011-02-22 2:27 UTC (permalink / raw) To: Guy Watkins; +Cc: Mikael Abrahamsson, linux-raid tape drive = harddisk with only one head, the head can't move, only the tape (disk/plate or any other name you want) could we get back and answer the main question? -------------------------- getting back to the first question, can MD support trim? yes/no/not now/some levels and layouts only? 2011/2/21 Guy Watkins <linux-raid@watkins-home.com>: > } -----Original Message----- > } From: linux-raid-owner@vger.kernel.org [mailto:linux-raid- > } owner@vger.kernel.org] On Behalf Of Mikael Abrahamsson > } Sent: Monday, February 21, 2011 9:02 PM > } To: linux-raid@vger.kernel.org > } Subject: Re: SSD - TRIM command > } > } On Mon, 21 Feb 2011, Roberto Spadim wrote: > } > } > it can be used for badblock reallocation if harddisk have it a harddisk > } > is near to NOR ssd with variable accesstime, if head is near sector to > } > be read/write accesstime is small, if sector is far from head, access > } > time increase (normaly <=1 disk revolution if head control system is > } > good, for 7200rpm 1revolution is near to 8.33ms) > } > } Could we please stop this discussion. If you think HDDs should have this > } kind of bad sector reallocation scheme, please go to the HDD manufacturers > } and lobby to them. It is not on-topic for linux-raid ml. > } > } -- > } Mikael Abrahamsson email: swmike@swm.pp.se > > What about tape drives? 
:) > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 2:27 ` Roberto Spadim @ 2011-02-22 3:45 ` NeilBrown 2011-02-22 4:37 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: NeilBrown @ 2011-02-22 3:45 UTC (permalink / raw) To: Roberto Spadim; +Cc: Guy Watkins, Mikael Abrahamsson, linux-raid On Mon, 21 Feb 2011 23:27:26 -0300 Roberto Spadim <roberto@spadim.com.br> wrote: > tape drive = harddisk with only one head, the head can't move, only > the tape (disk/plate or any other name you want) > > could we get back and answer the main question? > -------------------------- > getting back to the first question, can MD support trim? yes/no/not > now/some levels and layouts only? > MD currently doesn't accept 'discard' requests. RAID0 and LINEAR could be made to accept 'discard' if any member device accepted 'discard'. Patches welcome. Other levels need md to know not to try to resync/recover regions that have been discarded. See "non-sync bitmap" section of the recent md roadmap. NeilBrown ^ permalink raw reply [flat|nested] 70+ messages in thread
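Fanning a discard out over a striped array, as Neil suggests for RAID0, is mostly address arithmetic. The sketch below is an illustrative model only, not md's actual code; the chunk size and member count are made up. It maps one discard range onto per-member sub-ranges using the standard RAID0 layout (chunks rotate round-robin across members):

```python
def split_discard(start, length, chunk, members):
    """Map a discard of sectors [start, start+length) on a RAID0 array
    onto (member_index, member_start_sector, run_length) pieces."""
    out = []
    sector, remaining = start, length
    while remaining > 0:
        stripe, offset = divmod(sector, chunk)
        member = stripe % members             # which disk holds this chunk
        member_chunk = stripe // members      # chunk index on that disk
        run = min(chunk - offset, remaining)  # don't cross a chunk boundary
        out.append((member, member_chunk * chunk + offset, run))
        sector += run
        remaining -= run
    return out

# e.g. 4 members, 128-sector chunks: a 300-sector discard starting at sector 100
print(split_discard(100, 300, 128, 4))
# [(0, 100, 28), (1, 0, 128), (2, 0, 128), (3, 0, 16)]
```

Each resulting piece would then be passed down only to members that advertise discard support, which is why RAID0/LINEAR are the easy cases: no parity or mirror state has to be kept consistent afterwards.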
* Re: SSD - TRIM command 2011-02-22 3:45 ` NeilBrown @ 2011-02-22 4:37 ` Roberto Spadim 0 siblings, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-22 4:37 UTC (permalink / raw) To: NeilBrown; +Cc: Guy Watkins, Mikael Abrahamsson, linux-raid thanks neil, i will try to read and make some patch, my focus is ssd optimization; at the hardware level (hw raid) i didn't see any good improvement good = a good read balance (based on queue and disk read rate), trim support ---- good read balance = round robin or another time-based algorithm (can be cpu intensive), i haven't found out yet how to get the queue of a linux bio (mirrors) trim support - nothing to report, it's a 'feature request' for the long term (after badblock and other features) ----- ps... neil, what are you thinking about badblock and layout? for example... will reading from a bad block be internally (md source code) remapped to a good block? or just try read/write on another device? in other words, will we have a 'dynamic' layout? 2011/2/22 NeilBrown <neilb@suse.de>: > On Mon, 21 Feb 2011 23:27:26 -0300 Roberto Spadim <roberto@spadim.com.br> > wrote: >> tape drive = harddisk with only one head, the head can't move, only >> the tape (disk/plate or any other name you want) >> >> could we get back and answer the main question? >> -------------------------- >> getting back to the first question, can MD support trim? yes/no/not >> now/some levels and layouts only? >> > > MD currently doesn't accept 'discard' requests. > > RAID0 and LINEAR could be made to accept 'discard' if any > member device accepted 'discard'. Patches welcome. > > Other levels need md to know not to try to resync/recover regions that > have been discarded. See "non-sync bitmap" section of the recent > md roadmap.
> > NeilBrown > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 1:55 ` Roberto Spadim 2011-02-22 2:01 ` Eric D. Mudama 2011-02-22 2:02 ` Mikael Abrahamsson @ 2011-02-22 2:38 ` Phillip Susi 2011-02-22 3:29 ` Roberto Spadim 2 siblings, 1 reply; 70+ messages in thread From: Phillip Susi @ 2011-02-22 2:38 UTC (permalink / raw) To: Roberto Spadim Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid On 02/21/2011 08:55 PM, Roberto Spadim wrote: > it can be used for badblock reallocation if harddisk have it > a harddisk is near to NOR ssd with variable accesstime, if head is > near sector to be read/write accesstime is small, if sector is far > from head, access time increase (normaly<=1 disk revolution if head > control system is good, for 7200rpm 1revolution is near to 8.33ms) Bad blocks are only reallocated when you write to them. Since they are bad, you can't read the previous contents anyway, so it does not matter whether the OS cared about it before or not. You seem to not understand the fundamental purpose of TRIM. Hard disks only reallocate blocks when they go bad. SSDs move blocks around all the time. That process can be optimized if the drive knows that the OS does not care about certain blocks. Hard drives don't do this, so they have no reason to support TRIM. ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 2:38 ` Phillip Susi @ 2011-02-22 3:29 ` Roberto Spadim 2011-02-22 3:42 ` Roberto Spadim 2011-02-22 4:04 ` Phillip Susi 0 siblings, 2 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-22 3:29 UTC (permalink / raw) To: Phillip Susi; +Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid getting off topic... ---- they have - reallocation > Bad blocks are only reallocated when you write to them. Since they are bad, > you can't read the previous contents anyway, so it does not matter whether > the OS cared about it before or not. when you write, if it's bad, mark the block as bad. how? internal disk memory, spare blocks. it's a device-level problem; if the device can't correct it, move the problem to the filesystem level. what could the device level do? use a 'good block' (if one exists) => dynamic reallocation 'good block' = a block not in use by the filesystem, not marked as bad, that can be used by realloc with trim, you can inform the device firmware what blocks are not in use by the filesystem; if the harddisk has reallocation it can use 'good blocks' to store blocks that were realloc'd on badblock errors. why implement it? if you have 11111 filesystems mounted with bad blocks at the same time you will have >=11111 iops to repair this error at the filesystem level. if the device can correct it you don't need to waste cpu and memory in the filesystem ------ any layer between ATA and [plate, NAND flash, NOR flash] can be implemented by harddisk/ssd firmware some layers that can be implemented: online reallocation, queue, online encrypt/decrypt, online compress/decompress and others; some ssds have optimizations to get better lifetime and write/read performance how to 'tune' these algorithms? ATA commands, SCSI or any other protocol that supports tuning why trim? to inform the harddisk/ssd what blocks aren't in use what could the harddisk/ssd do with trim information?
dynamic reallocation (badblocks), or any other operation that needs not-in-use blocks (some algorithms use them to get better read/write performance) on devices with byte-level read/write (NAND flash) we could write to one trimmed block without reading the block and writing it back; NOR flash and harddisks don't need this, they work with bits not bytes/blocks why send an error to the filesystem if it can be corrected at the device level? just send the error when it can't be corrected. 2011/2/21 Phillip Susi <psusi@cfl.rr.com>: > On 02/21/2011 08:55 PM, Roberto Spadim wrote: >> >> it can be used for badblock reallocation if harddisk have it >> a harddisk is near to NOR ssd with variable accesstime, if head is >> near sector to be read/write accesstime is small, if sector is far >> from head, access time increase (normaly<=1 disk revolution if head >> control system is good, for 7200rpm 1revolution is near to 8.33ms) > > Bad blocks are only reallocated when you write to them. Since they are bad, > you can't read the previous contents anyway, so it does not matter whether > the OS cared about it before or not. > > You seem to not understand the fundamental purpose of TRIM. Hard disks only > reallocate blocks when they go bad. SSDs move blocks around all the time. > That process can be optimized if the drive knows that the OS does not care > about certain blocks. Hard drives don't do this, so they have no reason to > support TRIM. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 3:29 ` Roberto Spadim @ 2011-02-22 3:42 ` Roberto Spadim 2011-02-22 4:04 ` Phillip Susi 0 siblings, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-22 3:42 UTC (permalink / raw) To: Phillip Susi; +Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid off topic again... continuing with the idea of optimizations, the last optimization we could have is to implement a filesystem in the harddisk it could implement all filesystem functions, no device functions; it could have much more information about the data, not only block 'in use'/'not in use'. it could understand: file starting at block x ending at block y, with information w, access time z, etc etc. it could be more intelligent than a raw device. in other words, it's a fileserver... why implement algorithms at the device level? today harddisk processors (fpga, arm processors, others) have a lot of cpu power not in use, why not use it? that's why we send trim to the device; whether it's a harddisk or ssd or any other pseudo/real device, no problem, we send the trim command to optimize it ---------------- getting out of the off topic, please stop sending 'i think it's not a performance feature, it doesn't need to be implemented at the device level'; let's implement all functions that the device level allows (ATA/SCSI specifications or any other) and optimize when possible checking neil's md roadmap, the badblock work will be very good for md devices, it's a good optimization for raid1 since a mirror will only fail when many blocks fail can we implement TRIM at the MD level? is it a good feature to implement? will we have a lot of work to implement it? my opinion: we can, on some raid levels it's a good feature, and we will have a lot of work to implement and test any answer from raid developers? -- Roberto Spadim Spadim Technology / SPAEmpresarial ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 3:29 ` Roberto Spadim 2011-02-22 3:42 ` Roberto Spadim @ 2011-02-22 4:04 ` Phillip Susi 2011-02-22 4:30 ` Roberto Spadim 1 sibling, 1 reply; 70+ messages in thread From: Phillip Susi @ 2011-02-22 4:04 UTC (permalink / raw) To: Roberto Spadim Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid On 02/21/2011 10:29 PM, Roberto Spadim wrote: > what device level could do? use a 'good block' (if exists) => dynamic > reallocation > 'good block' = block not in use by filesystem, not marked as bad, can > be used by realloc No. It can only use blocks reserved for spares at manufacture time. It can not use any old block that the fs is not using at the time, because the fs may choose to use it in the future. ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 4:04 ` Phillip Susi @ 2011-02-22 4:30 ` Roberto Spadim 2011-02-22 14:45 ` Phillip Susi 0 siblings, 1 reply; 70+ messages in thread From: Roberto Spadim @ 2011-02-22 4:30 UTC (permalink / raw) To: Phillip Susi; +Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid it can't because today filesystem (exclude ext4 and swap) don't use trim command to tell device what block isn't in use 2011/2/22 Phillip Susi <psusi@cfl.rr.com>: > On 02/21/2011 10:29 PM, Roberto Spadim wrote: >> >> what device level could do? use a 'good block' (if exists) => dynamic >> reallocation >> 'good block' = block not in use by filesystem, not marked as bad, can >> be used by realloc > > No. It can only use blocks reserved for spares at manufacture time. It can > not use any old block that the fs is not using at the time, because the fs > may choose to use it in the future. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 4:30 ` Roberto Spadim @ 2011-02-22 14:45 ` Phillip Susi 2011-02-22 17:15 ` Roberto Spadim 0 siblings, 1 reply; 70+ messages in thread From: Phillip Susi @ 2011-02-22 14:45 UTC (permalink / raw) To: Roberto Spadim Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid On 2/21/2011 11:30 PM, Roberto Spadim wrote: > it can't because today filesystem (exclude ext4 and swap) don't use > trim command to tell device what block isn't in use You aren't getting it. The fs can tell the drive all it wants: the drive does not care. It has nothing useful it can do with that information. ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-22 14:45 ` Phillip Susi @ 2011-02-22 17:15 ` Roberto Spadim 0 siblings, 0 replies; 70+ messages in thread From: Roberto Spadim @ 2011-02-22 17:15 UTC (permalink / raw) To: Phillip Susi; +Cc: Mathias Burén, Eric D. Mudama, David Brown, linux-raid i think it has. ssds have it, why not hds? a hd can implement an intelligent layer to speed up writes/reads without telling the fs 2011/2/22 Phillip Susi <psusi@cfl.rr.com>: > On 2/21/2011 11:30 PM, Roberto Spadim wrote: >> it can't because today filesystem (exclude ext4 and swap) don't use >> trim command to tell device what block isn't in use > > You aren't getting it. The fs can tell the drive all it wants: the > drive does not care. It has nothing useful it can do with that information. > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Roberto Spadim Spadim Technology / SPAEmpresarial -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
* Re: SSD - TRIM command 2011-02-21 22:52 ` Roberto Spadim 2011-02-21 23:41 ` Mathias Burén @ 2011-02-22 0:32 ` Eric D. Mudama 1 sibling, 0 replies; 70+ messages in thread From: Eric D. Mudama @ 2011-02-22 0:32 UTC (permalink / raw) To: Roberto Spadim; +Cc: Mathias Burén, Phillip Susi, David Brown, linux-raid On Mon, Feb 21 at 19:52, Roberto Spadim wrote: >i don't think so, since it's an ATA command, any ATA compatible can use >it, it could be used for HD with badblocks and dynamic reallocation >without problems, the harddisk doesn't need a dedicated space for >badblock. for md software we must know if devices support or not TRIM. It's been 15 or more years since hard drives exposed their bad blocks to the host, I don't think it'd be a good idea to revisit that decision. -- Eric D. Mudama edmudama@bounceswoosh.org -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 70+ messages in thread
end of thread, other threads:[~2011-02-22 17:15 UTC | newest] Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-02-07 20:07 SSD - TRIM command Roberto Spadim 2011-02-08 17:37 ` maurice 2011-02-08 18:31 ` Roberto Spadim [not found] ` <AANLkTik5SumqyTN5LZVntna8nunvPe7v38TSFf9eCfcU@mail.gmail.com> 2011-02-08 20:50 ` Roberto Spadim 2011-02-08 21:18 ` maurice 2011-02-08 21:33 ` Roberto Spadim 2011-02-09 7:44 ` Stan Hoeppner 2011-02-09 9:05 ` Eric D. Mudama 2011-02-09 15:45 ` Chris Worley 2011-02-09 13:29 ` David Brown 2011-02-09 14:39 ` Roberto Spadim 2011-02-09 15:00 ` Scott E. Armitage 2011-02-09 15:52 ` Chris Worley 2011-02-09 19:15 ` Doug Dumitru 2011-02-09 19:22 ` Roberto Spadim 2011-02-09 16:19 ` Eric D. Mudama 2011-02-09 16:28 ` Scott E. Armitage 2011-02-09 17:17 ` Eric D. Mudama 2011-02-09 18:18 ` Roberto Spadim 2011-02-09 18:24 ` Piergiorgio Sartor 2011-02-09 18:30 ` Roberto Spadim 2011-02-09 18:38 ` Piergiorgio Sartor 2011-02-09 18:46 ` Roberto Spadim 2011-02-09 18:52 ` Roberto Spadim 2011-02-09 19:13 ` Piergiorgio Sartor 2011-02-09 19:16 ` Roberto Spadim 2011-02-09 19:21 ` Piergiorgio Sartor 2011-02-09 19:27 ` Roberto Spadim 2011-02-21 18:24 ` Phillip Susi 2011-02-21 18:30 ` Roberto Spadim 2011-02-09 15:49 ` David Brown 2011-02-21 18:20 ` Phillip Susi 2011-02-21 18:25 ` Roberto Spadim 2011-02-21 18:34 ` Phillip Susi 2011-02-21 18:48 ` Roberto Spadim 2011-02-21 18:51 ` Mathias Burén 2011-02-21 19:32 ` Roberto Spadim 2011-02-21 19:38 ` Mathias Burén 2011-02-21 19:39 ` Mathias Burén 2011-02-21 19:43 ` Roberto Spadim 2011-02-21 20:45 ` Phillip Susi 2011-02-21 19:39 ` Roberto Spadim 2011-02-21 19:51 ` Doug Dumitru 2011-02-21 19:57 ` Roberto Spadim 2011-02-21 20:47 ` Phillip Susi 2011-02-21 21:02 ` Mathias Burén 2011-02-21 22:52 ` Roberto Spadim 2011-02-21 23:41 ` Mathias Burén 2011-02-21 23:42 ` Mathias Burén 2011-02-21 23:52 ` Roberto Spadim 2011-02-22 0:25 ` Mathias Burén 2011-02-22 0:30 
` Brendan Conoboy 2011-02-22 0:36 ` Eric D. Mudama 2011-02-22 1:46 ` Roberto Spadim 2011-02-22 1:52 ` Mathias Burén 2011-02-22 1:55 ` Roberto Spadim 2011-02-22 2:01 ` Eric D. Mudama 2011-02-22 2:02 ` Mikael Abrahamsson 2011-02-22 2:22 ` Guy Watkins 2011-02-22 2:27 ` Roberto Spadim 2011-02-22 3:45 ` NeilBrown 2011-02-22 4:37 ` Roberto Spadim 2011-02-22 2:38 ` Phillip Susi 2011-02-22 3:29 ` Roberto Spadim 2011-02-22 3:42 ` Roberto Spadim 2011-02-22 4:04 ` Phillip Susi 2011-02-22 4:30 ` Roberto Spadim 2011-02-22 14:45 ` Phillip Susi 2011-02-22 17:15 ` Roberto Spadim 2011-02-22 0:32 ` Eric D. Mudama