linux-raid.vger.kernel.org archive mirror
From: David Brown <david.brown@hesbynett.no>
To: Chris Murphy <lists@colorremedies.com>
Cc: "linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Best way (only?) to setup SSD's for using TRIM
Date: Wed, 31 Oct 2012 09:32:57 +0100
Message-ID: <5090E239.9040302@hesbynett.no>
In-Reply-To: <B371ADF3-F328-4E1B-A6D2-87DE1974D8FF@colorremedies.com>

On 30/10/2012 20:59, Chris Murphy wrote:
>
> On Oct 30, 2012, at 12:30 PM, Curt Blank <curt@curtronics.com>
> wrote:
>>
>> Right, and without TRIM to tell the SSD which page(s) are invalid
>> the garbage collection will never be able to do that so the
>> garbage collection will be carrying around and preserving invalid
>> page(s) when ever it does do something. Assuming there are invalid
>> pages in the blocks it is acting on. That to me seems inefficient
>> and for that reason says TRIM should be used?

That is correct - there will be unneeded data carried around that stops 
erase blocks from being garbage collected, and this unneeded data will 
occasionally be copied as part of compaction routines or wear-levelling 
functions.

There are a few things to note, however - there will /always/ be some 
unneeded data carried around, no matter how enthusiastic the filesystem 
is about issuing TRIMs (and filesystems /don't/ always issue a TRIM, 
especially in cases where the logical block will be re-used).

Also, the whole point of garbage collection (of TRIM'ed blocks or blocks 
whose logical sector has been overwritten) is so that when the host 
wants to write something, there are free blocks on the SSD already 
erased and waiting.  As long as the SSD has more than enough such free 
blocks at any given time, then it does not need any more - extra free 
blocks cannot improve the speed of the SSD.

Modern SSDs have over-provisioning - the disk claims to have "x" GB of 
space, and provides logical block numbers for "x" GB, but in fact it has 
something like "x + 15%" GB of actual flash space.  This extra 15% 
(actual values vary) provides two things - a safety margin for bad 
blocks, and a guarantee that there are enough pages that are known to be 
unneeded (even in the absence of TRIM), so that there can always be 
plenty of free erase blocks.  Since the host can only see "x" GB, then 
at most "x" GB of pages can be in use - at least "15% of x" GB pages are 
known to be free.  The SSD may need to re-arrange pages and blocks a bit 
("defragmenting"), but it can always do it.
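
As a rough illustration of that arithmetic (a sketch only: the 15% figure is
the example value used above, and the function name and the 256 GB disk are
made up for illustration):

```python
def guaranteed_spare_gb(advertised_gb: float, overprovision_ratio: float = 0.15) -> float:
    """Flash that can never hold live data, because the host can only
    address advertised_gb of logical space."""
    physical_gb = advertised_gb * (1 + overprovision_ratio)
    return physical_gb - advertised_gb


# A hypothetical 256 GB disk with 15% over-provisioning.
spare = guaranteed_spare_gb(256)
print(f"at least {spare:.1f} GB of flash is always unneeded")
```

Even with every logical block written and no TRIM at all, that much flash is
known to the SSD to be stale or empty.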

There are pathological cases where TRIM could make a difference.  If you 
fill your disk with random data, then erase everything, then fill it 
again using very random writes, then your writes will be slowed as 
garbage collection has to put together new free erase blocks - while 
TRIM could have let the SSD erase blocks earlier.

>
> My understanding is that a modern consumer SSD works by copy-on-write
> for new or changed blocks, so TRIM is not needed. The
> SSD is only writing data to "empty" or previously erased cells. The
> correlation between logical sectors and physical sectors is
> constantly adjusted, unlike on HDDs where this remapping tends to
> only occur with persistent write failures to a sector.

Correct.

>
> Case 1: A file is being overwritten, or modified in some way. The
> file system knows this file consumes, e.g., LBA's 5000 to 6000, and so
> it sends a write command to the SSD, in effect "write data to LBA
> 5000, 1000" ergo write a data stream starting at LBA 5000, for 1000
> (contiguous) sectors. Obviously a file system might break up this
> file into multiple fragments, so this is simplistic.
>
> The SSD doesn't actually do what it's told. It doesn't literally
> overwrite those LBA's, what it does is dereference them in its
> lookup. And remaps those LBA's to new empty cells, and writes your
> data there. Later, it can go back and do garbage collection on those
> dereferenced cells when there are enough of them accumulated.

Exactly.  The SSD knows that the old physical blocks that used to be 
associated with LBA's 5000 to 6000 are now free, and can be garbage 
collected.  So for re-writing, TRIM is unnecessary.
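
A toy model of that remapping, to make the point concrete. This is only a
sketch: the class and its fields are invented for illustration, and a real
flash translation layer is far more involved.

```python
class ToyFTL:
    """Toy flash translation layer: maps LBAs to physical pages and
    never overwrites a page in place."""

    def __init__(self, physical_pages: int):
        self.free = list(range(physical_pages))  # erased pages ready for writing
        self.lba_to_page = {}                    # current logical -> physical mapping
        self.invalid = set()                     # stale pages awaiting garbage collection

    def write(self, lba: int) -> None:
        old = self.lba_to_page.get(lba)
        if old is not None:
            # The overwrite itself tells the SSD the old page is stale.
            self.invalid.add(old)
        self.lba_to_page[lba] = self.free.pop(0)


ftl = ToyFTL(physical_pages=8)
ftl.write(5000)
ftl.write(5000)  # rewrite the same LBA
print(ftl.lba_to_page, ftl.invalid)
```

The second write lands on a fresh page and the first page becomes a garbage
collection candidate, with no TRIM involved.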

>
> Case 2: A file is being newly written. The basic thing happens. It's
> possible the file system requests LBA's never before provisioned, or
> it requests LBA's from previously deleted files.

Yes.

>
> Either way, the SSD writes to empty cells. The case where it needs to
> write to occupied cells is if it runs out of empty ones, i.e. like
> David Brown said, in a case where the disk is getting full and poorly
> provisioned this could occur.
>
> It might also occur in some use cases where large files are being
> created/modified, destroyed, very frequently, such that the disk
> can't keep up with garbage collection. Maybe an example of this would
> be heavy VM usage with consumer SSDs. Why someone would do this I
> don't know but perhaps that's an example.

There will always be pathological cases like this where TRIM could be a 
win.  But on the other hand, there are pathological cases where TRIM 
causes great slowdowns - such as deleting a lot of files (as sending 
TRIM commands is very slow).

If you actually want to use your SSD in such a way, with lots of big, 
fast deletions and writings, then you can help it out by 
"short-stroking" it.  You take your new SSD (or newly "secure erased" 
SSD) and partition it to only use part of the space - leave some extra 
at the end.  This extra space increases the over-provisioning of the 
disk, and therefore increases the amount of free blocks you have at any 
given time.
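
A back-of-the-envelope sketch of what that buys you. It assumes the 15%
factory over-provisioning used as an example earlier, and that the
unpartitioned space has never been written (new or secure-erased disk); the
function name and sizes are made up.

```python
def effective_overprovision(advertised_gb: float,
                            partitioned_gb: float,
                            factory_ratio: float = 0.15) -> float:
    """Spare flash as a fraction of the space the host can actually write,
    when only partitioned_gb of the disk is ever used."""
    physical_gb = advertised_gb * (1 + factory_ratio)
    return (physical_gb - partitioned_gb) / partitioned_gb


# Hypothetical 256 GB disk, partitioned to use only 200 GB.
print(f"{effective_overprovision(256, 200):.1%}")
```

Using the whole disk gives back the factory 15%; leaving 56 GB unpartitioned
raises the spare fraction to roughly 47% in this example.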


I'd add a case 3 to your list:

Case 3: A file is erased.  If you have TRIM, the data blocks used by the 
file can be marked as "unneeded" by the SSD.  Without TRIM, the SSD 
thinks they are still important.  But the OS/filesystem knows the LBAs 
are free, and will re-use them sooner or later.  As soon as they are 
re-used, the SSD will mark the old physical blocks as unneeded and can 
garbage-collect them.  Without TRIM, this collection is delayed - but it 
still happens, and as long as the SSD has other free blocks, the delay 
has no impact on performance.
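
The difference in timing can be shown with another toy model (again a sketch
with invented names, not how any real SSD firmware is written):

```python
class ToySSD:
    def __init__(self):
        self.next_page = 0
        self.mapping = {}      # LBA -> physical page holding its current data
        self.unneeded = set()  # physical pages the SSD knows it may garbage-collect

    def write(self, lba):
        if lba in self.mapping:
            self.unneeded.add(self.mapping[lba])  # old copy is now stale
        self.mapping[lba] = self.next_page
        self.next_page += 1

    def trim(self, lba):
        page = self.mapping.pop(lba, None)
        if page is not None:
            self.unneeded.add(page)


def delete_then_reuse(use_trim):
    ssd = ToySSD()
    ssd.write(100)                  # file data lands on physical page 0
    if use_trim:
        ssd.trim(100)               # file deleted, filesystem issues TRIM
    after_delete = set(ssd.unneeded)
    ssd.write(100)                  # filesystem later re-uses the LBA
    after_reuse = set(ssd.unneeded)
    return after_delete, after_reuse


print(delete_then_reuse(use_trim=True))
print(delete_then_reuse(use_trim=False))
```

With TRIM, page 0 is known to be unneeded right after the delete; without it,
the SSD only learns this when LBA 100 is written again. Either way it ends up
in the same state.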


>
>
>> As far as I understand TRIM, among other things, it allows the SSD
>> to combine the invalid pages into a block so the block can be
>> erased thus making the pages ready to be written individually and
>> avoiding the read-erase-modify-write of the block when a page
>> changes, i.e. write amplification.
>
> It will do this with or without TRIM. TRIM simply is a mechanism for
> the file system to inform the SSD of this in advance, in the case of
> file deletions, where it may be some time before the SSD is informed
> those blocks are "free" when the file system decides to reuse those
> sectors.
>
>
>> Even if it does a read-modify-write to a new block then acks the
>> write and does the erase after in the background it's still
>> overhead in the read-modify-write i.e. read a whole block, modify a
>> page, write a whole block, instead of just being able to write a
>> page.

The SSD doesn't do that.  If you make a change to data that is in a page in 
the middle of an erase block, it is only that page that is copied (for 
RMW) to another free page in the same or a different erase block.  The 
original page is marked "unneeded".  TRIM makes no difference to this 
process.  All it does is make it more likely that the other pages in the 
same block are marked "unneeded" at an earlier stage, so the whole old 
block can be recycled earlier.  But as I said above, doing this earlier 
or later makes no difference to performance.
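
A small sketch of that page-level update. The four-page erase block and the
state names are made up for illustration; real erase blocks hold far more
pages.

```python
PAGES_PER_BLOCK = 4  # hypothetical size, kept tiny for readability


def modify_page(blocks, block_idx, page_idx, spare_block_idx):
    """Rewrite one page: mark the old copy unneeded and program a single
    free page elsewhere. Returns the number of flash pages programmed."""
    blocks[block_idx][page_idx] = "unneeded"
    free_slot = blocks[spare_block_idx].index("free")
    blocks[spare_block_idx][free_slot] = "valid"
    return 1


blocks = [["valid"] * PAGES_PER_BLOCK, ["free"] * PAGES_PER_BLOCK]
written = modify_page(blocks, block_idx=0, page_idx=2, spare_block_idx=1)
print(written, blocks)
```

Only one page is programmed, not the whole block. The old block can be
erased once its remaining valid pages have also been rewritten, TRIMmed, or
moved by garbage collection.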

>
>
> a.) Negligible.
> b.) The file system does RMW at a block/cluster level
> anyway (typically this is 4KB).
>
>
> Chris Murphy

