linux-raid.vger.kernel.org archive mirror
From: David Brown <david.brown@hesbynett.no>
To: Alexander Haase <mail.alexhaase@gmail.com>
Cc: Chris Murphy <lists@colorremedies.com>,
	"linux-raid@vger.kernel.org" <linux-raid@vger.kernel.org>
Subject: Re: Best way (only?) to setup SSD's for using TRIM
Date: Wed, 31 Oct 2012 15:11:59 +0100
Message-ID: <509131AF.2030400@hesbynett.no>
In-Reply-To: <CAJEsFnkM9w0kNbNd51ShP0uExvsZE6V9h3WKKs3nxWfncUCYJA@mail.gmail.com>

On 31/10/2012 14:12, Alexander Haase wrote:
> Has anyone considered handling TRIM via an idle IO queue? You'd have to
> purge queue items that conflicted with incoming writes, but it does get
> around the performance complaint. If the idle period never comes, old
> TRIMs can be silently dropped to lessen queue bloat.
>

I am sure it has been considered - but is it worth the effort and the 
complications?  TRIM has been implemented in several filesystems (ext4 
and, I believe, btrfs) - but is disabled by default because it typically 
slows down the system.  You are certainly correct that putting TRIM at 
the back of the queue will avoid the delays it causes - but it still 
will not give any significant benefit (except for old SSDs with limited 
garbage collection and small over-provisioning), and you have a lot of 
extra complexity to ensure that a TRIM is never pushed back until after 
a new write to the same logical sectors.
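To illustrate the bookkeeping this implies, here is a minimal Python
sketch of such an idle TRIM queue.  It is an illustration only, not how
any kernel or filesystem does it: the class name, method names and the
max_pending limit are all made up for the example, and ranges are
half-open (start, end) sector numbers.

```python
class IdleTrimQueue:
    """Deferred TRIM requests, kept as half-open (start, end) ranges."""

    def __init__(self, max_pending=1024):
        # max_pending is an arbitrary, hypothetical limit on queue size.
        self.max_pending = max_pending
        self.pending = []  # oldest request first

    def queue_trim(self, start, end):
        """Remember a TRIM instead of sending it to the device now."""
        self.pending.append((start, end))
        # If the idle period never comes, silently drop the oldest TRIMs.
        if len(self.pending) > self.max_pending:
            del self.pending[: len(self.pending) - self.max_pending]

    def note_write(self, start, end):
        """Cut a newly written range out of every pending TRIM."""
        kept = []
        for t_start, t_end in self.pending:
            if t_end <= start or end <= t_start:
                kept.append((t_start, t_end))  # no overlap, keep whole
                continue
            if t_start < start:
                kept.append((t_start, start))  # part before the write
            if end < t_end:
                kept.append((end, t_end))  # part after the write
        self.pending = kept

    def drain(self):
        """Called when the device is idle; returns ranges safe to TRIM."""
        ready, self.pending = self.pending, []
        return ready
```

Every write has to pass through note_write() before it reaches the
device, otherwise a stale TRIM could discard freshly written data.
That per-write cost is the extra complexity in question.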

It would be much easier and safer, and give much better effect, to make 
sure the block allocation procedure for filesystems emphasised 
re-writing old blocks as soon as possible (when on an SSD).  Then there 
is no need for TRIM at all.  This would have the added benefit of 
working well for compressed (or sparse) hard disk image files used by 
virtual machines - such image files only take up real disk space for 
blocks that are written, so re-writes would save real-world disk space.

> As far as parity consistency, bitmaps could track which stripes( and
> blocks within those stripes) are expected to be out of parity( also
> useful for lazy device init ). Maybe a bit-per-stripe map at the logical
> device level and a bit-per-LBA bitmap at the stripe level?

Tracking "no-sync" areas of a raid array is already high on the md raid 
things-to-do list (perhaps it is already implemented - I lose track of 
which features are planned and which are implemented).  And yes, such 
no-sync tracking would be useful here.  But it is complicated, 
especially for raid5/6 (raid1 is not too bad) - should TRIMs that cover 
part of a stripe be dropped?  Should the md layer remember them and 
coalesce them when it can TRIM a whole stripe?  Should it try to track 
partial synchronisation within a stripe?
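As a rough picture of the "remember and coalesce" option, the sketch
below keeps one bitmap per stripe and reports a stripe only once every
logical sector in it has been trimmed.  It is a toy model under stated
assumptions, not md code: the names are invented, the bitmaps live only
in memory (a real implementation would have to persist them), and
stripe_sectors counts data sectors per stripe.

```python
class StripeTrimTracker:
    """Coalesce partial-stripe TRIMs until a whole stripe is covered."""

    def __init__(self, stripe_sectors):
        self.stripe_sectors = stripe_sectors
        self.full = (1 << stripe_sectors) - 1
        self.trimmed = {}  # stripe number -> bitmap of trimmed sectors

    def _masks(self, start, end):
        """Yield (stripe, bitmap) for each stripe touched by [start, end)."""
        sector = start
        while sector < end:
            stripe, lo = divmod(sector, self.stripe_sectors)
            hi = min(self.stripe_sectors, lo + (end - sector))
            yield stripe, ((1 << (hi - lo)) - 1) << lo
            sector += hi - lo

    def note_trim(self, start, end):
        """Record a TRIM; return the stripes that are now fully trimmed."""
        ready = []
        for stripe, mask in self._masks(start, end):
            bits = self.trimmed.pop(stripe, 0) | mask
            if bits == self.full:
                ready.append(stripe)  # safe to TRIM data and parity
            else:
                self.trimmed[stripe] = bits  # still only partly trimmed
        return ready

    def note_write(self, start, end):
        """A write makes the affected sectors live again."""
        for stripe, mask in self._masks(start, end):
            bits = self.trimmed.pop(stripe, 0) & ~mask
            if bits:
                self.trimmed[stripe] = bits
```

Even this toy version needs state for every partly trimmed stripe and a
hook on every write, which gives some idea of why the raid5/6 case is
not simple.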

Or should the md developers simply say that since supporting TRIM is not 
going to have any measurable benefits (certainly not with the sort of 
SSDs people use in raid arrays), and since TRIM slows down some 
operations, it is better to keep things simple and ignore TRIM entirely? 
Even if there are occasional benefits to having TRIM, is it worth it 
in the face of added complication in the code and the risk of errors?

There /have/ been developers working on TRIM support on raid5.  It seems 
to have been a complicated process.  But some people like a challenge!

>
> On the other hand, does it hurt if empty blocks are out of parity( due
> to TRIM or lazy device init)? The parity recovery of garbage is still
> garbage, which is what any sane FS expects from unused blocks. If and
> when you do a parity scrub, you will spend a lot of time recovering
> garbage and undo any good TRIM might have done, but usual drive
> operation should quickly balance that out in a write-intensive
> environment where idle TRIM might help.
>

Yes, it "hurts" if empty blocks are out of sync.  One obvious issue is 
that you will get errors when scrubbing - the md layer has no way of 
knowing that these are unimportant (assuming there is no no-sync 
tracking), so any real problems will be hidden by the unimportant ones.

Another issue is for RMW cycles on raid5.  Small writes are done by 
reading the old data, reading the old parity, computing the new parity 
from those and the new data, then writing the new data and the new 
parity - but that only works if the old parity was correct across the 
whole stripe.  Even if raid5 TRIM is restricted to whole stripes, a 
later small write to that stripe will be a disaster if it is not in sync.
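
The RMW problem can be shown with a few lines of Python.  This is a
sketch of plain XOR parity on a three-data-block stripe, with made-up
helper names and block contents; it is not md code.  The stale parity
stands in for whatever a trimmed parity block happens to return.

```python
from functools import reduce


def xor(*blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rmw_write(data, parity, index, new_block):
    """Read-modify-write of one data block.

    new_parity = old_parity XOR old_data XOR new_data.  The other data
    blocks in the stripe are never read, so a wrong old parity is
    carried straight into the new one.
    """
    new_parity = xor(parity, data[index], new_block)
    new_data = data[:index] + [new_block] + data[index + 1:]
    return new_data, new_parity


def rebuild(data, parity, lost):
    """Reconstruct the block at position `lost` from the survivors."""
    survivors = [blk for i, blk in enumerate(data) if i != lost]
    return xor(parity, *survivors)


data = [b"AAAA", b"BBBB", b"CCCC"]
good_parity = xor(*data)
stale_parity = b"\x00\x00\x00\x00"  # stripe left out of sync

new_data, p_good = rmw_write(data, good_parity, 0, b"ZZZZ")
_, p_bad = rmw_write(data, stale_parity, 0, b"ZZZZ")

# Now the disk holding block 0 fails and it is rebuilt from parity.
print(rebuild(new_data, p_good, 0) == b"ZZZZ")  # True
print(rebuild(new_data, p_bad, 0) == b"ZZZZ")   # False
```

With the stale parity the write itself appears to succeed; the damage
only shows up later, when the freshly written block has to be rebuilt
and comes back wrong.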


