* automatically running fstrim
@ 2011-05-24 16:53 Phil Karn
  2011-05-25 10:06 ` Lukas Czerner
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Karn @ 2011-05-24 16:53 UTC (permalink / raw)
  To: xfs

Now that the Linux 2.6.39 kernel is out, is there any reason I shouldn't
run fstrim out of my crontab? It doesn't seem to slow down my system
significantly while it runs.

As I understand fstrim, it walks through the file system free list
issuing TRIMs for each entry, and except for whatever load the TRIM
commands themselves generate (which is drive dependent) it shouldn't
interfere that much with system operation. Correct? Is there any
mechanism to issue these commands at a lower priority than regular disk I/O?
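
For readers following along, here is a minimal Python sketch of the request that fstrim hands to the kernel: a FITRIM ioctl carrying a `struct fstrim_range` (start, len, minlen). The helper names are made up for illustration, and the numeric encoding assumes the common asm-generic ioctl layout (x86, ARM); a few architectures encode the direction bits differently. Calling `trim_filesystem()` for real needs root and a filesystem that implements FITRIM.

```python
import fcntl
import os
import struct

# struct fstrim_range from <linux/fs.h>: __u64 start, len, minlen.
FSTRIM_RANGE = struct.Struct("=QQQ")


def fitrim_request_code():
    """Compute _IOWR('X', 121, struct fstrim_range).

    Assumption: asm-generic ioctl encoding (read=2, write=1 in the top bits).
    """
    ioc_read, ioc_write = 2, 1
    direction = ioc_read | ioc_write
    return (direction << 30) | (FSTRIM_RANGE.size << 16) | (ord("X") << 8) | 121


def build_range(start=0, length=2**64 - 1, minlen=0):
    """Pack a request covering `length` bytes of the filesystem from `start`."""
    return FSTRIM_RANGE.pack(start, length, minlen)


def trim_filesystem(mountpoint, minlen=0):
    """Ask the filesystem to discard its free space; return bytes trimmed."""
    fd = os.open(mountpoint, os.O_RDONLY)
    try:
        buf = bytearray(build_range(minlen=minlen))
        fcntl.ioctl(fd, fitrim_request_code(), buf)  # kernel updates buf in place
        _, trimmed, _ = FSTRIM_RANGE.unpack(bytes(buf))
        return trimmed
    finally:
        os.close(fd)
```

The filesystem, not the tool, decides which free extents to discard; `minlen` only lets the caller skip free extents smaller than a given size.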

Thanks for all the work you guys do on XFS. It is much appreciated.

Phil

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: automatically running fstrim
  2011-05-24 16:53 automatically running fstrim Phil Karn
@ 2011-05-25 10:06 ` Lukas Czerner
  2011-05-25 11:20   ` Phil Karn
  2011-05-26  9:11   ` Dave Chinner
  0 siblings, 2 replies; 8+ messages in thread
From: Lukas Czerner @ 2011-05-25 10:06 UTC (permalink / raw)
  To: Phil Karn; +Cc: xfs

On Tue, 24 May 2011, Phil Karn wrote:

> Now that the Linux 2.6.39 kernel is out, is there any reason I shouldn't
> run fstrim out of my crontab? It doesn't seem to slow down my system
> significantly while it runs.
> 
> As I understand fstrim, it walks through the file system free list
> issuing TRIMs for each entry, and except for whatever load the TRIM
> commands themselves generate (which is drive dependent) it shouldn't
> interfere that much with system operation. Correct? Is there any
> mechanism to issue these commands at a lower priority than regular disk I/O?

No, not that I know of. But why not run fstrim from cron, let's say every
day? Note that you do not necessarily need to run it "all the time",
because if the drive firmware has a lot of space for doing
wear-leveling, there is no point in sending TRIM.

Also keep in mind that a lot of newer SSDs have some "hidden" space just
for wear-leveling, so getting to the point where the firmware has a hard
time doing it and the drive actually gets slower takes even more writes
than just filling your drive up to the max.

So doing fstrim once or twice a day (it really depends on your
workload) is more than enough.

Also, since we have all this in place, we might talk to distributions about
adding the infrastructure to actually recognise "discard enabled" devices
and add fstrim to a cron job automatically. Or, since the filesystem
should know best when the "right" time to do this is, we might try
to figure out some kernel logic to trigger it. However, it might be a
little bit tricky, since every drive behaves differently...

Thanks!
-Lukas

> 
> Thanks for all the work you guys do on XFS. It is much appreciated.
> 
> Phil

* Re: automatically running fstrim
  2011-05-25 10:06 ` Lukas Czerner
@ 2011-05-25 11:20   ` Phil Karn
  2011-05-25 11:47     ` Lukas Czerner
  2011-05-26  9:11   ` Dave Chinner
  1 sibling, 1 reply; 8+ messages in thread
From: Phil Karn @ 2011-05-25 11:20 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: xfs



Thanks. My problem is that I've been running some workloads that can gobble
up the SSD erased page pool rather quickly. It's a Perl script feeding a
large number of email messages to procmail, one at a time. I think this
creates and deletes a lot of temporary files. While XFS delayed allocation
normally keeps such files from going to disk, I think procmail defeats this
with fsync() to keep mail from ever being lost.

So I've simply been running fstrim by hand a lot so I don't have a repeat of
the system lockup I had a few days ago that I am pretty sure was due to my
OCZ Revo drive not handling garbage collection very gracefully.

Phil


* Re: automatically running fstrim
  2011-05-25 11:20   ` Phil Karn
@ 2011-05-25 11:47     ` Lukas Czerner
  2011-05-25 22:36       ` Phil Karn
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Czerner @ 2011-05-25 11:47 UTC (permalink / raw)
  To: karn; +Cc: Lukas Czerner, xfs

On Wed, 25 May 2011, Phil Karn wrote:

> Thanks. My problem is that I've been running some workloads that can gobble
> up the SSD erased page pool rather quickly. It's a Perl script feeding a
> large number of email messages to procmail, one at a time. I think this
> creates and deletes a lot of temporary files. While XFS delayed allocation
> normally keeps such files from going to disk, I think procmail defeats this
> with fsync() to keep mail from ever being lost.
> 
> So I've simply been running fstrim by hand a lot so I don't have a repeat of
> the system lockup I had a few days ago that I am pretty sure was due to my
> OCZ Revo drive not handling garbage collection very gracefully.
> 
> Phil
> 

Interesting, a system lockup, really? I have never seen problems like this,
and I have been doing a lot of SSD testing. It looks like the drive is
really crappy :). Have you tried looking for a firmware update?

Anyway, if running fstrim more often solves the problem, that is fine. But
I wonder if the other approach (periodic discard) would do better in
this case (it might not, since the files are really small and are
unlinked often). Unfortunately xfs does not have this support yet, but
other filesystems do (ext4, btrfs, ...), so if you like you might try one
of those and mount it with the -o discard mount option. What it does is
send a TRIM for every range of freed filesystem blocks.

Generally, in its current state it comes with quite a big performance
loss (that's why we have fstrim), but in your case it might actually be
more convenient than running fstrim all the time. Also, it is handled
automatically and the only thing needed is to pass the "-o discard"
mount option.
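
To check which filesystems are actually mounted with the discard option, something along these lines could be used. It is only a sketch: the function name and the sample text are made up, it expects input in /proc/mounts format, and it matches the plain "discard" option only (not variants such as "discard=...").

```python
def mounts_with_discard(mounts_text):
    """Return {mountpoint: fstype} for entries whose options include 'discard'.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mountpoint, fstype, options = fields[:4]
        if "discard" in options.split(","):
            result[mountpoint] = fstype
    return result


# Hypothetical sample; on a real system read open("/proc/mounts").read() instead.
SAMPLE = """\
/dev/sda1 / ext4 rw,relatime,discard 0 0
/dev/sdb1 /home xfs rw,relatime 0 0
"""

if __name__ == "__main__":
    print(mounts_with_discard(SAMPLE))
```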

Thanks!
-Lukas


* Re: automatically running fstrim
  2011-05-25 11:47     ` Lukas Czerner
@ 2011-05-25 22:36       ` Phil Karn
  2011-05-26  7:56         ` Lukas Czerner
  0 siblings, 1 reply; 8+ messages in thread
From: Phil Karn @ 2011-05-25 22:36 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: xfs

On 5/25/11 4:47 AM, Lukas Czerner wrote:


> unlinked often). Unfortunately xfs does not have this support yet, but
> other filesystems do (ext4, btrfs, ...), so if you like you might try one
> of those and mount it with the -o discard mount option. What it does is
> send a TRIM for every range of freed filesystem blocks.
> 
> Generally, in its current state it comes with quite a big performance
> loss (that's why we have fstrim), but in your case it might actually be
> more convenient than running fstrim all the time. Also, it is handled
> automatically and the only thing needed is to pass the "-o discard"
> mount option.

I have thought of using ext4 with the discard option on that device for
just this reason. But this OCZ Revo SSD seems to execute TRIM rather
slowly. I just timed it at 7 minutes 38 seconds to trim 46 GB of free
space on a 90 GB SSD. I wouldn't want that to occur in the foreground
while I'm running a program that's generating a lot of garbage blocks.
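
As a back-of-the-envelope check on that timing, assuming decimal gigabytes (the function name is only for illustration): 7 minutes 38 seconds is 458 seconds, so 46 GB of free space works out to roughly 100 MB of trimmed space per second.

```python
def trim_rate_mb_per_s(trimmed_gb, minutes, seconds):
    """Average amount of space trimmed per second, in decimal MB/s."""
    elapsed_s = minutes * 60 + seconds
    return trimmed_gb * 1000 / elapsed_s


if __name__ == "__main__":
    # 46 GB in 7 min 38 s (458 s)
    print(round(trim_rate_mb_per_s(46, 7, 38), 1))
```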

Intel drives, at least, seem to execute TRIM much faster; I think they
can take more blocks in each operation, and I conjecture that the drive
controller simply adds them to a "to do" list for later erasure in the
background. So there should probably be an option for "real-time" TRIM
on those SSDs that can do it without a performance penalty.

It would be nice if the fitrim ioctl were to issue TRIM commands only
for newly created garbage blocks that haven't already been trimmed. But
I guess that would require some major changes to the file system data
structures. At the least, it would require some special record-keeping
to keep track of this information. The Intel drive shows it's possible
to implement a very speedy TRIM, so maybe it won't be such a bad thing
to just trim the whole free list every time.
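
The record-keeping idea can be sketched roughly as below. This is a toy model, not how any filesystem implements it: the class and method names are hypothetical, and extents are treated as indivisible (start, length) pairs with no splitting or merging.

```python
class TrimTracker:
    """Remember which freed extents have not been discarded yet."""

    def __init__(self):
        self.pending = set()  # freed since the last trim pass
        self.trimmed = set()  # free and already discarded

    def free(self, start, length):
        """Record an extent that has just been freed."""
        self.pending.add((start, length))

    def allocate(self, start, length):
        """Reused space is no longer free, whether it was trimmed or not."""
        self.pending.discard((start, length))
        self.trimmed.discard((start, length))

    def trim_pass(self):
        """Return only the extents that still need a TRIM, lowest offset first."""
        todo = sorted(self.pending)
        self.trimmed.update(todo)
        self.pending.clear()
        return todo
```

A second `trim_pass()` with no intervening `free()` calls returns an empty list, which is the saving being asked for: the cost is the extra state that has to be kept (and, in a real filesystem, kept consistent across crashes).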

Phil


* Re: automatically running fstrim
  2011-05-25 22:36       ` Phil Karn
@ 2011-05-26  7:56         ` Lukas Czerner
  0 siblings, 0 replies; 8+ messages in thread
From: Lukas Czerner @ 2011-05-26  7:56 UTC (permalink / raw)
  To: Phil Karn; +Cc: Lukas Czerner, xfs

On Wed, 25 May 2011, Phil Karn wrote:

> On 5/25/11 4:47 AM, Lukas Czerner wrote:
> 
> 
> > unlinked often). Unfortunately xfs does not have this support yet, but
> > other filesystems do (ext4, btrfs, ...), so if you like you might try one
> > of those and mount it with the -o discard mount option. What it does is
> > send a TRIM for every range of freed filesystem blocks.
> > 
> > Generally, in its current state it comes with quite a big performance
> > loss (that's why we have fstrim), but in your case it might actually be
> > more convenient than running fstrim all the time. Also, it is handled
> > automatically and the only thing needed is to pass the "-o discard"
> > mount option.
> 
> I have thought of using ext4 with the discard option on that device for
> just this reason. But this OCZ Revo SSD seems to execute TRIM rather
> slowly. I just timed it at 7 minutes 38 seconds to trim 46 GB of free
> space on a 90 GB SSD. I wouldn't want that to occur in the foreground
> while I'm running a program that's generating a lot of garbage blocks.
> 
> Intel drives, at least, seem to execute TRIM much faster; I think they
> can take more blocks in each operation, and I conjecture that the drive
> controller simply adds them to a "to do" list for later erasure in the
> background. So there should probably be an option for "real-time" TRIM
> on those SSDs that can do it without a performance penalty.

Well, this is a bit tricky. I have had a chance to test a drive like this,
and I found that the drive seems to perform slower as more and
more trims are sent to it. It did eventually recover, however it took about
half a minute to get performance back. Well, it is still fairly young
technology.

If you want to see some of my results, look here:
http://people.redhat.com/lczerner/discard/

There is also a tool available there to do the testing.

> 
> It would be nice if the fitrim ioctl were to issue TRIM commands only
> for newly created garbage blocks that haven't already been trimmed. But
> I guess that would require some major changes to the file system data
> structures. At the least, it would require some special record-keeping
> to keep track of this information.

There are some patches for ext4 to do something like this, however they
are not finished yet.

> The Intel drive shows it's possible
> to implement a very speedy TRIM, so maybe it won't be such a bad thing
> to just trim the whole free list every time.
> 
> Phil

Thanks!
-Lukas


* Re: automatically running fstrim
  2011-05-25 10:06 ` Lukas Czerner
  2011-05-25 11:20   ` Phil Karn
@ 2011-05-26  9:11   ` Dave Chinner
  2011-05-26  9:57     ` Lukas Czerner
  1 sibling, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2011-05-26  9:11 UTC (permalink / raw)
  To: Lukas Czerner; +Cc: Phil Karn, xfs

On Wed, May 25, 2011 at 12:06:32PM +0200, Lukas Czerner wrote:
> On Tue, 24 May 2011, Phil Karn wrote:
> 
> > Now that the Linux 2.6.39 kernel is out, is there any reason I shouldn't
> > run fstrim out of my crontab? It doesn't seem to slow down my system
> > significantly while it runs.
> > 
> > As I understand fstrim, it walks through the file system free list
> > issuing TRIMs for each entry, and except for whatever load the TRIM
> > commands themselves generate (which is drive dependent) it shouldn't
> > interfere that much with system operation. Correct? Is there any
> > mechanism to issue these commands at a lower priority than regular disk I/O?
> 
> No, not that I know of. But why not run fstrim from cron, let's say every
> day? Note that you do not necessarily need to run it "all the time",
> because if the drive firmware has a lot of space for doing
> wear-leveling, there is no point in sending TRIM.
> 
> Also keep in mind that a lot of newer SSDs have some "hidden" space just
> for wear-leveling, so getting to the point where the firmware has a hard
> time doing it and the drive actually gets slower takes even more writes
> than just filling your drive up to the max.
> 
> So doing fstrim once or twice a day (it really depends on your
> workload) is more than enough.
> 
> Also, since we have all this in place, we might talk to distributions about
> adding the infrastructure to actually recognise "discard enabled" devices
> and add fstrim to a cron job automatically.

History suggests regularly scheduled preventative maintenance like
this can have unintended consequences that don't show up for some
time.

When XFS first got its online defrag tool (xfs_fsr) back on Irix in
the late 90s, it was considered a good idea to run it once a
week to quickly detect and fix fragmentation problems before they
got out of hand.

That seemed like a good idea, but then 6-12 months later people
started reporting XFS filesystems with really severe fragmentation,
worse than before xfs_fsr was being run regularly. The majority of
the files that had been in the filesystem for some time were not
fragmented, but any new file would be badly fragmented and could
not be fixed.

It was then discovered that the act of defragmenting files caused
the fragmentation of free space. That is, for every file with 2
extents that was defragmented into 1 extent, we now have two
freespace extents instead of 1. So, the more files you defragment,
the more free space fragments you create. If you don't delete files
regularly, then eventually you run out of large free space extents.
Then you can't defragment files any more, nor can you create
unfragmented files.

So, xfs_fsr was then removed from the system weekly cron job, and
filesystems that suffered from this went through a dump-mkfs-restore
process to defragment them. From that time, xfs_fsr has been
recommended as a "run only when fragmentation is causing perf
problems" type of tool...

The moral of this story is that running trim as a preventative
maintenance tool could have the same sort of unintended long-term
consequences. That is, it may look like a good idea to run it often
to keep things clean and neat, but we just don't know what it is
doing to the underlying device's algorithms and it may take months
for such problems to show up, e.g. as a device whose performance
cannot be restored except via a secure erase....

> Or, since the filesystem
> should know best when the "right" time to do this is, we might try
> to figure out some kernel logic to trigger it. However, it might be a
> little bit tricky, since every drive behaves differently...

And that makes it much more likely that it will cause some kind of
unintended problem.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: automatically running fstrim
  2011-05-26  9:11   ` Dave Chinner
@ 2011-05-26  9:57     ` Lukas Czerner
  0 siblings, 0 replies; 8+ messages in thread
From: Lukas Czerner @ 2011-05-26  9:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Lukas Czerner, Phil Karn, xfs

On Thu, 26 May 2011, Dave Chinner wrote:

> On Wed, May 25, 2011 at 12:06:32PM +0200, Lukas Czerner wrote:
> > On Tue, 24 May 2011, Phil Karn wrote:
> > 
> > > Now that the Linux 2.6.39 kernel is out, is there any reason I shouldn't
> > > run fstrim out of my crontab? It doesn't seem to slow down my system
> > > significantly while it runs.
> > > 
> > > As I understand fstrim, it walks through the file system free list
> > > issuing TRIMs for each entry, and except for whatever load the TRIM
> > > commands themselves generate (which is drive dependent) it shouldn't
> > > interfere that much with system operation. Correct? Is there any
> > > mechanism to issue these commands at a lower priority than regular disk I/O?
> > 
> > No, not that I know of. But why not run fstrim from cron, let's say every
> > day? Note that you do not necessarily need to run it "all the time",
> > because if the drive firmware has a lot of space for doing
> > wear-leveling, there is no point in sending TRIM.
> > 
> > Also keep in mind that a lot of newer SSDs have some "hidden" space just
> > for wear-leveling, so getting to the point where the firmware has a hard
> > time doing it and the drive actually gets slower takes even more writes
> > than just filling your drive up to the max.
> > 
> > So doing fstrim once or twice a day (it really depends on your
> > workload) is more than enough.
> > 
> > Also, since we have all this in place, we might talk to distributions about
> > adding the infrastructure to actually recognise "discard enabled" devices
> > and add fstrim to a cron job automatically.
> 
> History suggests regularly scheduled preventative maintenance like
> this can have unintended consequences that don't show up for some
> time.
> 
> When XFS first got its online defrag tool (xfs_fsr) back on Irix in
> the late 90s, it was considered a good idea to run it once a
> week to quickly detect and fix fragmentation problems before they
> got out of hand.
> 
> That seemed like a good idea, but then 6-12 months later people
> started reporting XFS filesystems with really severe fragmentation,
> worse than before xfs_fsr was being run regularly. The majority of
> the files that had been in the filesystem for some time were not
> fragmented, but any new file would be badly fragmented and could
> not be fixed.
> 
> It was then discovered that the act of defragmenting files caused
> the fragmentation of free space. That is, for every file with 2
> extents that was defragmented into 1 extent, we now have two
> freespace extents instead of 1. So, the more files you defragment,
> the more free space fragments you create. If you don't delete files
> regularly, then eventually you run out of large free space extents.
> Then you can't defragment files any more, nor can you create
> unfragmented files.
> 
> So, xfs_fsr was then removed from the system weekly cron job, and
> filesystems that suffered from this went through a dump-mkfs-restore
> process to defragment them. From that time, xfs_fsr has been
> recommended as a "run only when fragmentation is causing perf
> problems" type of tool...
> 
> The moral of this story is that running trim as a preventative
> maintenance tool could have the same sort of unintended long-term
> consequences. That is, it may look like a good idea to run it often
> to keep things clean and neat, but we just don't know what it is
> doing to the underlying device's algorithms and it may take months
> for such problems to show up, e.g. as a device whose performance
> cannot be restored except via a secure erase....

Hi Dave,

An interesting story, really, so what you got from this experience is a
"lesson learned". I would not be very optimistic about avoiding this
next logical step, because otherwise we'll never learn the lesson, and
things might still be wrong but quiet enough that no one notices. It is
the same as enabling virtually any feature: unless you enable
it by default, it gets very little testing and you'll never find out if there
is anything deeply wrong with it.

But I agree that we have to be careful about enabling something to do its
job periodically. So now (I hope) people will use it, possibly create
their own cron jobs, and if there is any problem, we'll notice. And after
six months or so, when a new Fedora comes out (hypothetically with the
mentioned infrastructure), it should be relatively safe. But still, this
is something to discuss.

> 
> > Or, since the filesystem
> > should know best when the "right" time to do this is, we might try
> > to figure out some kernel logic to trigger it. However, it might be a
> > little bit tricky, since every drive behaves differently...
> 
> And that makes it much more likely that it will cause some kind of
> unintended problem.

I agree, that's why I like the first approach better.

> 
> Cheers,
> 
> Dave.

Thanks!
-Lukas

