From: Eric Sandeen <sandeen@sandeen.net>
To: Chris Mason <chris.mason@fusionio.com>,
Ric Wheeler <rwheeler@redhat.com>,
Chris Mason <clmason@fusionio.com>,
"Theodore Ts'o" <tytso@mit.edu>,
Linus Torvalds <torvalds@linux-foundation.org>,
Ingo Molnar <mingo@kernel.org>,
Christoph Hellwig <hch@infradead.org>,
Martin Steigerwald <Martin@lichtvoll.de>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Dave Chinner <david@fromorbit.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate UAPI
Date: Fri, 07 Dec 2012 16:51:33 -0600
Message-ID: <50C272F5.9040301@sandeen.net>
In-Reply-To: <20121207215731.GC25713@shiny>

On 12/7/12 3:57 PM, Chris Mason wrote:
> On Fri, Dec 07, 2012 at 02:49:04PM -0700, Ric Wheeler wrote:
>> On 12/07/2012 04:43 PM, Chris Mason wrote:
>>> On Fri, Dec 07, 2012 at 02:27:43PM -0700, Theodore Ts'o wrote:
>>>> On Fri, Dec 07, 2012 at 04:09:32PM -0500, Chris Mason wrote:
>>>>> Persistent trim is what I had in mind, but there are other ideas that do
>>>>> imply a change in behavior as well. Can we safely assume this feature
>>>>> won't matter on spinning media? New features like persistent
>>>>> trim do make it much easier to solve securely, and using a bit for it
>>>>> means we can toss back an error to the app if the underlying storage
>>>>> isn't safe.
>>>> We originally implemented no hide stale for spinning media. Some
>>>> folks have claimed that for XFS their superior technology means that
>>>> no hide stale doesn't buy them anything for HDD's. I'm not entirely
>>>> sure I buy this, since if you need to update metadata, it means at
>>>> least one extra seek for each random write into 4k preallocated space,
>>>> and 7200 RPM disks only have about 200 seeks per second.
>>> True, 7200 RPM disks are slow, but even allowing them to expose stale
>>> data just makes them a little less slow.
>>>
>>> I know it's against the rules to pretend that disks don't matter. But
>>> really, once you're doing random IO into a spindle you've given up on
>>> performance anyway.
>>>
>>> -chris
>>
>> That's right.
>>
>> And equally true, once you have moved the disk heads to that track, you can
>> write a lot as cheaply as a little (i.e., do 1MB instead of 4KB). That will also
>> avoid fragmentation of the extents.
>
> When you do a 4K write, you have to remember that you've written just
> those 4K. When you do a 1MB write, you have to remember that you've
> written just that 1MB. It's the same operation, except with the 1MB
> you've also had to setup all the bios and send down the zeros, and do
> the proper locking to make sure you're not sending zeros down over
> some concurrent IO.
>
> The 1MB setup is actually more work, but it does greatly reduce the
> amount of time the workload needs to run before it goes into a steady
> state. For smaller files it may work well, but for larger ones I don't
> think it will be enough.

Ext4 already does this, actually, I think - see s_extent_max_zeroout_kb
and how it's used.

	/* If extent is less than s_max_zeroout_kb, zeroout directly */

It's not a tunable (*gasp* ;)) but it's currently set to "32" as in
32 kb. Would be fun to bump that up and see how your test goes.

-Eric

> -chris
> --
> To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
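
For readers following the zeroout discussion above, here is a minimal user-space sketch of the kind of threshold check Eric is pointing at. It is not the ext4 code: the helper names max_zeroout_blocks() and should_zeroout_whole_extent() are hypothetical. The KB-to-blocks shift and the "less than or equal" comparison are assumptions based on the quoted comment and the 32 kb figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a zeroout limit given in KB into a number
 * of filesystem blocks. blocksize_bits is log2 of the block size in bytes
 * (12 for 4 KB blocks), so a block holds 2^(blocksize_bits - 10) KB.
 */
static uint32_t max_zeroout_blocks(uint32_t max_zeroout_kb,
				   unsigned int blocksize_bits)
{
	/* Assumed range for this sketch: 1 KB to 64 KB blocks. */
	assert(blocksize_bits >= 10 && blocksize_bits <= 16);
	return max_zeroout_kb >> (blocksize_bits - 10);
}

/*
 * Hypothetical policy check: returns true when a preallocated extent is
 * small enough that writing zeros over all of it is preferred to
 * splitting it and tracking the written range in metadata.
 */
static bool should_zeroout_whole_extent(uint32_t extent_len_blocks,
					uint32_t max_zeroout_kb,
					unsigned int blocksize_bits)
{
	return extent_len_blocks <=
	       max_zeroout_blocks(max_zeroout_kb, blocksize_bits);
}
```

With a 32 KB limit and 4 KB blocks, the threshold is 8 blocks: an 8-block extent would be zeroed out whole, a 9-block extent would not. Bumping the limit to 1024 KB, in the spirit of the 1MB suggestion earlier in the thread, raises the threshold to 256 blocks.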