linux-kernel.vger.kernel.org archive mirror
From: Ric Wheeler <rwheeler@redhat.com>
To: "Theodore Ts'o" <tytso@mit.edu>,
	Dave Chinner <david@fromorbit.com>,
	Chris Mason <chris.mason@fusionio.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Ric Wheeler <rwheeler@redhat.com>, Ingo Molnar <mingo@kernel.org>,
	Christoph Hellwig <hch@infradead.org>,
	Martin Steigerwald <Martin@lichtvoll.de>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH, 3.7-rc7, RESEND] fs: revert commit bbdd6808 to fallocate UAPI
Date: Mon, 10 Dec 2012 13:52:15 -0500
Message-ID: <50C62F5F.7040808@redhat.com>
In-Reply-To: <20121210173739.GA1359@thunk.org>

On 12/10/2012 12:37 PM, Theodore Ts'o wrote:
> On Sat, Dec 08, 2012 at 11:17:05AM +1100, Dave Chinner wrote:
>> I wouldn't recommend XFS_IOC_ALLOCSP as a user-friendly interface.
>> The concept, however, implemented by a new fallocate()
>> flag (say FALLOC_FL_WRITE_ZEROS) so that the filesystem knows that
>> the application considers unwritten extents undesirable is exactly
>> the sort of thing that we should be considering implementing.
> What's the point of using a new flag like this (or XFS's
> XFS_IOC_ALLOCSP) for writing zeros during preallocation as opposed to
> simply doing a fallocate() followed by zeroing the data via an O_DIRECT
> write system call?

I think that this is actually quite useful if you take it to mean "keep normal
fallocate semantics and make sure that no metadata twiddling needs to be done
later".

The existing behaviour (the default one) is to preallocate, and it is OK to do
metadata twiddling (extent conversion, etc.) later on.

Of course, we could well offload either to WRITE_SAME or TRIM, etc., if the
storage target supports it.
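
For reference, a minimal sketch of the userspace alternative Ted describes
(fallocate(), then overwrite the range with zeros). The helper name
prealloc_and_zero() and the 1 MiB chunk size are made up for illustration,
and it uses buffered pwrite() plus fsync() rather than O_DIRECT, since
O_DIRECT would also need aligned buffers and lengths. It only shows the
extra pass over the data that a flag like the proposed FALLOC_FL_WRITE_ZEROS
would let the filesystem do (or offload) itself:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: preallocate the first len bytes of fd, then
 * overwrite them with zeros so that any unwritten-extent conversion
 * happens here, up front, instead of during later application writes.
 */
static int prealloc_and_zero(int fd, off_t len)
{
    enum { CHUNK = 1 << 20 };   /* 1 MiB per write; arbitrary choice */
    char *zeros;

    assert(fd >= 0 && len > 0);

    /* Default fallocate() semantics: blocks are reserved, and the
     * filesystem may leave the extents marked unwritten. If the
     * filesystem lacks fallocate support, the writes below still
     * allocate the space. */
    if (fallocate(fd, 0, 0, len) != 0 && errno != EOPNOTSUPP)
        return -1;

    zeros = calloc(1, CHUNK);
    if (zeros == NULL)
        return -1;

    for (off_t off = 0; off < len; ) {
        size_t want = (len - off) < CHUNK ? (size_t)(len - off) : CHUNK;
        ssize_t n = pwrite(fd, zeros, want, off);

        if (n < 0 && errno == EINTR)
            continue;           /* interrupted; retry the same range */
        if (n <= 0) {
            free(zeros);
            return -1;
        }
        off += n;               /* short writes just advance partially */
    }

    free(zeros);
    /* Flush data and the resulting metadata updates to stable storage. */
    return fsync(fd);
}
```

Whether the filesystem still has to touch extent metadata on later
overwrites after this is filesystem-specific; the sketch only makes the
cost of the zeroing pass explicit.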

ric

>
>> Indeed, if the filesystem is on something with WRITE_SAME or
>> discards to zero, no data would need to be written, you wouldn't
>> have any unwritten extent overhead, and no stale data exposure.
> And if you have a storage device which supports WRITE_SAME or
> persistent discards, you can do this automatically at preallocation
> time without needing a new fallocate(2) flag.  I certainly don't
> oppose adding such optimizations to ext4 or any other file system (I'm
> not entirely convinced that it's worth it to do this optimization at
> the VFS level), but it doesn't help for storage devices that don't
> support this feature.
>
>> This is exactly why Ted should have posted the patch for review. He
>> may not have got the flag through, but the discussion might just end
>> up in a place that is *better for everyone*.
> Both of these suggestions have been made multiple times before when we
> submitted the original patch to support NO_HIDE_STALE, and they aren't
> sufficient for our purposes, so it's not like submitting a patch to
> reserve the bit would have given us any new information.  We've had
> this discussion **ALREADY**, multiple times before, with no one
> believing that the alternate solutions were sufficient for our needs.
>
> This is why this discussion reminds me so much of the wakelocks
> discussion, and why I've made the same decision the Android folks
> made, except they wasted far more time and got far more frustrated ---
> I'll just keep the damned thing as an out-of-tree patch, until there
> are enough other people willing to say that they need and are using
> this patch because their workloads and use cases need it.  It will
> save me a whole lot of time.
>
> (BTW, on a similar subject, we have folks at Tao Bao, Oracle, and
> Google saying that disabling stable page writes does improve their
> workloads, despite everyone else claiming it doesn't matter, and both
> Tao Bao and Google have out-of-tree patches that spike out stable page
> writes.  Maybe this will be enough so we don't have to waste more time
> convincing people that it's not an insane workload that benefits from
> a way to disable stable page writes.  But if not, I'm not going to
> waste time trying to convince everyone else on fs-devel.  Keeping
> the out of tree patch is just way less effort, and requires much less
> resources.)
>
>> And further - what happens if we add changes like I've mentioned
>> above and Google moves to using them instead? We'll have a bit in
>> the interface that nobody uses, nobody will ever implement, and we
>> can't remove. There's many, many good reasons why a revert is the
>> only sane thing to do at this point....
> It's one bit, where we have plenty and plenty of bits.  The only other
> possible uses for the fallocate flags (such as hot vs cold storage,
> etc.) have all been bike-shedded to death on fs-devel already, so I'm
> quite confident that we will never get to the point where we are even
> close to running out of fallocate flag bits.
>
> The only other bit I'm aware of that might happen soon is the volatile
> ranges patch, and that's just one more bit.  So at this point we'll
> still have 28 bits (out of 32 bits).  So when you talk about an
> interface that we'll never remove, I think you're engaging in
> hyperbole.
>
> 						- Ted


