linux-kernel.vger.kernel.org archive mirror
* [RFC] Tux3 for review
@ 2014-05-17  0:50 Daniel Phillips
  2014-05-17  5:09 ` Martin Steigerwald
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-05-17  0:50 UTC (permalink / raw)
  To: linux-kernel, linux-fsdevel, tux3; +Cc: Linus Torvalds, Andrew Morton

We would like to offer Tux3 for review for mainline merge. We have 
prepared a new repository suitable for pulling:

https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/

Tux3 kernel module files are here:

https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3

Tux3 userspace tools and tests are here:

https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3/user?h=user

Repository

We are moving our development to the kernel.org tree from our standalone 
Github repository. Our history was imported from the standalone 
repository using git am. Our kernel.org tree is the usual fork of Linus 
mainline, with Tux3 kernel files on the master branch and userspace 
files in fs/tux3/user on the user branch. We maintain the user files in 
our kernel tree because Tux3 has a tighter coupling than usual between 
userspace and kernel.

Most of our kernel code also runs in userspace, for testing or as a fuse 
filesystem or as part of our userspace support. We also need to keep our 
master branch clean of userspace files. These conflicting requirements 
create challenges for our workflow. We can't just merge from user to 
master because that would pull in userspace files to kernel, and we 
can't merge from master to user because that would pull the entire 
kernel history into our branch. The best idea we have come up with is to 
cherry-pick changes from user to master and master to user. This creates 
merge noise in our user history and requires care to avoid combining 
kernel and userspace changes in the same commit. At least, this is 
better than having two completely separate repositories. Probably. We 
would appreciate any comment on how this workflow could be improved.

For the time being, the subtree at fs/tux3 can also be used standalone. 
Run make in fs/tux3 to build a kernel module for the running kernel 
version. Run make in fs/tux3/user to build userspace commands including 
"tux3 mkfs". Run "make tests" in fs/tux3/user to run our unit tests. 
This capability might be useful for people interested in experimenting 
with Tux3 in user space, and is handy for a quick build of the user 
support without needing to pull the whole repository.

The tux3 command built in fs/tux3/user provides our support tools 
including "tux3 mkfs" and "tux3 fsck". For now, we do not build a 
standalone mkfs.tux3 and consider that a feature, not a bug, because it 
sends the message that Tux3 is for developers right now.

API changes

Tux3 does not implement any custom or extended interfaces.

Core changes

Tux3 builds without any core changes, but we do some unnatural 
things to enable that. We would like to have some core changes to clean 
this up. One is a correctness issue for mmap and three others are to 
clean up ugly workarounds. Without any core changes, mmap will be 
disabled because there is a potential for stale cache pages with 
combined file and mmap IO. I will describe them here and provide patches 
if asked:

1. mmap

Our "page fork" technique does copy-on-write on cache pages in order to 
enforce strict delta ordering, which prevents changing pages already 
under IO as a side effect. For mmap, we do the page fork in 
->page_mkwrite, which needs to be able to change the target page. 
Without this ability, we fault twice for each page_mkwrite, and we 
cannot close all races. We also have an ugly hack to export a 
page_cow_file symbol to our module without patching core.

2. Free a forked page

A forked page that goes out of scope after IO must be freed. We 
currently do that in an ugly way by polling for refcount to go to zero.

3. Cgroup interaction

We need some unexported functions to support cgroup.

4. Inode flushing

To enforce strong ordering, we flush inodes in a certain order that core 
knows nothing about. Allowing core to flush our inodes using its current 
algorithm would cause corruption. We would like a new fs-specific hook 
to call our own flushing algorithm. Without that, we replicate part of 
the core flushing code to call the tux3 flusher. Code for this is in 
commit_flusher.c and commit_flusher_hack.c. Alternatively we can try to 
improve the core flusher to meet our needs, or do both: develop a 
generic, improved flusher within Tux3 using the hook, test it a lot, 
then propose it for core. We would be more than happy to join in the 
active effort to improve the core flusher.
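
To make item 1 concrete, here is a minimal userspace sketch of the page fork idea: a write to a page that belongs to an older delta gets a private copy, so the old contents can go to disk undisturbed. This is an illustration only, not the Tux3 code. The names toy_page, toy_cache and page_fork, and the single-slot cache, are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Hypothetical stand-in for a cache page; not the kernel's struct page. */
struct toy_page {
	unsigned delta;		/* delta this page's contents belong to */
	char data[TOY_PAGE_SIZE];
};

/* Hypothetical one-slot "page cache" that frontend writes go through. */
struct toy_cache {
	unsigned current_delta;	/* delta the frontend is writing into */
	struct toy_page *page;	/* current version of the page */
};

/*
 * Return a page that is safe to modify in the current delta. If the
 * cached page belongs to an older delta (so it may be under IO), copy
 * it and install the copy. The old page is handed back through *old
 * and must stay alive until its IO completes.
 */
static struct toy_page *page_fork(struct toy_cache *cache, struct toy_page **old)
{
	struct toy_page *page = cache->page;

	*old = NULL;
	if (page->delta == cache->current_delta)
		return page;	/* already dirty in this delta, no fork */

	struct toy_page *clone = malloc(sizeof(*clone));
	if (!clone)
		return NULL;
	memcpy(clone->data, page->data, TOY_PAGE_SIZE);
	clone->delta = cache->current_delta;
	cache->page = clone;	/* later writers see the clone */
	*old = page;		/* backend still owns the old contents */
	return clone;
}

/* Walk through one fork; returns 0 if the old page stayed untouched. */
static int page_fork_demo(void)
{
	struct toy_page *first = calloc(1, sizeof(*first));
	if (!first)
		return -1;
	first->delta = 1;
	strcpy(first->data, "delta 1 contents");

	struct toy_cache cache = { .current_delta = 1, .page = first };
	cache.current_delta = 2;	/* delta 1 is handed to the backend */

	struct toy_page *old;
	struct toy_page *page = page_fork(&cache, &old);
	if (!page) {
		free(first);
		return -1;
	}
	strcpy(page->data, "delta 2 contents");

	int ok = old == first && page != first &&
		 strcmp(first->data, "delta 1 contents") == 0 &&
		 strcmp(cache.page->data, "delta 2 contents") == 0;

	free(old);	/* "IO complete": the forked-off page goes away */
	free(page);
	return ok ? 0 : 1;
}
```

Calling page_fork_demo() returns 0 when the page handed to the backend kept its delta 1 contents while the frontend wrote delta 2 data into the clone. The explicit free of the old page mirrors item 2: something has to release the forked-off page once IO is done.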

Style

We are not perfectly checkpatch clean. We run checkpatch like this:

    scripts/checkpatch.pl -f fs/tux3/*.[ch] --ignore \
        PRINTF_L,C99_COMMENTS,SPLIT_STRING,SUSPECT_CODE_INDENT,LONG_LINE -q

With that, checkpatch still has a few complaints, but not too
many. Our rationale for suppressing some checkpatch complaints:

    PRINTF_L: printk supports it. It is shorter and nicer to our eyes.
    Checkpatch complains that it is not standard C, but it is not clear
    why that matters for kernel code. If anybody cares strongly, we will
    change %L to %ll.

    C99_COMMENTS: We use them sparingly as a shorthand for "FIXME: <line
    where fix is obviously needed>". Will go away as fixes arrive.

    SPLIT_STRING: We split some strings to fit in 80 columns. If anybody
    hates that, we will change them back to long lines.

    SUSPECT_CODE_INDENT: False positives

    LONG_LINE: There are a few long lines, where readability would be
    worse with splitting. We take our guidance from Linus:

        http://yarchive.net/comp/linux/coding_style.html

    If we made some line unreadable that way, please let us know and we
    will fix it.

Other issues

Declarations after Statements. We have some declarations after 
statements, mostly in the userspace code but also some in the kernel 
code. We have -Wno-declaration-after-statement in tux3/Makefile to build 
without warnings. We think that tasteful use of this C99 extension 
improves our code readability and maintainability. We would prefer to 
keep these if nobody objects.
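
For anyone unsure what is being asked for here, a small sketch of the style in question follows. The function sum_positive is a made-up example, not taken from the Tux3 tree. GCC's -Wdeclaration-after-statement warning would flag both marked declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the positive entries of an array (hypothetical helper). */
static long sum_positive(const int *values, size_t count)
{
	if (!values)
		return 0;

	/* Declared after a statement, next to its first use. */
	long total = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		if (values[i] <= 0)
			continue;
		/* Declared after a statement, scoped to the loop body. */
		long value = values[i];
		total += value;
	}
	return total;
}
```

The C89 alternative moves both declarations to the top of their blocks, which separates each variable from the code that gives it meaning.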

Source includes. We include C files in a few places instead of linking 
them, typically because it is easier to maintain that way. This 
technique is already used in various places in kernel. Can be changed if 
necessary.

Fitness for use

Tux3 is not fit for use as of today and will eat your data. The most 
glaring deficiency is that Tux3 goes BUG on ENOSPC. Some expected 
interfaces are missing, like direct IO, xattrs and atime. Some 
performance patches are out of tree, to be merged later. This includes 
directory indexing, so directories over a few thousand files will slow 
to a crawl. Tux3 survives our stress testing, but that does not mean it 
will survive your stress testing.

Purpose

We think that Tux3 fills a niche in the Linux ecology where a light, 
tight, modern filesystem belongs. We offer a fresh approach to some 
ancient problems. Tux3's best trick is strong consistency without the 
overhead that you might expect. Our obsession with minimal resource 
consumption, including disk space, CPU overhead and cache memory makes 
Tux3 promising for personal and embedded use. Tux3's feature set is not 
enterprise grade by any stretch of the imagination, but we hope to 
accrete some big system features over time. Any of several existing 
Linux filesystems already do a nice job of servicing that space, so we 
do not need to rush that. Tux3's special mission is to focus on basic 
functionality that is really robust, fast and simple.

Quick tour

Tux3 has thirty-three C source files and thirteen header files, 
comprising about 18 thousand lines. Some files are the familiar ones 
from Ext2: balloc.c, dir.c, inode.c, namei.c, super.c and xattr.c.

Our btree code is a generic OOP-like btree class implemented in btree.c. 
Subclasses for different btree types are provided by specialized leaf 
methods in dleaf.c and ileaf.c, for file data btrees and our inode table 
tree, respectively. We reuse the ileaf.c methods in orphan.c to store 
orphaned inodes.
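
The "OOP-like" arrangement can be sketched as a table of leaf methods that generic btree code calls through. This is a simplified illustration, not the btree.c interface: the toy_ names, the two leaf layouts and the single-leaf "tree" are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Method table a leaf type provides to the generic btree code. */
struct toy_leaf_ops {
	/* Return the value stored for key, or -1 if the key is absent. */
	long (*lookup)(const void *leaf, unsigned long key);
};

/* Toy "data leaf": extents mapping logical blocks to physical blocks. */
struct toy_extent { unsigned long logical, count, physical; };
struct toy_dleaf { size_t nr; struct toy_extent extents[4]; };

static long toy_dleaf_lookup(const void *leaf, unsigned long key)
{
	const struct toy_dleaf *dleaf = leaf;
	size_t i;

	for (i = 0; i < dleaf->nr; i++) {
		const struct toy_extent *ex = &dleaf->extents[i];
		if (key >= ex->logical && key - ex->logical < ex->count)
			return (long)(ex->physical + (key - ex->logical));
	}
	return -1;
}

/* Toy "inode leaf": dense table of inode sizes starting at base. */
struct toy_ileaf { unsigned long base; size_t nr; long sizes[4]; };

static long toy_ileaf_lookup(const void *leaf, unsigned long key)
{
	const struct toy_ileaf *ileaf = leaf;

	if (key < ileaf->base || key - ileaf->base >= ileaf->nr)
		return -1;
	return ileaf->sizes[key - ileaf->base];
}

static const struct toy_leaf_ops toy_dleaf_ops = { .lookup = toy_dleaf_lookup };
static const struct toy_leaf_ops toy_ileaf_ops = { .lookup = toy_ileaf_lookup };

/* Generic part: knows nothing about the leaf layout. */
struct toy_btree {
	const struct toy_leaf_ops *ops;
	const void *leaf;	/* a single leaf stands in for the whole tree */
};

static long toy_btree_lookup(const struct toy_btree *btree, unsigned long key)
{
	return btree->ops->lookup(btree->leaf, key);
}
```

The same toy_btree_lookup() serves both tree types. Only the ops pointer differs, which is the sense in which the leaf files act as subclasses.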

The main workhorse of Tux3 is filemap.c, which maps between logical and 
physical file extents for read and write. This is analogous to 
ext2_get_block but more complex because of extents and btrees. This 
spreads out over several subfiles for modularity: filemap_blocklib.c, 
filemap_hole.c, filemap_mmap.c.

Our delta commit model is implemented in commit.c and its subfiles 
commit_flusher.c and commit_flusher_hack.c. This is supported by log.c 
and replay.c, to emit log records and replay them on mount. Flushing out 
dirty cache is a major Tux3 obsession, implemented in writeback.c and 
its subfiles writeback_iattrfork.c, writeback_inodedelete.c and 
writeback_xattrfork.c.

We use buffers as handles for cache blocks, and have some unique 
requirements there, so we have buffer.c with subfiles buffer_fork.c, 
buffer_writeback.c, and buffer_writebacklib.c. These implement our block 
fork concept. A "bufvec" batching technique translates buffers to bios 
for fast IO.

Digression: there might be something generically useful in our buffer 
code, however in the long run we would rather replace buffer_head 
entirely than try to fix it. Probably, we can save significant CPU and 
memory using a framework that specifically provides cache block handles 
and not other traditional buffer_head IO functionality. So buffer_head 
eradication is in our future work queue and our factoring here reflects 
that.

Our scheme for variable sized inodes with optional attributes is 
implemented in iattr.c. Block allocation is lightly factored into policy 
and mechanism, with the policy bits hived off into policy.c. 
Inode_defer.c is a subfile of inode.c and decouples frontend file 
creation code from backend inode table updating. In inode_vfslib.c we 
duplicate some core kernel code, which will go away if we can export the 
proper core functionality as described earlier. Our ugly hack to export 
page_cow_file is in mmap_builtin_hack.c. In utility.c we have a few 
functions that could possibly become generic.

We encapsulate some of our internal APIs in header files, so we have 
quite a few of those. We also have kcompat.h to support building our 
module over a range of kernel versions. This will go away but is not 
gone yet. In link.h we have a singly linked list implementation somewhat 
resembling the list.h API. We could possibly replace that by llist.h or 
something like it. It is less than a hundred lines so it might be wiser 
to just leave it.
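
To give a feel for how small such a thing is, here is a sketch of a singly linked list with an embedded link and a list.h-style entry macro. It is not the contents of link.h. Every toy_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Embedded link, in the spirit of struct list_head but one pointer wide. */
struct toy_link {
	struct toy_link *next;
};

/* Recover the enclosing struct, in the style of list.h's list_entry(). */
#define toy_link_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Push node at the front of the list anchored at head. */
static void toy_link_push(struct toy_link *head, struct toy_link *node)
{
	node->next = head->next;
	head->next = node;
}

/* Pop the front node, or return NULL if the list is empty. */
static struct toy_link *toy_link_pop(struct toy_link *head)
{
	struct toy_link *node = head->next;

	if (node)
		head->next = node->next;
	return node;
}

/* Example payload carrying an embedded link. */
struct toy_item {
	int value;
	struct toy_link link;
};
```

Compared with list.h, this saves one pointer per node at the cost of O(n) removal from the middle, which is the usual trade for a singly linked list.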

Regards,

Daniel



* Re: [RFC] Tux3 for review
  2014-05-17  0:50 [RFC] Tux3 for review Daniel Phillips
@ 2014-05-17  5:09 ` Martin Steigerwald
  2014-05-17  5:29   ` Daniel Phillips
  2014-05-18 23:55 ` Dave Chinner
  2014-06-19 16:24 ` Josef Bacik
  2 siblings, 1 reply; 35+ messages in thread
From: Martin Steigerwald @ 2014-05-17  5:09 UTC (permalink / raw)
  To: daniel; +Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton

Hi Daniel!

On Friday, 16 May 2014, 17:50:59, Daniel Phillips wrote:
> We would like to offer Tux3 for review for mainline merge. We have
> prepared a new repository suitable for pulling:

At long last!

Congrats for arriving at this point.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [RFC] Tux3 for review
  2014-05-17  5:09 ` Martin Steigerwald
@ 2014-05-17  5:29   ` Daniel Phillips
  2014-05-20  6:56     ` Daniel Phillips
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Phillips @ 2014-05-17  5:29 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton,
	OGAWA Hirofumi

On Friday, May 16, 2014 10:09:50 PM PDT, Martin Steigerwald wrote:
> Hi Daniel!
>
> On Friday, 16 May 2014, 17:50:59, Daniel Phillips wrote:
>> We would like to offer Tux3 for review for mainline merge. We have
>> prepared a new repository suitable for pulling:
>
> At long last!
>
> Congrats for arriving at this point.
>
> Ciao,

Hi Martin,

Thanks. Hirofumi is the one who deserves congratulations, recognition for 
providing more than half the code including most of the hard parts, and 
thanks for bringing Tux3 back to life.

Regards,

Daniel


* Re: [RFC] Tux3 for review
  2014-05-17  0:50 [RFC] Tux3 for review Daniel Phillips
  2014-05-17  5:09 ` Martin Steigerwald
@ 2014-05-18 23:55 ` Dave Chinner
  2014-05-20  0:55   ` Daniel Phillips
  2014-06-19 16:24 ` Josef Bacik
  2 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2014-05-18 23:55 UTC (permalink / raw)
  To: daniel; +Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton

On Fri, May 16, 2014 at 05:50:59PM -0700, Daniel Phillips wrote:
> We would like to offer Tux3 for review for mainline merge. We have
> prepared a new repository suitable for pulling:
> 
> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/
> 
> Tux3 kernel module files are here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3
> 
> Tux3 userspace tools and tests are here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3/user?h=user

Post patches for review, please.  Go and look at the process used to
merge f2fs for an example of how to get a filesystem merged....

Ignoring this, I had a quick look at the code.  This is not a code
review - it's a message to tell everyone else not to waste their
time looking at the code right now...

The code is a dog's breakfast of #ifdef hackery, stuff that doesn't
work (lots of code surrounded by "#if 0"), there's "#if __KERNEL__
...  #else .... #endif" all through the code, etc. The "declarations
within code" stuff is just horrible - it's not even used
consistently so it just looks like laziness to me.  IOWs, the code
is an ugly mess and needs a serious amount of cleanup work. Example:

static const struct inode_operations tux_file_iops = {
//      .permission     = ext4_permission,
        .setattr        = tux3_setattr,
        .getattr        = tux3_getattr,
#ifdef CONFIG_EXT4DEV_FS_XATTR
//      .setxattr       = generic_setxattr,
//      .getxattr       = generic_getxattr,
//      .listxattr      = ext4_listxattr,
//      .removexattr    = generic_removexattr,
#endif
//      .fallocate      = ext4_fallocate,
//      .fiemap         = ext4_fiemap,
        .update_time    = tux3_file_update_time,
};

That's code ready for review and merging? Really?

The hacks around VFS and MM functionality need to have demonstrated
methods for being removed. We're not going to merge that page
forking stuff (like you were told at LSF 2013 more than a year ago:
http://lwn.net/Articles/548091/) without rigorous design review and
a demonstration of the solutions to all the hard corner cases it
has. The current code doesn't solve them (e.g. direct IO doesn't
work in tux3), and there's no clear patch set we can review that
demonstrates how it is all supposed to work. i.e. you need to
separate out all the page forking code into a separate patchset for
review, independent of the tux3 code and applies to the core mm/
code.

Then there's all the writeback hacks. You've simply copy-n-pasted
most of fs-writeback.c, including duplicating structures like struct
wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
able to access core structures from kernel module context
(tux3_setup_writeback(), I'm looking at you). This is completely
unacceptable for a merge. Again, you need to separate out all the
writeback changes you need into an independent patchset so that they
can be reviewed independently of the tux3 code that uses it. 

Now, one of the big tux3 features you hyped is built-in snapshotting
capability. All that talk of efficient pointer trees (or whatever they
were called) and being so much better than ZFS/btrfs-like COW.
Well, I can't find it anywhere in the code - the only references to
snapshots are 5 comments like this:

	* FIXME: what happen if snapshot was introduced?

IOWs, tux3 is just a prototype of a standard journaling filesystem.
The tux3 code is still missing large parts of its intended core
functionality and there is nothing to tell us when that might
appear. It really appears to me that tux3 is where btrfs was 5-6
years ago - the core of an idea, but a long, long way from being
feature complete or production ready. btrfs still doesn't handle
ENOSPC well and given that tux3 is following the same development
path (BUG on ENOSPC) it doesn't fill me with any confidence that
tux3 is going to turn out any better than btrfs in 5 years time.

Really, I don't see how you plan to bring tux3 to be feature
complete and production ready in less than 2-3 years. The current
code is barely functional at this point and there's still questions
that haven't been answered about whether core tux3 functionality can
even be made to work properly, let alone integrated effectively.

IMO, it's a waste of time right now asking anyone to review this
code for inclusion until it has been cleaned up, the core
infrastructure problems have been solved and the core filesystem
code is much closer to feature complete.....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC] Tux3 for review
  2014-05-18 23:55 ` Dave Chinner
@ 2014-05-20  0:55   ` Daniel Phillips
  2014-05-20  3:18     ` Dave Chinner
  2014-05-22  9:52     ` Dongsu Park
  0 siblings, 2 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-05-20  0:55 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton

On 05/18/2014 04:55 PM, Dave Chinner wrote:
> On Fri, May 16, 2014 at 05:50:59PM -0700, Daniel Phillips wrote:
>> We would like to offer Tux3 for review for mainline merge. We have
>> prepared a new repository suitable for pulling:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/
>>
>> Tux3 kernel module files are here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3
>>
>> Tux3 userspace tools and tests are here:
>>
>> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3/user?h=user
> Post patches for review, please.  Go and look at the process used to
> merge f2fs for an example of how to get a filesystem merged....
If nobody objects to the flood then we will be happy to post patches, 
one per file. We thought that maybe the patch flood could be avoided by 
pointing to gitweb, but if that does not work for you then here come the 
patches. Andrew wanted patches too, way back, so that would be a quorum 
I think.

     http://osdir.com/ml/linux-kernel/2009-03/msg04753.html
> Example:
>
> static const struct inode_operations tux_file_iops = {
> //      .permission     = ext4_permission,
>          .setattr        = tux3_setattr,
>          .getattr        = tux3_getattr,
> #ifdef CONFIG_EXT4DEV_FS_XATTR
> //      .setxattr       = generic_setxattr,
> //      .getxattr       = generic_getxattr,
> //      .listxattr      = ext4_listxattr,
> //      .removexattr    = generic_removexattr,
> #endif
> //      .fallocate      = ext4_fallocate,
> //      .fiemap         = ext4_fiemap,
>          .update_time    = tux3_file_update_time,
> };
This was mentioned in the cover mail, it is our shorthand for "FIXME". I 
like that usage but if it is not to your taste we will change those to 
C89-style /* */ comments.
> The hacks around VFS and MM functionality need to have demonstrated
> methods for being removed. We're not going to merge that page
> forking stuff (like you were told at LSF 2013 more than a year ago:
> http://lwn.net/Articles/548091/) without rigorous design review and
> a demonstration of the solutions to all the hard corner cases it
> has.
Thank you. A design review, hack by hack, is exactly what we want. Would 
you prefer to do them all at once, or one at a time?

If one at a time, I propose starting with page forking. We are proud of 
the advantages we get from page forking. It does what "stable pages" 
does, but boosts performance instead of costing performance by cleanly 
separating frontend from backend processing. Page forking also supports 
Tux3's strong ordering, which among other things, guarantees that usage 
like "write; rename" works atomically without creating empty files on crash.
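
For reference, the usage pattern meant here looks roughly like the sketch below, written against plain POSIX calls. The helper name replace_file and its arguments are assumptions made for the example. The sketch shows only the application side and says nothing about how any filesystem orders the operations internally.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Replace the file at path with new contents by writing a temporary
 * file and renaming it over the target. Returns 0 on success, -1 on
 * error. tmp_path is a caller-chosen name on the same filesystem.
 */
static int replace_file(const char *path, const char *tmp_path,
			const char *data, size_t len)
{
	int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	while (len > 0) {
		ssize_t n = write(fd, data, len);

		if (n <= 0) {
			close(fd);
			unlink(tmp_path);
			return -1;
		}
		data += n;
		len -= (size_t)n;
	}

	/*
	 * No fsync(fd) here: this is the bare "write; rename" usage.
	 * Portable code normally adds fsync(fd) before rename(), since
	 * without an ordering guarantee a crash may leave an empty file
	 * under the new name.
	 */
	if (close(fd) < 0 || rename(tmp_path, path) < 0) {
		unlink(tmp_path);
		return -1;
	}
	return 0;
}
```

With strong ordering, the rename cannot reach disk ahead of the data written before it, so after a crash the name refers either to the old file or to the complete new one.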
> The current code doesn't solve them (e.g. direct IO doesn't
> work in tux3), and there's no clear patch set we can review that
> demonstrates how it is all supposed to work.
If you don't mind, we will leave direct IO for after merge. Direct IO is 
an enterprise feature on our to-do list, but implementing it right now 
does not seem like a good reason to continue working out of tree. We 
would be happy to discuss our approach to direct IO if you wish.
> i.e. you need to
> separate out all the page forking code into a separate patchset for
> review, independent of the tux3 code and applies to the core mm/
> code.
Agreed.
> Then there's all the writeback hacks. You've simply copy-n-pasted
> most of fs-writeback.c, including duplicating structures like struct
> wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
> able to access core structures from kernel module context
> (tux3_setup_writeback(), I'm looking at you).
This is intentional. The files named "*_hack" were kept as close as 
possible to the original core code to clarify exactly where core needs 
to change in order to remove our workarounds. If you think we should 
pretty up that code then we will happily do it. Or maybe we can hammer 
out acceptable core patches right now, and include those with our merge 
proposal. That would make us even happier. We hate those hacks as much 
as you do.
> you need to separate out all the
> writeback changes you need into an independent patchset so that they
> can be reviewed independently of the tux3 code that uses it.
OK, patches are coming. I think it makes sense to post the core patches 
with our one-file-per-patch lkml bomb that will be coming soon. These 
will just be "git format-patch" patches from a new branch in our repository.

As an aside, I would be interested in hearing from anybody who actually 
prefers gitweb urls to patches. It doesn't really feel like a hit so far.
> Now, one of the big tux3 features you hyped is built-in snapshotting
> capability. All that talk of efficient pointer trees (or whatever they
> were called) and being so much better than ZFS/btrfs-like COW.
> Well, I can't find it anywhere in the code - the only references to
> snapshots are 5 comments like this:
>
> 	* FIXME: what happen if snapshot was introduced?
We decided to add the versioning after merge because there seems to be 
no shortage of people who are more interested in base functionality like 
performance and reliability than snapshotting. It was called "versioned 
pointers" way back when and is now called "version tags". Here is the 
prototype and test harness:

https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3/devel/version.c?h=user

This should not be an obstacle to merging because neither Ext4 nor XFS 
has snapshots. However, both Ext4 and XFS could practically use the 
same technique, presumably after we have proved it in Tux3. A generic 
name for the version.c approach is "fat nodes", touched on here:

     http://en.wikipedia.org/wiki/Persistent_data_structure

To use the version tags approach you need to support variable sized 
inodes so that attributes can be versioned. Otherwise, you just need a 
fancier btree leaf format. No huge changes to filesystem structure. It 
would be an interesting avenue for you to explore, if you think that  
XFS could one day get snapshots.
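
As a rough picture of the "fat node" idea in general, and not of what version.c does, the sketch below keeps several (version, value) pairs in one slot and resolves a read to the newest entry at or below the requested version. A strictly linear version history is assumed here for brevity. All names and the fixed-size array are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_VERSIONS 8

/* One versioned slot: each write records a (version, value) pair. */
struct fat_slot {
	size_t nr;
	struct {
		unsigned version;
		long value;
	} entries[TOY_MAX_VERSIONS];
};

/*
 * Record value as of version. Versions must not decrease; writing the
 * newest version again overwrites it in place. Returns 0 or -1.
 */
static int fat_slot_write(struct fat_slot *slot, unsigned version, long value)
{
	if (slot->nr && slot->entries[slot->nr - 1].version == version) {
		slot->entries[slot->nr - 1].value = value;
		return 0;
	}
	if (slot->nr == TOY_MAX_VERSIONS)
		return -1;	/* slot full */
	if (slot->nr && slot->entries[slot->nr - 1].version > version)
		return -1;	/* would rewrite history */

	slot->entries[slot->nr].version = version;
	slot->entries[slot->nr].value = value;
	slot->nr++;
	return 0;
}

/*
 * Read the value visible in version: the newest entry whose version
 * is at or below the requested one. Returns 0 or -1 if none exists.
 */
static int fat_slot_read(const struct fat_slot *slot, unsigned version, long *value)
{
	size_t i = slot->nr;

	while (i-- > 0) {
		if (slot->entries[i].version <= version) {
			*value = slot->entries[i].value;
			return 0;
		}
	}
	return -1;
}
```

A snapshot in this picture is just a version number that stops receiving writes. Writable snapshots turn the version history into a tree, which this sketch does not attempt to model.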
> IOWs, tux3 is just a prototype of a standard journaling filesystem.
No. Tux3 supports strong ordering without taking a performance hit for 
it. The technology is nothing like journalling. Tux3 is closer in spirit 
to a logging filesystem, but not very much like that either because Tux3 
does not need any cleaning pass.
> The tux3 code is still missing large parts of its intended core
> functionality
I believe I said that.
> and there is nothing to tell us when that might
> appear.
As I said, the glaring omission is proper ENOSPC handling, which is work 
in progress. I do not view that as an obstacle to merging. After all, 
Btrfs did not have proper ENOSPC handling when it was merged. The design 
is here:

      http://phunq.net/pipermail/tux3/2014-May/002102.html
      Design note: ENOSPC again
> It really appears to me that tux3 is where btrfs was 5-6
> years ago - the core of an idea, but a long, long way from being
> feature complete or production ready. btrfs still doesn't handle
> ENOSPC well and given that tux3 is following the same development
> path (BUG on ENOSPC) it doesn't fill me with any confidence that
> tux3 is going to turn out any better than btrfs in 5 years time.
I totally agree. We take this very seriously and do not want to repeat 
that experience. You can't blame the Btrfs team, Btrfs is just really 
complicated. The progress they have made is impressive and they might be 
nearly there.

Tux3 is a lot more simple. I think that our ENOSPC design is simple and 
theoretically sound. It should get solid quickly, but we shall see.
> Really, I don't see how you plan to bring tux3 to be feature
> complete and production ready in less than 2-3 years.
That seems about right. I suppose I will be running around with Tux3 on 
my root filesystem pretty soon, but users really need to be clear on the 
fact that it takes years to make a filesystem stable. It is said that 
merging is a good way to speed that up.
> The current code is barely functional at this point
Disagree. Tux3 passes lots of stress tests including yours. It is showing 
interesting performance results, and stability is looking good. The 
atomic commit and crash recovery seem to be pretty solid. What Tux3 
needs most is to be hammered on a lot by developers.
> and there's still questions
> that haven't been answered about whether core tux3 functionality can
> even be made to work properly, let alone integrated effectively.
If you have specific questions, please raise them. I think our issues 
are actually a lot less than other filesystems that have been merged, 
including yours.
> IMO, it's a waste of time right now asking anyone to review this
> code for inclusion until it has been cleaned up, the core
> infrastructure problems have been solved and the core filesystem
> code is much closer to feature complete.....
We asked for review and you are doing a great job, very much 
appreciated. We will soldier on.

Regards,

Daniel


* Re: [RFC] Tux3 for review
  2014-05-20  0:55   ` Daniel Phillips
@ 2014-05-20  3:18     ` Dave Chinner
  2014-05-20  5:41       ` Daniel Phillips
  2014-06-13 10:32       ` Pavel Machek
  2014-05-22  9:52     ` Dongsu Park
  1 sibling, 2 replies; 35+ messages in thread
From: Dave Chinner @ 2014-05-20  3:18 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton

On Mon, May 19, 2014 at 05:55:30PM -0700, Daniel Phillips wrote:
> On 05/18/2014 04:55 PM, Dave Chinner wrote:
> >On Fri, May 16, 2014 at 05:50:59PM -0700, Daniel Phillips wrote:
> >static const struct inode_operations tux_file_iops = {
> >//      .permission     = ext4_permission,
> >         .setattr        = tux3_setattr,
> >         .getattr        = tux3_getattr,
> >#ifdef CONFIG_EXT4DEV_FS_XATTR
> >//      .setxattr       = generic_setxattr,
> >//      .getxattr       = generic_getxattr,
> >//      .listxattr      = ext4_listxattr,
> >//      .removexattr    = generic_removexattr,
> >#endif
> >//      .fallocate      = ext4_fallocate,
> >//      .fiemap         = ext4_fiemap,
> >         .update_time    = tux3_file_update_time,
> >};
> This was mentioned in the cover mail, it is our shorthand for
> "FIXME". I like that usage but if it is not to your taste we will
> change those to C89-style /* */ comments.

I'm not commenting on the c99 comment style, I'm passing comment on
the fact that a filesystem that has commented out code from *other
filesystems* is in no shape to be merged.

> >The hacks around VFS and MM functionality need to have demonstrated
> >methods for being removed. We're not going to merge that page
> >forking stuff (like you were told at LSF 2013 more than a year ago:
> >http://lwn.net/Articles/548091/) without rigorous design review and
> >a demonstration of the solutions to all the hard corner cases it
> >has.
> Thank you. A design review, hack by hack, is exactly what we want.
> Would you prefer to do them all at once, or one at a time?

First you need to write the patches that we'll review. Then send
them once you have them functionally complete, working and ready to
go.

> >The current code doesn't solve them (e.g. direct IO doesn't
> >work in tux3), and there's no clear patch set we can review that
> >demonstrates how it is all supposed to work.
> If you don't mind, we will leave direct IO for after merge. Direct
> IO is an enterprise feature on our to-do list, but implementing it
> right now does not seem like a good reason to continue working out
> of tree. We would be happy to discuss our approach to direct IO if
> you wish.

Except that Direct IO impacts on the design of the page forking code
(because of how things like get_user_pages() need to be aware of
page forking). So you need to have direct IO working to demonstrate
that the page forking design is sound.....

> >Now, one of the big tux3 features you hyped is built-in snapshotting
> >capability. All that talk of efficient pointer trees (or whatever they
> >were called) and being so much better than ZFS/btrfs-like COW.
> >Well, I can't find it anywhere in the code - the only references to
> >snapshots are 5 comments like this:
> >
> >	* FIXME: what happen if snapshot was introduced?
> We decided to add the versioning after merge because there seems to
> be no shortage of people who are more interested in base
> functionality like performance and reliability than snapshotting. It

You completely missed my point. We don't *need* tux3 as it is currently
implemented in the mainline tree. You keep saying "performance and
reliability" as reasons to merge code that is not clean, stable or
reliable, nor is the performance of that code at all proven to be
superior to our supported production filesystems.

The development of btrfs has shown that moving prototype filesystems
into the main kernel tree does not lead to stability, performance or
production readiness any faster than if they stayed as an
out-of-tree module until most of the development was complete. If
anything, merging into mainline reduces the speed at which a
filesystem can be brought to being feature complete and production
ready....

....

> As I said, the glaring omission is proper ENOSPC handling, which is
> work in progress. I do not view that as an obstacle to merging.
>
> After all, Btrfs did not have proper ENOSPC handling when it was
> merged.

Yup, and that was a big mistake. Hence not having working ENOSPC
detection is a major strike against merging a new filesystem now.

> The design is here:

So come back when you've implemented it properly and proven that you
have a sound design and clean implementation.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC] Tux3 for review
  2014-05-20  3:18     ` Dave Chinner
@ 2014-05-20  5:41       ` Daniel Phillips
  2014-05-20 17:25         ` Daniel Phillips
  2014-06-13 10:32       ` Pavel Machek
  1 sibling, 1 reply; 35+ messages in thread
From: Daniel Phillips @ 2014-05-20  5:41 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton

On Monday, May 19, 2014 8:18:02 PM PDT, Dave Chinner wrote:
> On Mon, May 19, 2014 at 05:55:30PM -0700, Daniel Phillips wrote:
>> On 05/18/2014 04:55 PM, Dave Chinner wrote:
>  ...
>
> I'm not commenting on the c99 comment style, I'm passing comment on
> the fact that a filesystem that has commented out code from *other
> filesystems* is in no shape to be merged.

I do not feel at all ashamed of mentioning Ext4 in our code where it makes 
sense. After all, we actually cut and pasted our whole dir.c from Ext3 
originally. But this hurts your eyes, so:

static const struct inode_operations tux_file_iops = {
        /*.permission   = tux3_permission,*/
        .setattr        = tux3_setattr,
        .getattr        = tux3_getattr,
#ifdef CONFIG_TUX3_XATTR
        /*.setxattr     = generic_setxattr,*/
        /*.getxattr     = generic_getxattr,*/
        /*.listxattr    = tux3_listxattr,*/
        /*.removexattr  = generic_removexattr,*/
#endif
        /*.fallocate    = tux3_fallocate,*/
        /*.fiemap       = tux3_fiemap,*/
        .update_time    = tux3_file_update_time,

Why those ones are commented out: fiemap is not important right now; 
fallocate is advisory; tux3 only has xattrs in user space not kernel yet, 
and initial users are unlikely to care; we don't need .permission until 
xattrs are exposed.

>>> The hacks around VFS and MM functionality need to have demonstrated
>>> methods for being removed. We're not going to merge that page
>>> forking stuff (like you were told at LSF 2013 more than a year ago:
>>> http://lwn.net/Articles/548091/) without rigorous design review and
>>> a demonstration of the solutions to all the hard corner cases it
>  ...
>> Thank you. A design review, hack by hack, is exactly what we want.
>> Would you prefer to do them all at once, or one at a time?
>
> First you need to write the patches that we'll review. Then send
> them once you have them functionally complete, working and ready to
> go.

I'll hold you to that review offer :) Our patch bomb is on the way.

>>> The current code doesn't solve them (e.g. direct IO doesn't
>>> work in tux3), and there's no clear patch set we can review that
>>> demonstrates how it is all supposed to work.
>> If you don't mind, we will leave direct IO for after merge. Direct
>> IO is an enterprise feature on our to-do list, but implementing it
>> right now does not seem like a good reason to continue working out
>> of tree. We would be happy to discuss our approach to direct IO if
>> you wish.
>
> Except that Direct IO impacts on the design of the page forking code
> (because of how things like get_user_pages() need to be aware of
> page forking). So you need to have direct IO working to demonstrate
> that the page forking design is sound.....

We will deal with direct IO when we get to it. It is low on the list of 
features that users of personal and embedded devices actually want.

>>> ...
>> We decided to add the versioning after merge because there seems to
>> be no shortage of people who are more interested in base
>> functionality like performance and reliability than snapshotting.It
>  ...
>
> You completely missed my point. We don't *need* tux3 as it is currently
> implemented in the mainline tree. You keep saying "performance and
> reliability" as reasons to merge code that is not clean, stable or
> reliable, nor is the performance of that code at all proven to be
> superior to our supported production filesystems.

I disagree that Tux3 is not clean. Yes, there are warts, but aren't there 
always? I also disagree that Tux3 is not stable or reliable. That remains 
to be seen. Tux3 passes our stress tests and yours. I have no doubt that 
issues will come up, but that is the case even for filesystems that have 
been merged for years.

> The development of btrfs has shown that moving prototype filesystems
> into the main kernel tree does not lead to stability, performance or
> production readiness any faster than if they stayed as an
> out-of-tree module until most of the development was complete. If
> anything, merging into mainline reduces the speed at which a
> filesystem can be brought to being feature complete and production
> ready....

Tux3 is beyond the prototype stage and so was Btrfs when it was merged. I 
am glad that Btrfs was merged before it was ready. It had a rough ride for 
a few years and there is still more of that coming, but they stuck with it 
and made something impressive. Without that, Linux would still have no 
answer to ZFS.

I doubt that you can support your argument about merging slowing down 
development. From what I have seen, it tends to light a fire under the 
development team's collective tail. Somebody ought to do a study.

> ....
>
>> As I said, the glaring omission is proper ENOSPC handling, which is
>> work in progress. I do not view that as an obstacle to merging.
>> 
>> After all, Btrfs did not have proper ENOSPC handling when it was
>> merged.
>
> Yup, and that was a big mistake. Hence not having working ENOSPC
> detection is a major strike against merging a new filesystem now.
>
>> The design is here:
>
> So come back when you've implemented it properly and proven that you
> have a sound design and clean implementation.

Whether a completed, perfect implementation of ENOSPC is a precondition for 
merging is up to Andrew or Linus. If you feel that my ENOSPC design is not 
sound, please be specific.

I really like the approach of shrinking down the delta size as the volume 
fills up. I would go so far as to say that it is obviously correct. The 
implementation looks clean so far. I intend to continue working on it 
during review of our current code base. 
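
To make the idea concrete, here is a minimal sketch of shrinking the delta budget as free space runs out. It is an illustration only, not Tux3 code: the names delta_budget() and delta_reserve(), the MAX_DELTA_BLOCKS cap and the WORST_CASE_COST factor are all assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap on blocks one delta may dirty on a mostly empty volume. */
#define MAX_DELTA_BLOCKS 1024u

/* Hypothetical worst-case blocks written per dirtied block (data plus metadata). */
#define WORST_CASE_COST 2u

/*
 * Return how many blocks the next delta may dirty. The budget never
 * exceeds what the remaining free space can absorb in the worst case,
 * so it shrinks toward zero as the volume fills up.
 */
static uint64_t delta_budget(uint64_t free_blocks)
{
	uint64_t safe = free_blocks / WORST_CASE_COST;

	return safe < MAX_DELTA_BLOCKS ? safe : MAX_DELTA_BLOCKS;
}

/*
 * Try to admit a write of `want` blocks into the current delta.
 * Returns 0 if it fits, -1 if it does not. On -1 the caller would
 * commit the current delta and retry, reporting ENOSPC only when the
 * write does not fit into an empty delta either.
 */
static int delta_reserve(uint64_t free_blocks, uint64_t *used, uint64_t want)
{
	uint64_t budget = delta_budget(free_blocks);

	if (*used > budget || want > budget - *used)
		return -1;
	*used += want;
	return 0;
}
```

With 100 free blocks the budget in this sketch is 50, so a second 30-block write in the same delta is refused until the delta has been committed.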

Regards,

Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-05-17  5:29   ` Daniel Phillips
@ 2014-05-20  6:56     ` Daniel Phillips
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-05-20  6:56 UTC (permalink / raw)
  To: Martin Steigerwald
  Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton,
	OGAWA Hirofumi

On Friday, May 16, 2014 10:29:43 PM PDT, I wrote:
> Hirofumi is the one who deserves congratulations, 
> recognition for providing more than half the code including most 
> of the hard parts, and thanks for bringing Tux3 back to life.

An epilogue... one gentleman took that suggestion seriously and sent $100 
to Hirofumi by Amazon payments, quoting that post. I do not feel at liberty 
to name the donor, so I won't, but please feel free to stand up and take 
your bows. Really, what an amazing warm n fuzzy.

Naturally, Hirofumi insists this must be a donation to the tux3 project, but I 
say... sake time!

Regards,

Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-05-20  5:41       ` Daniel Phillips
@ 2014-05-20 17:25         ` Daniel Phillips
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-05-20 17:25 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton

Hi Dave,

This is to address your concern about theoretical interaction between 
direct IO and Tux3 page fork.

On Monday, May 19, 2014 10:41:40 PM PDT, I wrote:
>> Except that Direct IO impacts on the design of the page forking code
>> (because of how things like get_user_pages() need to be aware of
>> page forking). So you need to have direct IO working to demonstrate
>> that the page forking design is sound.....

Page fork only affects cache pages, so the only interaction with direct IO 
is when the direct IO is to/from a mmap. If a direct write races with a 
programmed write to cache that causes a fork, then get_user_pages may pick 
up the old or new version of a page. It is not defined which will be 
written to disk, which is not a surprise. If a direct read races with a 
programmed write to cache that causes a fork, then it might violate our 
strong ordering, but that is not a surprise. I do not see any theoretical 
oopses or life cycle issues.

So Tux3 may allow racy direct read to violate strong ordering, but strong 
ordering would still be available with proper application sequencing. For 
example, direct read to mmap followed by msync would be strongly ordered.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-05-20  0:55   ` Daniel Phillips
  2014-05-20  3:18     ` Dave Chinner
@ 2014-05-22  9:52     ` Dongsu Park
  2014-05-23  8:21       ` Daniel Phillips
  1 sibling, 1 reply; 35+ messages in thread
From: Dongsu Park @ 2014-05-22  9:52 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Dave Chinner, linux-fsdevel, tux3, Andrew Morton, Linus Torvalds,
	linux-kernel

Hi,

On 19.05.2014 17:55, Daniel Phillips wrote:
> On 05/18/2014 04:55 PM, Dave Chinner wrote:
> >On Fri, May 16, 2014 at 05:50:59PM -0700, Daniel Phillips wrote:
> >>We would like to offer Tux3 for review for mainline merge. We have
> >>prepared a new repository suitable for pulling:
> >>
> >>https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/

First of all, thank you for trying to merge it to mainline.
Maybe I cannot say the code is clean enough, but basically
the filesystem seems to work at least.

> >Then there's all the writeback hacks. You've simply copy-n-pasted
> >most of fs-writeback.c, including duplicating structures like struct
> >wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
> >able to access core structures from kernel module context
> >(tux3_setup_writeback(), I'm looking at you).
> This is intentional. The files named "*_hack" were kept as close as
> possible to the original core code to clarify exactly where core
> needs to change in order to remove our workarounds. If you think we
> should pretty up that code then we will happily do it. Or maybe we
> can hammer out acceptable core patches right now, and include those
> with our merge proposal. That would make us even happier. We hate
> those hacks as much as you do.

Looking up kallsyms is not only hacky, it also makes the filesystem
impossible to mount at all when CONFIG_KALLSYMS_ALL is not defined.
I'll send out patches to fix that separately to the tux3 mailing list.

Regards,
Dongsu


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-05-22  9:52     ` Dongsu Park
@ 2014-05-23  8:21       ` Daniel Phillips
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-05-23  8:21 UTC (permalink / raw)
  To: Dongsu Park
  Cc: Dave Chinner, linux-fsdevel, tux3, Andrew Morton, Linus Torvalds,
	linux-kernel

Hi Dongsu,

On Thursday, May 22, 2014 2:52:27 AM PDT, Dongsu Park wrote:
> First of all, thank you for trying to merge it to mainline.
> Maybe I cannot say the code is clean enough, but basically
> the filesystem seems to work at least.

Thank you for confirming that. We test Tux3 extensively so we know it works 
pretty well (short of enospc handling) but independent confirmation carries 
more weight than anything we could say. Our standard disclaimer: Tux3 is 
for developers right now, not for users.

>> ...The files named "*_hack" were kept as close as
>> possible to the original core code to clarify exactly where core
>> needs to change in order to remove our workarounds. If you think we
>> should pretty up that code then we will happily do it. Or maybe we
>> can hammer out acceptable core patches right now, and include those
>  ...
>
> Looking up kallsyms is not only hacky, it also makes the filesystem
> impossible to mount at all when CONFIG_KALLSYMS_ALL is not defined.
> I'll send out patches to fix that separately to the tux3 mailing list.

Thank you for improving the hack. We are working on getting rid of that 
flusher hack completely. There is a patch under development to introduce a 
new super_operations.writeback() operation that allows a filesystem to 
flush its own inodes instead of letting core do it. This will allow Tux3 to 
enforce its strong ordering semantics efficiently without needing to 
reimplement part of fs-writeback.c.
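
For readers who want a picture of what such a hook might look like, here is a userspace mock-up. It is a sketch under stated assumptions, not the patch under development: struct mock_super, struct mock_super_ops, flush_super() and the whole-delta flushing behaviour are hypothetical stand-ins, and none of them are the real VFS types.

```c
#include <assert.h>
#include <stddef.h>

/* Mock stand-ins for kernel types; none of this is the real VFS API. */
struct mock_super;

struct mock_super_ops {
	/* Optional hook: flush dirty pages for this superblock, return pages written. */
	long (*writeback)(struct mock_super *sb, long nr_pages);
};

struct mock_super {
	const struct mock_super_ops *ops;
	long dirty_pages;
};

/* Default path: core walks the dirty inodes itself (reduced here to a counter). */
static long core_writeback(struct mock_super *sb, long nr_pages)
{
	long n = nr_pages < sb->dirty_pages ? nr_pages : sb->dirty_pages;

	sb->dirty_pages -= n;
	return n;
}

/*
 * Filesystem-provided path (assumed behaviour): flush everything as one
 * ordered unit and ignore the page limit, so ordering is never split.
 */
static long fs_delta_writeback(struct mock_super *sb, long nr_pages)
{
	long n = sb->dirty_pages;

	(void)nr_pages;
	sb->dirty_pages = 0;
	return n;
}

/* Flusher entry point: prefer the filesystem's hook when one is set. */
static long flush_super(struct mock_super *sb, long nr_pages)
{
	if (sb->ops && sb->ops->writeback)
		return sb->ops->writeback(sb, nr_pages);
	return core_writeback(sb, nr_pages);
}

/* Operations table for a filesystem that supplies its own writeback. */
static const struct mock_super_ops delta_ops = {
	.writeback = fs_delta_writeback,
};
```

A superblock with no hook falls back to core_writeback(), so in this sketch existing filesystems would be unaffected by adding the optional operation.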

Regards,

Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-05-20  3:18     ` Dave Chinner
  2014-05-20  5:41       ` Daniel Phillips
@ 2014-06-13 10:32       ` Pavel Machek
  2014-06-13 17:49         ` Daniel Phillips
  1 sibling, 1 reply; 35+ messages in thread
From: Pavel Machek @ 2014-06-13 10:32 UTC (permalink / raw)
  To: Dave Chinner
  Cc: Daniel Phillips, linux-kernel, linux-fsdevel, tux3,
	Linus Torvalds, Andrew Morton

Hi!

> > As I said, the glaring omission is proper ENOSPC handling, which is
> > work in progress. I do not view that as an obstacle to merging.
> >
> > After all, Btrfs did not have proper ENOSPC handling when it was
> > merged.
> 
> Yup, and that was a big mistake. Hence not having working ENOSPC
> detection is a major strike against merging a new filesystem now.

Hmm, it seems that merging filesystems is getting harder over
time. Soon, it will be impossible to merge a new filesystem.

> > The design is here:
> 
> So come back when you've implemented it properly and proven that you
> have a sound design and clean implementation.

People submit code early to get feedback... but this is not exactly
helpful feedback, I'm afraid...

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-06-13 10:32       ` Pavel Machek
@ 2014-06-13 17:49         ` Daniel Phillips
  2014-06-13 20:20           ` Pavel Machek
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Phillips @ 2014-06-13 17:49 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Dave Chinner, linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

Hi Pavel,

On Friday, June 13, 2014 3:32:16 AM PDT, Pavel Machek wrote:
> Hmm, it seems that merging filesystems is getting harder over
> time. Soon, it will be impossible to merge a new filesystem.

My thought exactly, but it carries more weight coming from you.

It is getting more unpleasant to discuss things on LKML in
general, which tends to drive the design process away from
public view, leaving only the dregs of politics and infighting
for the public record. Perhaps some participants prefer it that
way, but I am certainly not one of them.

I thought this issue was going to be addressed at last year's
kernel summit.

Regards,

Daniel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-06-13 17:49         ` Daniel Phillips
@ 2014-06-13 20:20           ` Pavel Machek
  2014-06-15 21:41             ` Daniel Phillips
  0 siblings, 1 reply; 35+ messages in thread
From: Pavel Machek @ 2014-06-13 20:20 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Dave Chinner, linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

Hi!

On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
> Hi Pavel, On Friday, June 13, 2014 3:32:16 AM PDT, Pavel Machek wrote:
> >Hmm, it seems that merging filesystems is getting harder over
> >time. Soon, it will be impossible to merge a new filesystem.
> 
> My thought exactly, but it carries more weight coming from you.
> 
> It is getting more unpleasant to discuss things on LKML in
> general, which tends to drive the design process away from
> public view, leaving only the dregs of politics and infighting
> for the public record. Perhaps some participants prefer it that
> way, but I am certainly not one of them.
> 
> I thought this issue was going to be addressed at last year's
> kernel summit.

Actually, would it make sense to have staging/fs/?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-06-13 20:20           ` Pavel Machek
@ 2014-06-15 21:41             ` Daniel Phillips
  2014-06-16 15:25               ` James Bottomley
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Phillips @ 2014-06-15 21:41 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Dave Chinner, linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

On Friday, June 13, 2014 1:20:39 PM PDT, Pavel Machek wrote:
> Hi!
>
> On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
>> Hi Pavel, On Friday, June 13, 2014 3:32:16 AM PDT, Pavel Machek wrote:
>  ...
>
> Actually, would it make sense to have staging/fs/?

That makes sense to me, if a suitably expert and nonaligned maintainer can 
be found to sign up for a ridiculous amount of largely thankless, but 
perhaps fascinating work. Any volunteers?

Regards,

Daniel


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-06-15 21:41             ` Daniel Phillips
@ 2014-06-16 15:25               ` James Bottomley
  2014-06-19  8:21                 ` Pavel Machek
  0 siblings, 1 reply; 35+ messages in thread
From: James Bottomley @ 2014-06-16 15:25 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Pavel Machek, Dave Chinner, linux-kernel, linux-fsdevel,
	Linus Torvalds, Andrew Morton

On Sun, 2014-06-15 at 14:41 -0700, Daniel Phillips wrote:
> On Friday, June 13, 2014 1:20:39 PM PDT, Pavel Machek wrote:
> > Hi!
> >
> > On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
> >> Hi Pavel, On Friday, June 13, 2014 3:32:16 AM PDT, Pavel Machek wrote:
> >  ...
> >
> > Actually, would it make sense to have staging/fs/?
> 
> That makes sense to me, if a suitably expert and nonaligned maintainer can 
> be found

Really? We're at the passive aggressive implication that everyone's
against you now?  Can we get back to the technical discussion, please?

>  to sign up for a ridiculous amount of largely thankless, but 
> perhaps fascinating work. Any volunteers?

The whole suggestion is a non starter: we can't stage core API changes.
Even if we worked out how to do that, the staging trees mostly don't get
the type of in-depth expert review that you need anyway.

The cardinal concern has always been the viability of page forking and its
impact on writeback ... and since writeback is our most difficult and
performance sensitive area, the bar to changing it is high.

When you presented page forking at LSF/MM in 2013, it didn't even stand
up to basic scrutiny before people found unresolved problems:

http://lwn.net/Articles/548091/

After lots of prodding, you finally coughed up a patch for discussion:

http://thread.gmane.org/gmane.linux.file-systems/85619

But then that petered out again.  I can't emphasise enough that
iterating these threads to a conclusion and reposting interface
suggestions is the way to proceed on this ... as far as I can tell from
the discussion, the reviewers were making helpful suggestions, even if
they didn't like the original interface you proposed.

James



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-06-16 15:25               ` James Bottomley
@ 2014-06-19  8:21                 ` Pavel Machek
  2014-06-19  9:26                   ` Lukáš Czerner
  0 siblings, 1 reply; 35+ messages in thread
From: Pavel Machek @ 2014-06-19  8:21 UTC (permalink / raw)
  To: James Bottomley
  Cc: Daniel Phillips, Dave Chinner, linux-kernel, linux-fsdevel,
	Linus Torvalds, Andrew Morton

On Mon 2014-06-16 08:25:54, James Bottomley wrote:
> On Sun, 2014-06-15 at 14:41 -0700, Daniel Phillips wrote:
> > On Friday, June 13, 2014 1:20:39 PM PDT, Pavel Machek wrote:
> > > On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
> >  to sign up for a ridiculous amount of largely thankless, but 
> > perhaps fascinating work. Any volunteers?
> 
> The whole suggestion is a non starter: we can't stage core API changes.
> Even if we worked out how to do that, the staging trees mostly don't get
> the type of in-depth expert review that you need anyway.

Well.. most filesystems do not need any core API changes, right?

> The cardinal concern has always been the viability of page forking and its
> impact on writeback ... and since writeback is our most difficult and
> performance sensitive area, the bar to changing it is high.

And in this particular case, Daniel was flamed for poor coding style, not
for page forking. So staging/ would actually help him -- he could concentrate
on core changes without being distracted by unimportant stuff.

> When you presented page forking at LSF/MM in 2013, it didn't even stand
> up to basic scrutiny before people found unresolved problems:
> 
> http://lwn.net/Articles/548091/
> 
> After lots of prodding, you finally coughed up a patch for discussion:
> 
> http://thread.gmane.org/gmane.linux.file-systems/85619
> 
> But then that petered out again.  I can't emphasise enough that
> iterating these threads to a conclusion and reposting interface
> suggestions is the way to proceed on this ... as far as I can tell from
> the discussion, the reviewers were making helpful suggestions, even if
> they didn't like the original interface you proposed.

This obviously needs to be solved, first...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-06-19  8:21                 ` Pavel Machek
@ 2014-06-19  9:26                   ` Lukáš Czerner
  2014-06-19 21:58                     ` Daniel Phillips
  0 siblings, 1 reply; 35+ messages in thread
From: Lukáš Czerner @ 2014-06-19  9:26 UTC (permalink / raw)
  To: Pavel Machek
  Cc: James Bottomley, Daniel Phillips, Dave Chinner, linux-kernel,
	linux-fsdevel, Linus Torvalds, Andrew Morton

On Thu, 19 Jun 2014, Pavel Machek wrote:

> Date: Thu, 19 Jun 2014 10:21:29 +0200
> From: Pavel Machek <pavel@ucw.cz>
> To: James Bottomley <James.Bottomley@HansenPartnership.com>
> Cc: Daniel Phillips <daniel@phunq.net>, Dave Chinner <david@fromorbit.com>,
>     linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
>     Linus Torvalds <torvalds@linux-foundation.org>,
>     Andrew Morton <akpm@linux-foundation.org>
> Subject: Re: [RFC] Tux3 for review
> 
> On Mon 2014-06-16 08:25:54, James Bottomley wrote:
> > On Sun, 2014-06-15 at 14:41 -0700, Daniel Phillips wrote:
> > > On Friday, June 13, 2014 1:20:39 PM PDT, Pavel Machek wrote:
> > > > On Fri 2014-06-13 10:49:39, Daniel Phillips wrote:
> > >  to sign up for a ridiculous amount of largely thankless, but 
> > > perhaps fascinating work. Any volunteers?
> > 
> > The whole suggestion is a non starter: we can't stage core API changes.
> > Even if we worked out how to do that, the staging trees mostly don't get
> > the type of in-depth expert review that you need anyway.
> 
> Well.. most filesystems do not need any core API changes, right?
> 
> > The cardinal concern has always been the viability of page forking and its
> > impact on writeback ... and since writeback is our most difficult and
> > performance sensitive area, the bar to changing it is high.
> 
> And in this particular case, Daniel was flamed for poor coding style, not
> for page forking. So staging/ would actually help him -- he could concentrate
> on core changes without being distracted by unimportant stuff.

Flamed ? really ? Dave pointed out some serious coding style problems.
Those should be very easy to fix.

Let me remind you of some more important problems Dave brought up,
including page forking:

"
 The hacks around VFS and MM functionality need to have demonstrated
 methods for being removed. We're not going to merge that page
 forking stuff (like you were told at LSF 2013 more than a year ago:
 http://lwn.net/Articles/548091/) without rigorous design review and
 a demonstration of the solutions to all the hard corner cases it
 has. The current code doesn't solve them (e.g. direct IO doesn't
 work in tux3), and there's no clear patch set we can review that
 demonstrates how it is all supposed to work. i.e. you need to
 separate out all the page forking code into a separate patchset for
 review, independent of the tux3 code and applies to the core mm/
 code.
"

"
 Then there's all the writeback hacks. You've simply copy-n-pasted
 most of fs-writeback.c, including duplicating structures like struct
 wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
 able to access core structures from kernel module context
 (tux3_setup_writeback(), I'm looking at you). This is completely
 unacceptable for a merge. Again, you need to separate out all the
 writeback changes you need into an independent patchset so that they
 can be reviewed independently of the tux3 code that uses it.
"

-Lukas



> 
> > When you presented page forking at LSF/MM in 2013, it didn't even stand
> > up to basic scrutiny before people found unresolved problems:
> > 
> > http://lwn.net/Articles/548091/
> > 
> > After lots of prodding, you finally coughed up a patch for discussion:
> > 
> > http://thread.gmane.org/gmane.linux.file-systems/85619
> > 
> > But then that petered out again.  I can't emphasise enough that
> > iterating these threads to a conclusion and reposting interface
> > suggestions is the way to proceed on this ... as far as I can tell from
> > the discussion, the reviewers were making helpful suggestions, even if
> > they didn't like the original interface you proposed.
> 
> This obviously needs to be solved, first...
> 									Pavel
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-05-17  0:50 [RFC] Tux3 for review Daniel Phillips
  2014-05-17  5:09 ` Martin Steigerwald
  2014-05-18 23:55 ` Dave Chinner
@ 2014-06-19 16:24 ` Josef Bacik
  2014-06-19 22:14   ` Daniel Phillips
  2 siblings, 1 reply; 35+ messages in thread
From: Josef Bacik @ 2014-06-19 16:24 UTC (permalink / raw)
  To: daniel, linux-kernel, linux-fsdevel, tux3; +Cc: Linus Torvalds, Andrew Morton



On 05/16/2014 05:50 PM, Daniel Phillips wrote:
> We would like to offer Tux3 for review for mainline merge. We have prepared a new repository suitable for pulling:
>
> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/
>
> Tux3 kernel module files are here:
>
> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3
>
> Tux3 userspace tools and tests are here:
>
> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3/user?h=user
>
> Repository
>
> We are moving our development to the kernel.org tree from our standalone Github repository. Our history was imported from the standalone repository using git am. Our kernel.org tree is the usual fork of Linus mainline, with Tux3 kernel files on the master branch and userspace files in fs/tux3/user on the user branch. We maintain the user files in our kernel tree because Tux3 has a tighter coupling than usual between userspace and kernel.
>
> Most of our kernel code also runs in userspace, for testing or as a fuse filesystem or as part of our userspace support. We also need to keep our master branch clean of userspace files. These conflicting requirements creates challenges for our workflow. We can't just merge from user to master because that would pull in userspace files to kernel, and we can't merge from master to user because that would pull the entire kernel history into our branch. The best idea we have come up with is to cherry-pick changes from user to master and master to user. This creates merge noise in our user history and requires care to avoid combining kernel and userspace changes in the same commit. At least, this is better than having two completely separate repositories. Probably. We would appreciate any comment on how this workflow could be improved.
>
> For the time being, the subtree at fs/tux3 can also be used standalone. Run make in fs/tux3 to build a kernel module for the running kernel version. Run make in fs/tux3/user to build userspace commands including "tux3 mkfs". Run "make tests" in fs/tux3/user to run our unit tests. This capability might be useful for people interested in experimenting with Tux3 in user space, and is handy for a quick build of the user support without needing to pull the whole repository.
>
> The tux3 command built in fs/tux3/user provides our support tools including "tux3 mkfs" and "tux3 fsck". For now, we do not build a standalone mkfs.tux3 and consider that a feature, not a bug, because it sends the message that Tux3 is for developers right now.
>
> API changes
>
> Tux3 does not implement any custom or extended interfaces.
>
> Core changes
>
> Tux3 builds without any core changes, however we do some unnatural things to enable that. We would like to have some core changes to clean this up. One is a correctness issue for mmap and three others are to clean up ugly workarounds. Without any core changes, mmap will be disabled because there is a potential for stale cache pages with combined file and mmap IO. I will describe them here and provide patches if asked:
>

So I'd really like to see the page fork stuff broken out into its own core
change.  I want to do something like this to get around the stable pages pain
but haven't had the time to look at it, so if we can hammer out what you guys
did into something workable and generic that would be great.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC] Tux3 for review
  2014-06-19  9:26                   ` Lukáš Czerner
@ 2014-06-19 21:58                     ` Daniel Phillips
  2014-06-21 19:29                       ` James Bottomley
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Phillips @ 2014-06-19 21:58 UTC (permalink / raw)
  To: Lukáš Czerner
  Cc: Pavel Machek, James Bottomley, Dave Chinner, linux-kernel,
	linux-fsdevel, Linus Torvalds, Andrew Morton

On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
> On Thu, 19 Jun 2014, Pavel Machek wrote:
>
>> Date: Thu, 19 Jun 2014 10:21:29 +0200
>> From: Pavel Machek <pavel@ucw.cz>
>> To: James Bottomley <James.Bottomley@HansenPartnership.com>
>> Cc: Daniel Phillips <daniel@phunq.net>, Dave Chinner 
>> <david@fromorbit.com>,
>> linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
>  ...
>
> Flamed ? really ?

Yes, really. There were valid points and there were also unabashed flames. 
The latter are not helpful to anybody, even the flamer. But note that there 
were no counter flames. The boy scout rule applies: always leave your 
campsite cleaner than you found it.

> Dave pointed out some serious coding style problems.
> Those should be very easy to fix.

One needs to be careful about the definition of "fix" so that it does not 
turn into "throw the baby out with the bath water". Our kernel code 
necessarily has a few __KERNEL__ #ifdefs because the majority of it also 
runs in user space. This is not a feature to disparage, far from it.

Among other benefits, running in user space supports automated unit testing 
at fine granularity. We run make tests as a habit to catch a wide spectrum 
of correctness regressions. A successful make tests usually indicates that 
the heavyweight kernel stress tests are going to pass. Obviously, there are 
occasional exceptions to this. For example user space does not catch SMP 
races. In practice, only a handful of those have slipped through and 
required kernel level bug chasing.

That said, we will happily merge any concrete suggestion that reduces 
the frequency of __KERNEL__. But please be realistic. There are 32 
__KERNEL__ ifdefs in our 18K line code base. That hardly amounts to a 
"dog's breakfast".
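
As an illustration of the shared-source pattern described above, here is a minimal sketch. It is not code from the Tux3 tree: shared_alloc(), shared_free(), count_free_bits() and shared_selftest() are hypothetical names chosen for the example. The idea is that one small __KERNEL__ block selects the allocator, and everything below it compiles unchanged in both environments, so the same logic can be unit tested from user space.

```c
#ifdef __KERNEL__
#include <linux/slab.h>
/* Kernel build: use the kernel allocator. */
static void *shared_alloc(size_t size) { return kzalloc(size, GFP_NOFS); }
static void shared_free(void *ptr) { kfree(ptr); }
#else
#include <stddef.h>
#include <stdlib.h>
/* User space build: use libc, zero-filled like kzalloc. */
static void *shared_alloc(size_t size) { return calloc(1, size); }
static void shared_free(void *ptr) { free(ptr); }
#endif

/*
 * Shared logic, identical in kernel and user space: count the zero
 * bits in a bitmap (hypothetical example of filesystem code that
 * does not care where it runs).
 */
static unsigned count_free_bits(const unsigned char *map, size_t bytes)
{
	unsigned free_bits = 0;
	size_t i;
	int bit;

	for (i = 0; i < bytes; i++)
		for (bit = 0; bit < 8; bit++)
			if (!(map[i] & (1u << bit)))
				free_bits++;
	return free_bits;
}

/* Unit test that a user space "make tests" style target could call. */
static int shared_selftest(void)
{
	unsigned char *map = shared_alloc(4);
	int ok;

	if (!map)
		return -1;
	ok = count_free_bits(map, 4) == 32;	/* fresh map: all bits free */
	map[0] = 0xff;				/* mark eight bits used */
	ok = ok && count_free_bits(map, 4) == 24;
	shared_free(map);
	return ok ? 0 : -1;
}
```

In a sketch like this the ifdef count stays small because the conditional code is confined to thin wrappers rather than spread through the shared logic.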

> Let me remind you some more important problems Dave brought up,
> including page forking:
>
> "
>  The hacks around VFS and MM functionality need to have demonstrated
>  methods for being removed.

We already removed 450 lines of core kernel workarounds from Tux3 with an 
approach that was literally cut and pasted from one of Dave's emails. Then 
Dave changed his mind. Now the Tux3 team has been assigned a research 
project to improve core kernel writeback instead of simply adapting the 
approach that is already proven to work well enough. That is a rather 
blatant example of "perfect is the enemy of good enough". Please read the 
thread.

> We're not going to merge that page
>  forking stuff (like you were told at LSF 2013 more than a year ago:
>  http://lwn.net/Articles/548091/) without rigorous design review and
>  a demonstration of the solutions to all the hard corner cases it
>  has. The current code doesn't solve them (e.g. direct IO doesn't
>  work in tux3), and there's no clear patch set we can review that
>  demonstrates how it is all supposed to work. i.e. you need to
>  separate out all the page forking code into a separate patchset for
>  review, independent of the tux3 code and applies to the core mm/
>  code.
> "

Direct IO is a spurious issue. To recap: direct IO does not introduce any 
new page forking issues. All of the page forking issues already exist with 
normal buffered IO and mmap. We have little interest and scant available 
time for heading off on a tangent to implement direct IO at this point just 
as a precondition for merging.

On the other hand, page forking itself has a number of interesting issues. 
Hirofumi is currently preparing a set of core kernel patches for review. 
These patches explicitly do not attempt to package page forking up into a 
nice and easy API that other filesystems could patch in tomorrow. That 
would be an unreasonable research burden on our small development team. 
Instead, we show how it works in Tux3, and if other filesystems want to get 
those benefits, they can make similar changes. If we (the kernel community) 
are lucky enough to find a pattern in it such that substantial parts of the 
code can be abstracted into a library, then good. But requiring such a 
library to be developed as a precondition to merging Tux3 is unreasonable.
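
For readers new to the term, the following toy model sketches the general idea behind page forking as a copy-on-write step in the page cache. It is a user space illustration under stated assumptions, not Tux3 code: struct page, struct slot, slot_write() and fork_demo() are hypothetical, "delta" here just means a numbered batch of changes, and the model allows at most one in-flight copy per slot.

```c
#include <stdlib.h>
#include <string.h>

#define PAGE_SZ 16

/* A cached page tagged with the delta (batch of changes) that owns it. */
struct page {
	unsigned delta;
	unsigned char data[PAGE_SZ];
};

/* One page cache slot: the page writers see, plus a copy being written out. */
struct slot {
	struct page *front;
	struct page *inflight;
};

/*
 * Write one byte on behalf of front end delta `delta`. If the current
 * page belongs to a different (older) delta, fork it: the old page is
 * left untouched for the back end and the write goes to a fresh copy.
 */
static int slot_write(struct slot *s, unsigned delta, size_t off,
		      unsigned char byte)
{
	if (off >= PAGE_SZ)
		return -1;
	if (s->front->delta != delta) {
		struct page *copy;

		if (s->inflight)	/* toy limit: one in-flight copy */
			return -1;
		copy = malloc(sizeof *copy);
		if (!copy)
			return -1;
		memcpy(copy, s->front, sizeof *copy);
		copy->delta = delta;
		s->inflight = s->front;	/* back end keeps the old page */
		s->front = copy;	/* front end continues on the copy */
	}
	s->front->data[off] = byte;
	return 0;
}

/* Returns 0 if the in-flight page stayed stable across a newer write. */
static int fork_demo(void)
{
	struct page *first = calloc(1, sizeof *first);
	struct page *old;
	struct slot s;
	int ok;

	if (!first)
		return -1;
	first->delta = 1;
	s.front = first;
	s.inflight = NULL;

	ok = slot_write(&s, 1, 0, 'A') == 0	/* same delta: in place */
	     && s.inflight == NULL
	     && slot_write(&s, 2, 0, 'B') == 0	/* newer delta: fork */
	     && s.inflight == first
	     && s.inflight->data[0] == 'A'	/* old contents preserved */
	     && s.front->data[0] == 'B';

	old = s.inflight;
	if (s.front != old)
		free(s.front);
	free(old);
	return ok ? 0 : -1;
}
```

The point of the model is only that the writer never waits for the page being written out and never modifies it. The hard cases raised in this thread (mmap, writeback integration, direct IO) are exactly what such a toy leaves out.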

> "
>  Then there's all the writeback hacks. You've simply copy-n-pasted
>  most of fs-writeback.c, including duplicating structures like struct
>  wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
>  able to access core structures from kernel module context
>  (tux3_setup_writeback(), I'm looking at you). This is completely
>  unacceptable for a merge. Again, you need to separate out all the
>  writeback changes you need into an independent patchset so that they
>  can be reviewed independently of the tux3 code that uses it.
> "

That was already fixed as noted above, and all the relevant changes were 
already posted as an independent patch set. After that, some developers 
weighed in with half formed ideas about how the same thing could be done 
better, but without concrete suggestions. There is nothing wrong with half 
formed ideas, except when they turn into a way of blocking forward 
progress. See "perfect is the enemy of good enough" above.

It is worth noting that we (the kernel community) have been thrashing away 
at the writeback problem for more than twenty years, and the current 
solution still leaves much to be desired. It is unfair to expect us, the 
Tux3 team, to fix that mess in a week or two, just to merge our filesystem. 
We prefer to adapt the existing infrastructure for now, as expressed in the 
currently proposed patch set. With that, we allow core to mark our inodes 
dirty just as it has always done, and we continue to use the usual inode 
writeback lists for writeback sheduling, which work just fine.

Regards,

Daniel



* Re: [RFC] Tux3 for review
  2014-06-19 16:24 ` Josef Bacik
@ 2014-06-19 22:14   ` Daniel Phillips
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-06-19 22:14 UTC (permalink / raw)
  To: Josef Bacik
  Cc: linux-kernel, linux-fsdevel, tux3, Linus Torvalds, Andrew Morton

On Thursday, June 19, 2014 9:24:10 AM PDT, Josef Bacik wrote:
>
> On 05/16/2014 05:50 PM, Daniel Phillips wrote:
>> We would like to offer Tux3 for review for mainline merge. We 
>> have prepared a new repository suitable for pulling:
>> 
>> 
>> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/
>> 
>> Tux3 kernel module files are here:
>> 
>> 
>> https://git.kernel.org/cgit/linux/kernel/git/daniel/linux-tux3.git/tree/fs/tux3
>> 
>> Tux3 userspace tools and test ...
>
> So I'd really like to see the page fork stuff broken out in their own core
> change.  I want to do something like this to get around the stable pages
> pain but haven't had the time to look at it, so if we can hammer out what
> you guys did into something workable and generic that would be great.

Hirofumi has been working on just that for the last couple of weeks (his
usual attention to detail) and there are still a few days to go on it. We
would appreciate it if somebody else does the hammering for a generic
version, so we can continue to concentrate on getting the core hooks
right, proving out the corner cases, and proving the benefit through
benchmarks :)

The next round of Tux3 review patches will be two separate patch series,
one for writeback core hooks, and one for page forking core hooks. These
will be against Hirofumi's mirror at Github, to keep the kernel.org git
tree history clean and unrebased.

Regards,

Daniel



* Re: [RFC] Tux3 for review
  2014-06-19 21:58                     ` Daniel Phillips
@ 2014-06-21 19:29                       ` James Bottomley
  2014-06-22  1:06                         ` Dave Chinner
                                           ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: James Bottomley @ 2014-06-21 19:29 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Lukáš Czerner, Pavel Machek, Dave Chinner,
	linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
> On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
> > Let me remind you some more important problems Dave brought up,
> > including page forking:
> >
> > "
> >  The hacks around VFS and MM functionality need to have demonstrated
> >  methods for being removed.
> 
> We already removed 450 lines of core kernel workarounds from Tux3 with an 
> approach that was literally cut and pasted from one of Dave's emails. Then 
> Dave changed his mind. Now the Tux3 team has been assigned a research 
> project to improve core kernel writeback instead of simply adapting the 
> approach that is already proven to work well enough. That is a rather 
> blatant example of "perfect is the enemy of good enough". Please read the 
> thread.

That's a bit disingenuous: the concern has always been how page forking
interacted with writeback.  It's not new, it was one of the major things
brought up at LSF 14 months ago, so you weren't just assigned this.

> > We're not going to merge that page
> >  forking stuff (like you were told at LSF 2013 more than a year ago:
> >  http://lwn.net/Articles/548091/) without rigorous design review and
> >  a demonstration of the solutions to all the hard corner cases it
> >  has. The current code doesn't solve them (e.g. direct IO doesn't
> >  work in tux3), and there's no clear patch set we can review that
> >  demonstrates how it is all supposed to work. i.e. you need to
> >  separate out all the page forking code into a separate patchset for
> >  review, independent of the tux3 code and applies to the core mm/
> >  code.
> > "
> 
> Direct IO is a spurious issue. To recap: direct IO does not introduce any 
> new page forking issues. All of the page forking issues already exist with 
> normal buffered IO and mmap. We have little interest and scant available 
> time for heading off on a tangent to implement direct IO at this point just 
> as a precondition for merging.

The specific concern is that page forking cannot be made to work with
direct io.  Asserting that it doesn't cause any additional problems
isn't an answer to that concern.  Direct IO isn't actually a huge issue
for most filesystems (I mean even vfat has it).  The fact that you think
it is such a huge deal to implement for tux3 tends to lend credence to
this viewpoint.

The point is that if page forking won't work with direct IO at all, then
it's a broken design and there's no point merging it.

> On the other hand, page forking itself has a number of interesting issues. 
> Hirofumi is currently preparing a set of core kernel patches for review. 
> These patches explicitly do not attempt to package page forking up into a 
> nice and easy API that other filesystems could patch in tomorrow. That 
> would be an unreasonable research burden on our small development team. 
> Instead, we show how it works in Tux3, and if other filesystems want to get 
> those benefits, they can make similar changes. If we (the kernel community) 
> are lucky enough to find a pattern in it such that substantial parts of the 
> code can be abstracted into a library, then good. But requiring such a 
> library to be developed as a precondition to merging Tux3 is unreasonable.

OK, can we take a step back and ask why you're so keen to push this into
the tree?  The usual reason is ease of maintenance because in-tree
filesystems get updated as the vfs and mm APIs change.  However, the
reciprocal side of that is using standard VFS and MM APIs to make this
update and maintenance easy.  The reason no-one wants an in-tree
filesystem that implements its own writeback by hacking into the current
writeback system is that it's a huge maintenance burden.  Every time
writeback gets tweaked, tux3 will break meaning either we double the
burden on people updating writeback (to try to figure out how to
replicate the change in tux3) or we just accept that tux3 gets broken.
The former is unacceptable to the filesystem and mm people and the
latter would mean there's not really much point merging tux3 if we just
keep breaking it ... it's better to keep it out of tree where the
breakages can be fixed by people who understand them on their own
timescales.

The object of the exercise is *not* for you to convert every filesystem
to tux3, it's to see if there's a way of integrating enough of page
forking into the current writeback code that tux3 uses standard APIs and
doesn't multiply the burden on the people who maintain and update the
writeback code.

> > "
> >  Then there's all the writeback hacks. You've simply copy-n-pasted
> >  most of fs-writeback.c, including duplicating structures like struct
> >  wb_writeback_work and then hacked in crap (kallsyms lookups!) to be
> >  able to access core structures from kernel module context
> >  (tux3_setup_writeback(), I'm looking at you). This is completely
> >  unacceptable for a merge. Again, you need to separate out all the
> >  writeback changes you need into an independent patchset so that they
> >  can be reviewed independently of the tux3 code that uses it.
> > "
> 
> That was already fixed as noted above, and all the relevant changes were 
> already posted as an independent patch set. After that, some developers 
> weighed in with half formed ideas about how the same thing could be done 
> better, but without concrete suggestions. There is nothing wrong with half 
> formed ideas, except when they turn into a way of blocking forward 
> progress. See "perfect is the enemy of good enough" above.

Could you post the url to the new series, please, I must have missed it;
seeing the patches that implement the API for insertion into the
writeback code would certainly help frame this discussion.

> It is worth noting that we (the kernel community) have been thrashing away 
> at the writeback problem for more than twenty years, and the current 
> solution still leaves much to be desired. It is unfair to expect us, the 
> Tux3 team, to fix that mess in a week or two, just to merge our filesystem. 
> We prefer to adapt the existing infrastructure for now, as expressed in the 
> currently proposed patch set. With that, we allow core to mark our inodes 
> dirty just as it has always done, and we continue to use the usual inode 
> writeback lists for writeback sheduling, which work just fine.

So that's a misunderstanding of expectations; the actual expectation is
that you won't make the writeback problem more difficult to tackle.
Reimplementing writeback within your code in a way that's hacked into
the system is fragile and burdensome: as I said above, it becomes double
the code to maintain and tux3 breaks if it's not updated.

James




* Re: [RFC] Tux3 for review
  2014-06-21 19:29                       ` James Bottomley
@ 2014-06-22  1:06                         ` Dave Chinner
  2014-06-24 11:16                           ` Daniel Phillips
  2014-06-22  3:32                         ` Daniel Phillips
  2014-06-24  0:19                         ` Daniel Phillips
  2 siblings, 1 reply; 35+ messages in thread
From: Dave Chinner @ 2014-06-22  1:06 UTC (permalink / raw)
  To: James Bottomley
  Cc: Daniel Phillips, Lukáš Czerner, Pavel Machek,
	linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

On Sat, Jun 21, 2014 at 12:29:01PM -0700, James Bottomley wrote:
> On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
> > On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
> > > Let me remind you some more important problems Dave brought up,
> > > including page forking:
> > >
> > > "
> > >  The hacks around VFS and MM functionality need to have demonstrated
> > >  methods for being removed.
> > 
> > We already removed 450 lines of core kernel workarounds from Tux3 with an 
> > approach that was literally cut and pasted from one of Dave's emails. Then 
> > Dave changed his mind. Now the Tux3 team has been assigned a research 
> > project to improve core kernel writeback instead of simply adapting the 
> > approach that is already proven to work well enough. That is a rather 
> > blatant example of "perfect is the enemy of good enough". Please read the 
> > thread.
> 
> That's a bit disingenuous: the concern has always been how page forking
> interacted with writeback.  It's not new, it was one of the major things
> brought up at LSF 14 months ago, so you weren't just assigned this.

BTW, it's worth noting that reviewers are *allowed* to change their
mind at any time during a discussion or during review cycles.
Indeed, this occurs quite commonly. It's no different to multiple
reviewers disagreeing on what the best way to make the improvement
is - sometimes it takes an implementation to solidify opinion on the
best approach to solving a problem.

i.e. it took an implementation of the writeback hook tailored
specifically to tux3's requirements to understand the best way to
solve the infrastructure problem for *everyone*. This is how review
is supposed to work - take an idea, and refine it into something
better that works for everyone.

We'd have been stuck way up the creek without a paddle a long time
ago if reviewers weren't allowed to change their minds....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC] Tux3 for review
  2014-06-21 19:29                       ` James Bottomley
  2014-06-22  1:06                         ` Dave Chinner
@ 2014-06-22  3:32                         ` Daniel Phillips
  2014-06-22 14:43                           ` James Bottomley
  2014-06-22 18:34                           ` Theodore Ts'o
  2014-06-24  0:19                         ` Daniel Phillips
  2 siblings, 2 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-06-22  3:32 UTC (permalink / raw)
  To: James Bottomley
  Cc: Lukáš Czerner, Pavel Machek, Dave Chinner,
	linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

On Saturday, June 21, 2014 12:29:01 PM PDT, James Bottomley wrote:
> On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
>> We already removed 450 lines of core kernel workarounds from Tux3 with an
>> approach that was literally cut and pasted from one of Dave's emails. Then
>> Dave changed his mind. Now the Tux3 team has been assigned a research
>> project to improve core kernel writeback instead of simply adapting the
>> approach that is already proven to work well enough. That is a rather
>> blatant example of "perfect is the enemy of good enough". Please read the
>> thread.
>
> That's a bit disingenuous: the concern has always been how page forking
> interacted with writeback.  It's not new, it was one of the major things
> brought up at LSF 14 months ago, so you weren't just assigned this.

[citation needed]

Regards,

Daniel


* Re: [RFC] Tux3 for review
  2014-06-22  3:32                         ` Daniel Phillips
@ 2014-06-22 14:43                           ` James Bottomley
       [not found]                             ` <522aee97-34e7-4adc-adf2-c9b73aa0ef36@phunq.net>
  2014-06-22 18:34                           ` Theodore Ts'o
  1 sibling, 1 reply; 35+ messages in thread
From: James Bottomley @ 2014-06-22 14:43 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Lukáš Czerner, Pavel Machek, Dave Chinner,
	linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

On Sat, 2014-06-21 at 20:32 -0700, Daniel Phillips wrote:
> On Saturday, June 21, 2014 12:29:01 PM PDT, James Bottomley wrote:
> > On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
> >> We already removed 450 lines of core kernel workarounds from Tux3 with an
> >> approach that was literally cut and pasted from one of Dave's emails. Then
> >> Dave changed his mind. Now the Tux3 team has been assigned a research
> >> project to improve core kernel writeback instead of simply adapting the
> >> approach that is already proven to work well enough. That is a rather
> >> blatant example of "perfect is the enemy of good enough". Please read the
> >> thread.
> >
> > That's a bit disingenuous: the concern has always been how page forking
> > interacted with writeback.  It's not new, it was one of the major things
> > brought up at LSF 14 months ago, so you weren't just assigned this.
> 
> [citation needed]

Really?  I was there; I remember and it's in my notes of the discussion.
However, it's also in Jon's at paragraph 6 if you need to refer to
something to refresh your memory.

However, when it was spotted isn't the issue; how we add tux3 without a
large maintenance burden on writeback is, as I carefully explained in
the rest of the email you cut.

James




* Re: [RFC] Tux3 for review
  2014-06-22  3:32                         ` Daniel Phillips
  2014-06-22 14:43                           ` James Bottomley
@ 2014-06-22 18:34                           ` Theodore Ts'o
  2014-06-24  0:31                             ` Daniel Phillips
  1 sibling, 1 reply; 35+ messages in thread
From: Theodore Ts'o @ 2014-06-22 18:34 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: James Bottomley, Lukáš Czerner, Pavel Machek,
	Dave Chinner, linux-kernel, linux-fsdevel, Linus Torvalds,
	Andrew Morton

On Sat, Jun 21, 2014 at 08:32:03PM -0700, Daniel Phillips wrote:
> >That's a bit disingenuous: the concern has always been how page forking
> >interacted with writeback.  It's not new, it was one of the major things
> >brought up at LSF 14 months ago, so you weren't just assigned this.
> 
> [citation needed]

http://lwn.net/Articles/548091/

						- Ted


* Re: [RFC] Tux3 for review
  2014-06-21 19:29                       ` James Bottomley
  2014-06-22  1:06                         ` Dave Chinner
  2014-06-22  3:32                         ` Daniel Phillips
@ 2014-06-24  0:19                         ` Daniel Phillips
  2 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-06-24  0:19 UTC (permalink / raw)
  To: James Bottomley
  Cc: Lukáš Czerner, Pavel Machek, Dave Chinner,
	linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

On Saturday, June 21, 2014 12:29:01 PM PDT, James Bottomley wrote:
> On Thu, 2014-06-19 at 14:58 -0700, Daniel Phillips wrote:
>> On Thursday, June 19, 2014 2:26:48 AM PDT, Lukáš Czerner wrote:
>  ...
>
> the concern has always been how page forking interacted with 
> writeback.

More accurately, that is just one of several concerns that Tux3
necessarily addresses in order to benefit from this powerful
optimization. We are pleased that the details continue to be of
general interest.

>> Direct IO is a spurious issue. To recap: direct IO does 
>> not introduce any new page forking issues. All of the page forking
>> issues already exist with normal buffered IO and mmap. We have 
>> little interest and scant available time for heading off on a 
>> tangent to implement direct IO at this point just as a 
>> precondition for merging.
>  ...
>
> The specific concern is that page forking cannot be made to work
> with direct io. Asserting that it doesn't cause any additional
> problems isn't an answer to that concern. 

Yes it is. We are satisfied that direct IO introduces no new issues
with page forking. If you are concerned about a specific issue then 
the onus is on you to specify it.

> Direct IO isn't actually a huge issue for most filesystems (I mean
> even vfat has it).

You might consider asking Hirofumi about that (VFAT maintainer).

> ...The fact that you think it is such a huge deal...

(Surely you could have found a less disparaging way to express
yourself...)

> ...to implement for tux3 tends to lend credence to this viewpoint.

It is purely a matter of concentrating on what is actually 
important, as opposed to imagined or manufactured. We do not wish 
to spend time on direct IO at this point in time. If you have 
identified a specific issue then please raise it.

For the record, there is a genuine reason why direct IO requires
extra work for Tux3, which has nothing to do with page forking. 
Tux3 has an asynchronous backend, unlike any other local Linux 
filesystem (but like Matt Dillon's Hammer, from which we took 
inspiration). Direct IO thus requires implementing a new 
synchronization mechanism to allow frontend direct IO to use the 
backend allocation and writeback mechanisms, because direct IO is 
synchronous. There is nothing new, magical or particularly 
challenging about that, it is just time consuming work that we do 
not intend to do right now because other more important things need 
to be done.

In the fullness of time, Tux3 will have direct IO just like VFAT,
however that work is a good candidate for post-merge development. 
For example, it could be a good ramp-up project for a new team 
member or a student looking to make their mark on the kernel world.

The bottom line is that direct IO has nothing to do with compiling
the kernel or operating a cell phone efficiently, so it is not 
interesting to us right now. It will become more interesting when 
Tux3 is ready to scale to servers running Oracle and the like.

> The point is that if page forking won't work with direct IO at
> all, then it's a broken design and there's no point merging it.

You can rest assured that direct IO will work with page forking,
given that buffered IO does. We are now discussing details of how 
to make core Linux a more hospitable environment for page forking, 
not whether page forking can be made to work at all, a question that 
was settled by example some time ago.

>> On the other hand, page forking itself has a number of
>> interesting issues. Hirofumi is currently preparing a set of 
>> core kernel patches for review. These patches explicitly do 
>> not attempt to package page forking up into a nice and easy 
>> API that other filesystems could patch in tomorrow. That would 
>> be an unreasonable research burden on our small development 
>> team. 
>  ...
>
> OK, can we take a step back and ask why you're so keen to push
> this into the tree?

If you mean, why are we keen to merge Tux3, I should not need to
explain that to you.

If you mean, why are we keen to push page forking per se into
mainline, then the answer is, we are by no means keen to push page 
forking into core kernel. Rather, that request comes from other 
filesystem developers who recognize it as a plausible way to avoid 
the pain of stable pages.

Based on our experience, page forking is properly implemented within
the filesystem, not core kernel, and we are keen only to push the 
requisite hooks into core. If somebody disagrees and feels the need 
to prove their point by implementing page forking entirely in core, 
then they should post patches and we will be the first to applaud.

> The usual reason is ease of maintenance because in-tree
> filesystems get updated as the vfs and mm APIs change.  However,
> the reciprocal side of that is using standard VFS and MM APIs to 
> make this update and maintenance easy.  The reason no-one wants
> an in-tree filesystem that implements its own writeback by 
> hacking into the current writeback system is that it's a huge 
> maintenance burden.

Every filesystem is a maintenance burden. Core kernel simply must
provide the mechanisms that are required to make the kernel a good 
place for filesystems to exist. The fact that some ancient core 
hackery needs to be tweaked to better accommodate the requirements 
of a modern filesystem is not unusual in any way. Essentially, that 
is the entire story of Linux kernel development.

> Every time writeback gets tweaked, tux3 will break meaning either 
> we double the burden on people updating writeback (to try to 
> figure out how to replicate the change in tux3) or we just accept 
> that tux3 gets broken.

No. Tux3 will be less of a burden for writeback maintenance than
other filesystems because it hooks in above the messy writepages 
machinery and therefore is not sensitive to subtle changes in that 
creaky code.

> The former is unacceptable to the filesystem and mm people and the
> latter would mean there's not really much point merging tux3 if we
> just keep breaking it ... it's better to keep it out of tree
> where the breakages can be fixed by people who understand them on 
> their own timescales.

On the face of it you are arguing the case that Tux3 should be 
blocked from merging forever, as should every new filesystem, as 
Pavel succinctly pointed out. That is less than helpful. But if 
your goal is to buttress the public perception that LKML has
become a toxic forum for contributors then you do an admirable
job.

By the way, after reading your polemic an observer might draw the 
conclusion that I am not one of the "filesystem and mm people". When 
did that change?

>>> ...
>> That was already fixed as noted above, and all the relevant
>> changes were already posted as an independent patch set. After
>> that, some developers weighed in with half formed ideas about 
>> how the same thing could be done better, but without concrete 
>> suggestions. There is nothing wrong with half formed ideas, 
>> except when they turn into a way of blocking forward progress
>  ...
>
> Could you post the url to the new series, please, I must have  
> missed it; seeing the patches that implement the API for 
> insertion into the writeback code would certainly help frame
> this discussion.

We think that our most recently posted patch is the best approach 
at this time. Which is to say that it relies on exactly the 
existing writeback scheduling heuristics. We think that Dave Chinner 
and others are wrong to advocate experimental development of a new 
writeback mechanism at this juncture while the current scheme 
already works perfectly well for Tux3, either with our writeback 
hack or with the new hook.

We further suggest that the new hook is easy to understand and
imposes insignificant new maintenance burden. In any case we will be 
happy to assume whatever maintenance burden might arise. Obviously, 
that is entirely academic while we are the only user.

>> It is worth noting that we (the kernel community) have been
>> thrashing away at the writeback problem for more than twenty 
>> years, and the current solution still leaves much to be 
>> desired. It is unfair to expect us, the Tux3 team, to fix that 
>> mess in a week or two, just to merge our filesystem. We prefer 
>> to adapt the existing infrastructure for now, as expressed in 
>> the currently proposed patch set. With that, we allow core to 
>> mark our inodes dirty just as it has always done, and we 
>> continue to use the usual inode writeback lists for writeback
>> scheduling, which work just fine.
>
> So that's a misunderstanding of expectations...

I did not misunderstand. It is clear from the context you deleted
that we are being pushed to engineer a new core writeback mechanism 
instead of adapting the existing one.

> ...the actual expectation is that you won't make the writeback
> problem more difficult to tackle.

We do not make the writeback problem more difficult, which is 
obvious from the patch.

> Reimplementing writeback within your code in a way that's hacked
> into the system is fragile and burdensome ... it becomes double 
> the code to maintain ... and tux3 breaks if it's not updated.

You are preaching to the converted. As you know, we posted a patch
set that eliminates this particular instance of core duplication. 
Upcoming patches will eliminate the remaining core duplication. It 
is unnecessary to belabor that point further.

Regards,

Daniel



* Re: [RFC] Tux3 for review
  2014-06-22 18:34                           ` Theodore Ts'o
@ 2014-06-24  0:31                             ` Daniel Phillips
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-06-24  0:31 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: James Bottomley, Lukáš Czerner, Pavel Machek,
	Dave Chinner, linux-kernel, linux-fsdevel, Linus Torvalds,
	Andrew Morton

On Sunday, June 22, 2014 11:34:50 AM PDT, Theodore Ts'o wrote:
> On Sat, Jun 21, 2014 at 08:32:03PM -0700, Daniel Phillips wrote:
>>> That's a bit disingenuous: the concern has always been how page forking
>>> interacted with writeback.  It's not new, it was one of the major things
>>> brought up at LSF 14 months ago, so you weren't just assigned this.
>> 
>> [citation needed]
>
> http://lwn.net/Articles/548091/

Thank you Ted, and thank you also for providing an example of collegial
behavior on LKML that is worth emulating.

Regards,

Daniel


* Re: [RFC] Tux3 for review
       [not found]                             ` <522aee97-34e7-4adc-adf2-c9b73aa0ef36@phunq.net>
@ 2014-06-24  4:41                               ` James Bottomley
  2014-06-24  9:10                                 ` Daniel Phillips
  0 siblings, 1 reply; 35+ messages in thread
From: James Bottomley @ 2014-06-24  4:41 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Pavel Machek, Linus Torvalds, Andrew Morton, linux-kernel, linux-fsdevel

On Mon, 2014-06-23 at 17:27 -0700, Daniel Phillips wrote:
> On Sunday, June 22, 2014 7:43:07 AM PDT, James Bottomley wrote:
> > On Sat, 2014-06-21 at 20:32 -0700, Daniel Phillips wrote:
> >> On Saturday, June 21, 2014 12:29:01 PM PDT, James Bottomley wrote:
> >>> That's a bit disingenuous: the concern has always been how page forking
> >>> interacted with writeback.  It's not new, it was one of the major things
> >>> brought up at LSF 14 months ago, so you weren't just assigned this.
> >  ...
> >> 
> >> [citation needed]
> >
> > Really?  I was there; I remember and it's in my notes of the discussion.
> > However, it's also in Jon's at paragraph 6 if you need to refer to
> > something to refresh your memory.
> 
> You have such a wonderfully charismatic way of providing citations.

Well, it's factual, as I presume you have now discovered.

> > However, when it was spotted isn't the issue; how we add tux3 without a
> > large maintenance burden on writeback is, as I carefully explained in
> > the rest of the email you cut.
> 
> You are doing a fine job of proving to the world that LKML has become
> a toxic waste dump. CC to LKML removed for obvious reasons.

Please don't drop the Mailing list cc; that's where the debate actually
happens and where others can see it.

> Please let this be the end of the unhelpful rhetoric that does none of us
> any good, especially you.

Telling you factually what the issue is isn't rhetoric.  Your Ad Hominem
reply, of course, is rhetoric but I don't need to bother engaging with
your rhetorical technique because I'm still arguing the facts: proving
that page forking can be integrated into writeback without adding to the
maintenance burden is a big issue for tux3.  We're all still waiting for
the patches you were going to produce showing how this could be done.

James




* Re: [RFC] Tux3 for review
  2014-06-24  4:41                               ` James Bottomley
@ 2014-06-24  9:10                                 ` Daniel Phillips
  2014-06-24 10:59                                   ` Theodore Ts'o
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Phillips @ 2014-06-24  9:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Pavel Machek, Linus Torvalds, Andrew Morton, linux-kernel,
	linux-fsdevel, Dave Chinner

On Monday, June 23, 2014 9:41:30 PM PDT, James Bottomley wrote:
>
> [rhetoric snipped] 
>
> ... I'm still arguing the facts: proving
> that page forking can be integrated into writeback without adding to the
> maintenance burden is a big issue for tux3.

Sorry, I must have missed those facts, I only saw recycled opinions.

> We're all still waiting for the patches you were going to produce
> showing how this could be done.

That makes sense, because the patches to transform our workarounds
into shiny new kernel hooks are still in progress, as I said. I would
appreciate the courtesy of being permitted to take the time to do the
work to the necessary quality without being subjected to endless
carping about when the patches will be posted.

If there is genuine interest in how we are approaching the new mm
hooks for page forking, I will happily take the time to discuss
it.

Note that I do not complain about Dave Chinner's endless carping, which
contains much the same rhetoric as your posts, the difference being that
Dave has proved himself a good reviewer. Though Dave behaves as caustically
as you or perhaps more so, he always takes care to provide just enough
useful technical sweetener to keep the technical vs toxic balance on the
positive side. Of course, it would be much better for all if he cared to
adopt a collegial manner, like Ted for example, who incidentally can flame
with the best of them when he wants to. But who would want to, other than a
self-obsessed moron?

Speaking of Dave, what would be really interesting at this point is the
long story of how XFS worked around pretty nearly the same writeback
issues that Tux3 does. We already saw the short story, but it went by
pretty fast. Color me truly interested, in part because a good solution
to this is probably what we really want for writeback. Not immediately,
because re-engineering parts of core kernel unnecessarily during a
filesystem merge is simply foolhardy, but at some time in the not too
distant future. (CC to Dave added.)

Regards,

Daniel



* Re: [RFC] Tux3 for review
  2014-06-24  9:10                                 ` Daniel Phillips
@ 2014-06-24 10:59                                   ` Theodore Ts'o
  2014-06-24 11:27                                     ` Daniel Phillips
  0 siblings, 1 reply; 35+ messages in thread
From: Theodore Ts'o @ 2014-06-24 10:59 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: James Bottomley, Pavel Machek, Linus Torvalds, Andrew Morton,
	linux-kernel, linux-fsdevel, Dave Chinner

On Tue, Jun 24, 2014 at 02:10:52AM -0700, Daniel Phillips wrote:
> 
> That makes sense, because the patches to transform our workarounds
> into shiny new kernel hooks are still in progress, as I said. I would
> appreciate the courtesy of being permitted to take the time to do the
> work to the necessary quality without being subjected to endless
> carping about when the patches will be posted.

The feedback which you have been getting, fairly consistently I
believe, is that it is the shiny new kernel hooks that need to be
reviewed, not the workarounds.  I don't think it's a matter of people
not being willing to give you the time to do this work (take all the
time you need!); but rather that it's premature for you to be asking
for tux3 to be merged before those patches have been posted and
reviewed and found to be shiny.

Best regards,

						- Ted


* Re: [RFC] Tux3 for review
  2014-06-22  1:06                         ` Dave Chinner
@ 2014-06-24 11:16                           ` Daniel Phillips
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-06-24 11:16 UTC (permalink / raw)
  To: Dave Chinner
  Cc: James Bottomley, Lukáš Czerner, Pavel Machek,
	linux-kernel, linux-fsdevel, Linus Torvalds, Andrew Morton

On Saturday, June 21, 2014 6:06:00 PM PDT, Dave Chinner wrote:
> BTW, it's worth noting that reviewers are *allowed* to change their
> mind at any time during a discussion or during review cycles.
> Indeed, this occurs quite commonly. It's no different to multiple
> reviewers disagreeing on what the best way to make the improvement
> is - sometimes it takes an implementation to solidify opinion on the
> best approach to solving a problem.

The issue I have is not that you changed your mind per se, but
that you were right the first time and wrong the second time.
As you know, reviewers are not just allowed to change their
minds but are also allowed to be wrong from time to time.

The reason that you were wrong the second time is not that the
interface you proposed is wrong - I believe that we violently
agree about superblock-based writeback as the correct approach
long term - but that the current, inode based writeback already
works well enough for our needs. It therefore makes exactly zero
sense to go off on a tangent to engineer a new core mechanism
at the same time as merging the filesystem. The correct way to
do it is to get a likely user into kernel first (Tux3) and
then engineer the new interface that will be so all-dancing
that you will immediately feel compelled to adopt it for XFS.

Obviously, with only one user of the imperfect/functional
interface the maintenance overhead of updating it to the new
perfect/amazing interface rounds to zero. Remember, this is an
_internal_ API, so the do-not-break rule simply does not apply.
Instead, the "perfect is the enemy of good enough" rule is
operative.

Just to reiterate for the tl;dr amongst us: you were right the
first time. Go ahead and change your mind, but when you finally
realize that you were wrong the second time, please do let us
know.

Meanwhile, we must concentrate on the upcoming page forking
hooks, which promise to provide even more scope for being both
right and wrong, and smart or stupid about which parts of the
kernel should be deeply re-engineered versus prudently adapted
to evolving needs.

Regards,

Daniel


* Re: [RFC] Tux3 for review
  2014-06-24 10:59                                   ` Theodore Ts'o
@ 2014-06-24 11:27                                     ` Daniel Phillips
  2014-06-24 11:52                                       ` James Bottomley
  0 siblings, 1 reply; 35+ messages in thread
From: Daniel Phillips @ 2014-06-24 11:27 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: James Bottomley, Pavel Machek, Linus Torvalds, Andrew Morton,
	linux-kernel, linux-fsdevel, Dave Chinner

On Tuesday, June 24, 2014 3:59:40 AM PDT, Theodore Ts'o wrote:
> On Tue, Jun 24, 2014 at 02:10:52AM -0700, Daniel Phillips wrote:
>> 
>> That makes sense, because the patches to transform our workarounds
>> into shiny new kernel hooks are still in progress, as I said. I would
>> appreciate the courtesy of being permitted to take the time to do the
>> work to the necessary quality without being subjected to endless
>> carping about when the patches will be posted.
>
> The feedback which you have been getting, fairly consistently I
> believe, is that it is the shiny new kernel hooks that need to be
> reviewed, not the workarounds.  I don't think it's a matter of people
> not being willing to give you the time to do this work (take all the
> time you need!); but rather that it's premature for you to be asking
> for tux3 to be merged before those patches have been posted and
> reviewed and found to be shiny.

That is not quite right. Before we posted the filesystem for review,
we did not know whether core changes or workarounds would be the
better route. Now we do know, and have duly turned our coding
energy to producing a set of decent core hooks. That does not mean
that we are taking Tux3 "out of play". That would just be stupid.

I emphatically disagree that it is premature to ask for Tux3 to be
merged. You might think so, but I do not. While I do not begrudge
you your opinion, Linux did not get to the dominant position it has
today by being shy about merging new functionality early. Did we
suddenly lose our mojo just at Tux3 merge time?

If you really think that Tux3 has been offered for merge too early,
then clone our tree, build it, break it and heap abuse on us. That
should take you about one hour if you are right.

Regards,

Daniel


* Re: [RFC] Tux3 for review
  2014-06-24 11:27                                     ` Daniel Phillips
@ 2014-06-24 11:52                                       ` James Bottomley
  2014-06-24 12:10                                         ` Daniel Phillips
  0 siblings, 1 reply; 35+ messages in thread
From: James Bottomley @ 2014-06-24 11:52 UTC (permalink / raw)
  To: Daniel Phillips
  Cc: Theodore Ts'o, Pavel Machek, Linus Torvalds, Andrew Morton,
	linux-kernel, linux-fsdevel, Dave Chinner

On Tue, 2014-06-24 at 04:27 -0700, Daniel Phillips wrote:
> On Tuesday, June 24, 2014 3:59:40 AM PDT, Theodore Ts'o wrote:
> > On Tue, Jun 24, 2014 at 02:10:52AM -0700, Daniel Phillips wrote:
> >> 
> >> That makes sense, because the patches to transform our workarounds
> >> into shiny new kernel hooks are still in progress, as I said. I would
> >> appreciate the courtesy of being permitted to take the time to do the
> >> work to the necessary quality without being subjected to endless
> >> carping about when the patches will be posted.
> >
> > The feedback which you have been getting, fairly consistently I
> > believe, is that it is the shiny new kernel hooks that need to be
> > reviewed, not the workarounds.  I don't think it's a matter of people
> > not being willing to give you the time to do this work (take all the
> > time you need!); but rather that it's premature for you to be asking
> > for tux3 to be merged before those patches have been posted and
> > reviewed and found to be shiny.
> 
> That is not quite right. Before we posted the filesystem for review,
> we did not know whether core changes or workarounds would be the
> better route. Now we do know, and have duly turned our coding
> energy to producing a set of decent core hooks. That does not mean
> that we are taking Tux3 "out of play". That would just be stupid.

OK, but now we've explained the reason several times: The original set
of hacks is fragile against changes to writeback, which is the
maintenance problem.

> I emphatically disagree that it is premature to ask for Tux3 to be
> merged. You might think so, but I do not. While I do not begrudge
> you your opinion, Linux did not get to the dominant position it has
> today by being shy about merging new functionality early. Did we
> suddenly lose our mojo just at Tux3 merge time?

But you've agreed to go the core hooks route, the patches for which
aren't yet ready, so what is there actually to review and merge until
the patches appear?

James

> If you really think that Tux3 has been offered for merge too early,
> then clone our tree, build it, break it and heap abuse on us. That
> should take you about one hour if you are right.
> 
> Regards,
> 
> Daniel





* Re: [RFC] Tux3 for review
  2014-06-24 11:52                                       ` James Bottomley
@ 2014-06-24 12:10                                         ` Daniel Phillips
  0 siblings, 0 replies; 35+ messages in thread
From: Daniel Phillips @ 2014-06-24 12:10 UTC (permalink / raw)
  To: James Bottomley
  Cc: Theodore Ts'o, Pavel Machek, Linus Torvalds, Andrew Morton,
	linux-kernel, linux-fsdevel, Dave Chinner

On Tuesday, June 24, 2014 4:52:15 AM PDT, James Bottomley wrote:
> On Tue, 2014-06-24 at 04:27 -0700, Daniel Phillips wrote:
>> I emphatically disagree that it is premature to ask for Tux3 to be
>> merged. You might think so, but I do not. While I do not begrudge
>> you your opinion, Linux did not get to the dominant position it has
>> today by being shy about merging new functionality early. Did we
>> suddenly lose our mojo just at Tux3 merge time?
>
> But you've agreed to go the core hooks route, the patches for which
> aren't yet ready, so what is there actually to review and merge until
> the patches appear?

If Linus asks for a Tux3 pull first thing tomorrow we will agree to
it, perfect core patches or not. This is because we are confident
that all remaining API issues and code duplication issues are
solvable in the usual Linux way. The Tux3 tree exactly as posted
builds and runs passing well. We do not feel ashamed of it at all,
quite the contrary.

Mind you, we know that everybody is looking forward to a lively
discussion about page forking, as well they should. But it does not
really matter whether that takes place before or after merge. You
know as well as I do that we are collectively smart enough to make
it work, and you probably understand by now why it is worth making
it work. Further, we think it already works, both by analysis and
empirical results of our stress testing.

If you have a _specific_ example of an API issue that is not solvable
in the usual Linux way, please share it.

Regards,

Daniel


end of thread, other threads:[~2014-06-24 12:10 UTC | newest]

Thread overview: 35+ messages
2014-05-17  0:50 [RFC] Tux3 for review Daniel Phillips
2014-05-17  5:09 ` Martin Steigerwald
2014-05-17  5:29   ` Daniel Phillips
2014-05-20  6:56     ` Daniel Phillips
2014-05-18 23:55 ` Dave Chinner
2014-05-20  0:55   ` Daniel Phillips
2014-05-20  3:18     ` Dave Chinner
2014-05-20  5:41       ` Daniel Phillips
2014-05-20 17:25         ` Daniel Phillips
2014-06-13 10:32       ` Pavel Machek
2014-06-13 17:49         ` Daniel Phillips
2014-06-13 20:20           ` Pavel Machek
2014-06-15 21:41             ` Daniel Phillips
2014-06-16 15:25               ` James Bottomley
2014-06-19  8:21                 ` Pavel Machek
2014-06-19  9:26                   ` Lukáš Czerner
2014-06-19 21:58                     ` Daniel Phillips
2014-06-21 19:29                       ` James Bottomley
2014-06-22  1:06                         ` Dave Chinner
2014-06-24 11:16                           ` Daniel Phillips
2014-06-22  3:32                         ` Daniel Phillips
2014-06-22 14:43                           ` James Bottomley
     [not found]                             ` <522aee97-34e7-4adc-adf2-c9b73aa0ef36@phunq.net>
2014-06-24  4:41                               ` James Bottomley
2014-06-24  9:10                                 ` Daniel Phillips
2014-06-24 10:59                                   ` Theodore Ts'o
2014-06-24 11:27                                     ` Daniel Phillips
2014-06-24 11:52                                       ` James Bottomley
2014-06-24 12:10                                         ` Daniel Phillips
2014-06-22 18:34                           ` Theodore Ts'o
2014-06-24  0:31                             ` Daniel Phillips
2014-06-24  0:19                         ` Daniel Phillips
2014-05-22  9:52     ` Dongsu Park
2014-05-23  8:21       ` Daniel Phillips
2014-06-19 16:24 ` Josef Bacik
2014-06-19 22:14   ` Daniel Phillips
