* [RFD] XFS: Subvolumes and snapshots.... @ 2018-01-25 5:51 Dave Chinner 2018-01-27 8:34 ` Amir Goldstein ` (3 more replies) 0 siblings, 4 replies; 11+ messages in thread From: Dave Chinner @ 2018-01-25 5:51 UTC (permalink / raw) To: linux-xfs The video from my talk at LCA 2018 yesterday about the XFS subvolume and snapshot support I'm working on has been uploaded and can be found here: https://www.youtube.com/watch?v=wG8FUvSGROw I don't have the code in a reviewable form yet - there's still quite a bit of work before I get to that point, but this is a good introduction to how all the pieces will fit together.... Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-25 5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner @ 2018-01-27 8:34 ` Amir Goldstein 2018-01-27 11:28 ` Dave Chinner 2018-01-27 17:05 ` Martin Raiber ` (2 subsequent siblings) 3 siblings, 1 reply; 11+ messages in thread From: Amir Goldstein @ 2018-01-27 8:34 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Thu, Jan 25, 2018 at 7:51 AM, Dave Chinner <david@fromorbit.com> wrote: > > The video from my talk at LCA 2018 yesterday about the XFS subvolume and > snapshot support I'm working on has been uploaded and can be found > here: > > https://www.youtube.com/watch?v=wG8FUvSGROw > > I don't have the code in a reviewable form yet - there's still quite > a bit of work before I get to that point, but this is a good > introduction to how all the pieces will fit together.... > Very cool! Got any paper napkin design photo to share? What are the big unknowns at this point? Is the data part challenging because of no buffer cache for data? I suppose all subvolumes use the host fs journal? Not gonna share this master plan with fsdevel? Cheers, Amir. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-27 8:34 ` Amir Goldstein @ 2018-01-27 11:28 ` Dave Chinner 2018-01-27 15:56 ` Amir Goldstein 0 siblings, 1 reply; 11+ messages in thread From: Dave Chinner @ 2018-01-27 11:28 UTC (permalink / raw) To: Amir Goldstein; +Cc: linux-xfs On Sat, Jan 27, 2018 at 10:34:25AM +0200, Amir Goldstein wrote: > On Thu, Jan 25, 2018 at 7:51 AM, Dave Chinner <david@fromorbit.com> wrote: > > > > The video from my talk at LCA 2018 yesterday about the XFS subvolume and > > snapshot support I'm working on has been uploaded and can be found > > here: > > > > https://www.youtube.com/watch?v=wG8FUvSGROw > > > > I don't have the code in a reviewable form yet - there's still quite > > a bit of work before I get to that point, but this is a good > > introduction to how all the pieces will fit together.... > > > > Very cool! > > Got any paper napkin design photo to share? No. I have some arch docs I wrote after the initial PoC on loopback devices and a bunch of bash, sed, awk and xfs_io hacks.... > What are the big unknowns at this point? None - all of the concepts needed for snapshot/clone/repl are now proven and have a working implementation. Matthew Wilcox has a pretty good handle on what is needed for page cache sharing, and encryption is just a matter of implementing the generic interfaces.... > Is the data part challenging because of no buffer cache for data? Not at all. I just didn't have time to implement the remapping hooks into the IO path before I gave the talk. > I suppose all subvolumes use the host fs journal? No. A subvolume is a "fully functioning filesystem" and so - by definition - they each have their own internal journal. The journal IO remapping and COW functionality all works as seen in that demo... > Not gonna share this master plan with fsdevel? There's nothing really to talk about outside of XFS until I split the device space management API out from the XFS code. And that's far from my highest priority right now...
Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-27 11:28 ` Dave Chinner @ 2018-01-27 15:56 ` Amir Goldstein 2018-01-28 1:57 ` Dave Chinner 0 siblings, 1 reply; 11+ messages in thread From: Amir Goldstein @ 2018-01-27 15:56 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs On Sat, Jan 27, 2018 at 1:28 PM, Dave Chinner <david@fromorbit.com> wrote: > On Sat, Jan 27, 2018 at 10:34:25AM +0200, Amir Goldstein wrote: >> On Thu, Jan 25, 2018 at 7:51 AM, Dave Chinner <david@fromorbit.com> wrote: >> > >> > The video from my talk at LCA 2018 yesterday about the XFS subvolume and >> > snapshot support I'm working on has been uploaded and can be found >> > here: >> > >> > https://www.youtube.com/watch?v=wG8FUvSGROw >> > >> > I don't have the code in a reviewable form yet - there's still quite >> > a bit of work before I get to that point, but this is a good >> > introduction to how all the pieces will fit together.... >> > >> >> Very cool! >> >> Got any paper napkin design photo to share? > > No. I have some arch docs I wrote after the initial Poc on loopback > devices and a bunch of bash, sed, awk and xfs_io hacks.... > [...] > >> I suppose all subvolumes use the host fs journal? > > No. A subvolume is a "fully functioning filesystem" and so - by > definition - they each have their own internal journal. The journal > IO remapping and COW functionality all works as seen in that demo... So is FUA from subvolume going to be handled the same as with loop (fsync of entire image file) or more efficiently? for example by flushing only dirty pages that are already mapped? Cheers, Amir. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-27 15:56 ` Amir Goldstein @ 2018-01-28 1:57 ` Dave Chinner 0 siblings, 0 replies; 11+ messages in thread From: Dave Chinner @ 2018-01-28 1:57 UTC (permalink / raw) To: Amir Goldstein; +Cc: linux-xfs On Sat, Jan 27, 2018 at 05:56:53PM +0200, Amir Goldstein wrote: > On Sat, Jan 27, 2018 at 1:28 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Sat, Jan 27, 2018 at 10:34:25AM +0200, Amir Goldstein wrote: > >> On Thu, Jan 25, 2018 at 7:51 AM, Dave Chinner <david@fromorbit.com> wrote: > >> > > >> > The video from my talk at LCA 2018 yesterday about the XFS subvolume and > >> > snapshot support I'm working on has been uploaded and can be found > >> > here: > >> > > >> > https://www.youtube.com/watch?v=wG8FUvSGROw > >> > > >> > I don't have the code in a reviewable form yet - there's still quite > >> > a bit of work before I get to that point, but this is a good > >> > introduction to how all the pieces will fit together.... > >> > > >> > >> Very cool! > >> > >> Got any paper napkin design photo to share? > > > > No. I have some arch docs I wrote after the initial Poc on loopback > > devices and a bunch of bash, sed, awk and xfs_io hacks.... > > > [...] > > > >> I suppose all subvolumes use the host fs journal? > > > > No. A subvolume is a "fully functioning filesystem" and so - by > > definition - they each have their own internal journal. The journal > > IO remapping and COW functionality all works as seen in that demo... > > So is FUA from subvolume going to be handled the same as with loop > (fsync of entire image file) or more efficiently? for example by flushing > only dirty pages that are already mapped? FUA from the subvolume is mapped directly to the underlying device, just like all other IO, i.e. we never need to "fsync" the underlying file. We only need to make sure the underlying extent map for the subvolume is flushed when necessary.
This is exactly the same constraint as the pNFS file layout offload case, handled by the ->commit_metadata() export operation (i.e. xfs_fs_nfs_commit_metadata()). (I did mention in the talk that the pNFS model was instructive, because it already handles issues like this.... :) Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-25 5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner 2018-01-27 8:34 ` Amir Goldstein @ 2018-01-27 17:05 ` Martin Raiber 2018-01-28 1:59 ` Dave Chinner 2018-01-28 12:59 ` Martin Steigerwald 2021-08-23 4:57 ` Chris Dunlop 3 siblings, 1 reply; 11+ messages in thread From: Martin Raiber @ 2018-01-27 17:05 UTC (permalink / raw) To: Dave Chinner, linux-xfs On 25.01.2018 06:51 Dave Chinner wrote: > The video from my talk at LCA 2018 yesterday about the XFS subvolume and > snapshot support I'm working on has been uploaded and can be found > here: > > https://www.youtube.com/watch?v=wG8FUvSGROw > > I don't have the code in a reviewable form yet - there's still quite > a bit of work before I get to that point, but this is a good > introduction to how all the pieces will fit together.... Great talk! I'm pretty sure it's hard to add, but cross-subvolume reflinks would be something useful for my use case. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-27 17:05 ` Martin Raiber @ 2018-01-28 1:59 ` Dave Chinner 0 siblings, 0 replies; 11+ messages in thread From: Dave Chinner @ 2018-01-28 1:59 UTC (permalink / raw) To: Martin Raiber; +Cc: linux-xfs On Sat, Jan 27, 2018 at 05:05:31PM +0000, Martin Raiber wrote: > On 25.01.2018 06:51 Dave Chinner wrote: > > The video from my talk at LCA 2018 yesterday about the XFS subvolume and > > snapshot support I'm working on has been uploaded and can be found > > here: > > > > https://www.youtube.com/watch?v=wG8FUvSGROw > > > > I don't have the code in a reviewable form yet - there's still quite > > a bit of work before I get to that point, but this is a good > > introduction to how all the pieces will fit together.... > > Great talk! > > I'm pretty sure it's hard to add, but cross-subvolume reflinks would be > something useful for my use case. That's not going to happen. reflinks across subvolumes (i.e. across independent filesystems) will return -EXDEV. If you want subvolumes to share data blocks of independent ancestry, then you can run deduplication operations on the image files. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 11+ messages in thread
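The -EXDEV behaviour described above is what userspace already sees today when it tries to reflink across two separate filesystems. The following is a minimal sketch of how a caller might handle it, not anything taken from the subvolume work itself: the helper name `reflink_or_copy` and the set of fallback errno values are assumptions made for illustration, and the `FICLONE` constant is written out by hand because Python's `fcntl` module does not export it.

```python
import errno
import fcntl
import shutil

# FICLONE is _IOW(0x94, 9, int) in <linux/fs.h>. The numeric value is
# spelled out here because the fcntl module has no symbol for it.
FICLONE = 0x40049409

# errno values treated as "blocks cannot be shared here, copy instead".
# EXDEV is the cross-filesystem case discussed in the thread; the others
# cover filesystems or kernels with no reflink support at all.
# This set is an assumption for the sketch, not an exhaustive list.
FALLBACK_ERRNOS = {
    errno.EXDEV,
    errno.EOPNOTSUPP,
    errno.EINVAL,
    errno.ENOTTY,
    errno.ENOSYS,
}


def reflink_or_copy(src, dst):
    """Try to reflink src to dst; fall back to a plain byte copy.

    Returns "reflink" if the clone ioctl succeeded, "copy" otherwise.
    """
    with open(src, "rb") as s, open(dst, "wb") as d:
        try:
            # ioctl(dest_fd, FICLONE, src_fd): share all of src's extents.
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return "reflink"
        except OSError as e:
            if e.errno not in FALLBACK_ERRNOS:
                raise
    # Sharing was refused (for example -EXDEV), so duplicate the data.
    shutil.copyfile(src, dst)
    return "copy"
```

A caller that ends up on the "copy" path could later run a deduplication pass to recover the shared blocks, which is the route suggested above for subvolumes of independent ancestry.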
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-25 5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner 2018-01-27 8:34 ` Amir Goldstein 2018-01-27 17:05 ` Martin Raiber @ 2018-01-28 12:59 ` Martin Steigerwald 2018-01-29 1:50 ` Dave Chinner 2021-08-23 4:57 ` Chris Dunlop 3 siblings, 1 reply; 11+ messages in thread From: Martin Steigerwald @ 2018-01-28 12:59 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs Dave Chinner - 25.01.18, 06:51: > The video from my talk at LCA 2018 yesterday about the XFS subvolume and > snapshot support I'm working on has been uploaded and can be found > here: > > https://www.youtube.com/watch?v=wG8FUvSGROw I somehow knew that something about snapshots would be coming for XFS after seeing the reflink / COW and online scrub/repair work by Darrick. But I am highly surprised by the how. I also did not really expect pNFS file layout of Christoph to play a role here. It totally makes sense to me right now, but on the other hand I found myself thinking "It can't be that easy, can it?" after watching your talk. Easy not in amount of coding work needed and some complexities you mentioned, so I totally get that it is a lot of work needed to pull this off, but easy in terms of the concept behind it. Yet, if a concept is easy that is quite a hint that it might actually be a good one. And if you really can get away with it… then by all means, have a go at it! I am looking forward to this new "extraordinary way to eat your data" (Darrick) or create "blammo" and "kaboom" (Dave). :) From what I understand it is also way less of a "layering violation" than the approach taken in BTRFS or ZFS. Actually it might not be a "layering violation" at all, since the different layers are still there and communicating with each other. Which opens a lot of potential on applying this to other filesystems and storage subsystems of the kernel. I see benefit in having more than one concept and learn from each other.
Maybe even a new dog like BTRFS can learn a trick from an old dog at some point in time. It sounds crazy to me to think like this at the moment… but for a long time it sounded crazy to try to implement snapshots or subvolumes to traditional filesystems. Kudos to thinking out of the box! -- Martin ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-28 12:59 ` Martin Steigerwald @ 2018-01-29 1:50 ` Dave Chinner 0 siblings, 0 replies; 11+ messages in thread From: Dave Chinner @ 2018-01-29 1:50 UTC (permalink / raw) To: Martin Steigerwald; +Cc: linux-xfs On Sun, Jan 28, 2018 at 01:59:56PM +0100, Martin Steigerwald wrote: > Dave Chinner - 25.01.18, 06:51: > > The video from my talk at LCA 2018 yesterday about the XFS > > subvolume and snapshot support I'm working on has been uploaded > > and can be found here: > > > > https://www.youtube.com/watch?v=wG8FUvSGROw > > I somehow knew that something about snapshots would be coming for > XFS after seeing the reflink / COW and online scrub/repair work by > Darrick. But I am highly surprised on the how. I also did not > really expect pNFS file layout of Christoph to play a role here. Once I realised the similarities in isolation between a pNFS client and subvolumes, it started to make sense. i.e. they are a 3rd party fs on top of XFS that needs XFS to tell it how it can access the file data on the underlying block device directly... > It totally makes sense to me right now, but on the other hand I > found myself thinking "It can´t be that easy, can it?" after > watching your talk. You're not the only one who's asked that question. I have, too, many times :P > Easy not in amount of coding work needed and some complexities you > mentioned, so I totally get that it is a lot of work needed to > pull this off, but easy in terms of the concept behind it. Yet, if > a concept is easy that is quite a hint that it might actually be a > good one. And if you really can get away with it… then by all > means, have a go at it! > > I am looking forward to this new "extraordinary way to eat your > data" (Darrick) or create "blammo" and "kaboom" (Dave). :) > > From what I understand it is also way less of a "layering > violation" than the approach in taken in BTRFS or ZFS.
Actually it > might not be a "layering violation" at all, since the different > layers are still there and communicating with each other. Which > opens a lot of potential on applying this to other filesystems and > storage subsystems of the kernel. I had a bonus slide in anticipation of the first question being about "layering violations". :) I, personally, don't think there are any layering violations because what I've actually done is add a *new layer to the stack*. The architectural layer I've added is a virtual block address space layer - it's a similar concept to ZFS's virtual device layer(*) - and I avoided re-inventing the wheel (again) by realising that we could just use a file to provide that virtual block address space mapping layer.

    Old stack                     New stack

    vfs                           vfs
                                  subvolume (fs)
                                  virtual address space (file)
    filesystem                    filesystem
    block device                  block device
    IO remapping (DM, MD, etc)    IO remapping
    storage drivers               storage drivers

I've chosen to implement that new layer as a filesystem image in a file because that directly provides a virtual-to-physical translation layer without having to implement one. There is no need to make this more complex than it needs to be by re-inventing the wheel unnecessarily. As it is, the kernel itself doesn't care what type of device the filesystem sits on - filesystems make that choice themselves by using FS_REQUIRES_BDEV in their fstype definition. Removing that flag means XFS is free to parse the "source device" string however it wants. Indeed, mount(2) says: "mount() attaches the filesystem specified by source (which is often a pathname referring to a device, but can also be the pathname of a directory or file, or a dummy string) to the location (a directory or file) specified by the pathname in target." So the mount syscall documentation specifically documents that a file can be passed to the kernel as a source. Not only that, users are now accustomed to passing mount(8) image files directly. i.e.

    # mount /path/to/image/file /mntpt

will automatically mount the image file on the mount point. The mount(8) binary will quietly create a loopback device behind the scenes and mount the fs on that loopback device. So from a management POV, this "mount image files directly" management model already has widespread acceptance. Cheers, Dave. (*) Despite what most people claim, ZFS has a very well thought out, strongly layered architecture - they are just *different layers* when compared to the traditional filesystem and IO stack. Maybe I see it differently because I think mostly at the architectural level, but that's the level at which layering really matters.... -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 11+ messages in thread
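To make the "file as a virtual block address space" idea in the message above a little more concrete, here is a toy model of the translation such a layer performs. It is only a sketch under stated assumptions: the names `Extent`, `ExtentMap` and `translate` are hypothetical, the block numbers are made up, and nothing here reflects the actual XFS extent map format or the unpublished subvolume code.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Extent(NamedTuple):
    """One contiguous mapping in a hypothetical image file's extent map."""
    virt_start: int  # first block in the subvolume's virtual address space
    phys_start: int  # first block on the underlying block device
    length: int      # number of contiguous blocks covered


class ExtentMap:
    """Toy virtual-to-physical block translation, as a file's extents give."""

    def __init__(self, extents):
        # Sort by virtual start so lookups can use binary search.
        self._extents = sorted(extents)
        self._starts = [e.virt_start for e in self._extents]

    def translate(self, virt_block: int) -> Optional[int]:
        """Return the physical block for virt_block, or None for a hole."""
        i = bisect_right(self._starts, virt_block) - 1
        if i < 0:
            return None  # before the first mapped extent
        e = self._extents[i]
        if virt_block >= e.virt_start + e.length:
            return None  # falls in an unmapped gap (a hole in the file)
        return e.phys_start + (virt_block - e.virt_start)


if __name__ == "__main__":
    image = ExtentMap([Extent(0, 1000, 8), Extent(16, 5000, 4)])
    for vb in (3, 10, 17):
        print(vb, "->", image.translate(vb))
```

In this model, IO issued by the subvolume at virtual block 3 would be sent straight to physical block 1003, with no intermediate loop device; a lookup that returns `None` corresponds to a hole that would need allocation first.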
* Re: [RFD] XFS: Subvolumes and snapshots.... 2018-01-25 5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner ` (2 preceding siblings ...) 2018-01-28 12:59 ` Martin Steigerwald @ 2021-08-23 4:57 ` Chris Dunlop 2021-08-23 23:12 ` Dave Chinner 3 siblings, 1 reply; 11+ messages in thread From: Chris Dunlop @ 2021-08-23 4:57 UTC (permalink / raw) To: Dave Chinner; +Cc: linux-xfs Hi, On Thu, Jan 25, 2018 at 04:51:44PM +1100, Dave Chinner wrote: > The video from my talk at LCA 2018 yesterday about the XFS subvolume and > snapshot support I'm working on has been uploaded and can be found > here: > > https://www.youtube.com/watch?v=wG8FUvSGROw Just out of curiosity... is anything still happening in this area, and if so, is there anywhere we can look to get a feel for the current state of affairs? Cheers, Chris ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFD] XFS: Subvolumes and snapshots.... 2021-08-23 4:57 ` Chris Dunlop @ 2021-08-23 23:12 ` Dave Chinner 0 siblings, 0 replies; 11+ messages in thread From: Dave Chinner @ 2021-08-23 23:12 UTC (permalink / raw) To: Chris Dunlop; +Cc: linux-xfs On Mon, Aug 23, 2021 at 02:57:01PM +1000, Chris Dunlop wrote: > Hi, > > On Thu, Jan 25, 2018 at 04:51:44PM +1100, Dave Chinner wrote: > > The video from my talk at LCA 2018 yesterday about the XFS subvolume and > > snapshot support I'm working on has been uploaded and can be found > > here: > > > > https://www.youtube.com/watch?v=wG8FUvSGROw > > Just out of curiosity... is anything still happening in this area, and if > so, is there anywhere we can look to get a feel for the current state of > affairs? It's at the back of the queue at the moment. There's not enough time and resources available to do everything we want to do - just look at the review backlog we already have... That said, this was largely an experiment to see how easily we could retrofit subvolumes to XFS, and whether there was a compelling reason for adding them. While there are some management benefits to integrating reflink based subvolumes into XFS, the performance and scalability just isn't there compared to production usage of things like dm-snapshot. O(1) snapshot time makes a huge difference to system performance, but reflink-based snapshots are O(N), not O(1). Hence snapshots run at about 100k extents/sec so a subvolume with a few million extents will take 10s of seconds to run a snapshot. During this time, the subvolume is completely frozen and you can't read from or write to it.... And that's really the unsolvable problem with a reflink based snapshot mechanism. Unless there is some other versioning mechanism in the filesystem metadata, we have to mark all the extents in the subvolume as shared so the next write will COW them correctly. 
XFS does not have that "some other mechanism" like btrfs (COW metadata) or bcachefs (snapshot epoch in btree keys), so it will never be able to solve this problem effectively. That's not to say we'll never add subvolumes and snapshots to XFS, but because it isn't compellingly better than existing mechanisms for snapshotting XFS filesystems it really isn't a priority. As such, if you want a performant, scalable, robust snapshotting subvolume capable filesystem, bcachefs is the direction you should be looking. All of the benefits of integrated subvolume snapshots, yet none of the fundamental architectural deficiencies and design flaws that limit the practical usability of btrfs for many important workloads. Cheers, Dave. -- Dave Chinner david@fromorbit.com ^ permalink raw reply [flat|nested] 11+ messages in thread
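The O(N) cost quoted in the message above can be turned into a quick back-of-the-envelope estimate. The sketch below simply divides the extent count by the roughly 100k extents/sec rate mentioned in the thread; the function name and the example extent counts are assumptions for illustration, and real rates will vary with hardware and fragmentation.

```python
def snapshot_freeze_seconds(extent_count, extents_per_second=100_000):
    """Estimate how long a reflink-based snapshot keeps a subvolume frozen.

    Every extent has to be marked shared, so the time grows linearly
    with the number of extents (O(N)), unlike an O(1) snapshot.
    """
    return extent_count / extents_per_second


if __name__ == "__main__":
    # Example extent counts chosen for illustration only.
    for extents in (1_000_000, 3_000_000, 10_000_000):
        secs = snapshot_freeze_seconds(extents)
        print(f"{extents:>10} extents -> about {secs:.0f} s frozen")
```

At that rate a subvolume with three million extents would be frozen for about 30 seconds, which matches the "10s of seconds" figure given above.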