linux-xfs.vger.kernel.org archive mirror
* [RFD] XFS: Subvolumes and snapshots....
@ 2018-01-25  5:51 Dave Chinner
  2018-01-27  8:34 ` Amir Goldstein
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Dave Chinner @ 2018-01-25  5:51 UTC (permalink / raw)
  To: linux-xfs


The video from my talk at LCA 2018 yesterday about the XFS subvolume and
snapshot support I'm working on has been uploaded and can be found
here:

https://www.youtube.com/watch?v=wG8FUvSGROw

I don't have the code in a reviewable form yet - there's still quite
a bit of work before I get to that point, but this is a good
introduction to how all the pieces will fit together....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-25  5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner
@ 2018-01-27  8:34 ` Amir Goldstein
  2018-01-27 11:28   ` Dave Chinner
  2018-01-27 17:05 ` Martin Raiber
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2018-01-27  8:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Thu, Jan 25, 2018 at 7:51 AM, Dave Chinner <david@fromorbit.com> wrote:
>
> The video from my talk at LCA 2018 yesterday about the XFS subvolume and
> snapshot support I'm working on has been uploaded and can be found
> here:
>
> https://www.youtube.com/watch?v=wG8FUvSGROw
>
> I don't have the code in a reviewable form yet - there's still quite
> a bit of work before I get to that point, but this is a good
> introduction to how all the pieces will fit together....
>

Very cool!

Got any paper napkin design photo to share?

What are the big unknowns at this point?

Is the data part challenging because of no buffer cache for data?

I suppose all subvolumes use the host fs journal?

Not gonna share this master plan with fsdevel?

Cheers,
Amir.

* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-27  8:34 ` Amir Goldstein
@ 2018-01-27 11:28   ` Dave Chinner
  2018-01-27 15:56     ` Amir Goldstein
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2018-01-27 11:28 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-xfs

On Sat, Jan 27, 2018 at 10:34:25AM +0200, Amir Goldstein wrote:
> On Thu, Jan 25, 2018 at 7:51 AM, Dave Chinner <david@fromorbit.com> wrote:
> >
> > The video from my talk at LCA 2018 yesterday about the XFS subvolume and
> > snapshot support I'm working on has been uploaded and can be found
> > here:
> >
> > https://www.youtube.com/watch?v=wG8FUvSGROw
> >
> > I don't have the code in a reviewable form yet - there's still quite
> > a bit of work before I get to that point, but this is a good
> > introduction to how all the pieces will fit together....
> >
> 
> Very cool!
> 
> Got any paper napkin design photo to share?

No. I have some arch docs I wrote after the initial PoC on loopback
devices and a bunch of bash, sed, awk and xfs_io hacks....

> What are the big unknowns at this point?

None - all of the concepts needed for snapshot/clone/repl are now
proven and have a working implementation. Matthew Wilcox has a
pretty good handle on what is needed for page cache sharing, and
encryption is just a matter of implementing the generic
interfaces....

> Is the data part challenging because of no buffer cache for data?

Not at all. I just didn't have time to implement the remapping hooks
into the IO path before I gave the talk.

> I suppose all subvolumes use the host fs journal?

No. A subvolume is a "fully functioning filesystem" and so - by
definition - they each have their own internal journal. The journal
IO remapping and COW functionality all works as seen in that demo...

> Not gonna share this master plan with fsdevel?

There's nothing really to talk about outside of XFS until I split
the device space management API out from the XFS code. And that's
far from my highest priority right now...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-27 11:28   ` Dave Chinner
@ 2018-01-27 15:56     ` Amir Goldstein
  2018-01-28  1:57       ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Amir Goldstein @ 2018-01-27 15:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

On Sat, Jan 27, 2018 at 1:28 PM, Dave Chinner <david@fromorbit.com> wrote:
> On Sat, Jan 27, 2018 at 10:34:25AM +0200, Amir Goldstein wrote:
>> On Thu, Jan 25, 2018 at 7:51 AM, Dave Chinner <david@fromorbit.com> wrote:
>> >
>> > The video from my talk at LCA 2018 yesterday about the XFS subvolume and
>> > snapshot support I'm working on has been uploaded and can be found
>> > here:
>> >
>> > https://www.youtube.com/watch?v=wG8FUvSGROw
>> >
>> > I don't have the code in a reviewable form yet - there's still quite
>> > a bit of work before I get to that point, but this is a good
>> > introduction to how all the pieces will fit together....
>> >
>>
>> Very cool!
>>
>> Got any paper napkin design photo to share?
>
> No. I have some arch docs I wrote after the initial PoC on loopback
> devices and a bunch of bash, sed, awk and xfs_io hacks....
>
[...]
>
>> I suppose all subvolumes use the host fs journal?
>
> No. A subvolume is a "fully functioning filesystem" and so - by
> definition - they each have their own internal journal. The journal
> IO remapping and COW functionality all works as seen in that demo...

So is FUA from a subvolume going to be handled the same as with loop
(fsync of the entire image file) or more efficiently? For example, by
flushing only dirty pages that are already mapped?

Cheers,
Amir.

* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-25  5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner
  2018-01-27  8:34 ` Amir Goldstein
@ 2018-01-27 17:05 ` Martin Raiber
  2018-01-28  1:59   ` Dave Chinner
  2018-01-28 12:59 ` Martin Steigerwald
  2021-08-23  4:57 ` Chris Dunlop
  3 siblings, 1 reply; 11+ messages in thread
From: Martin Raiber @ 2018-01-27 17:05 UTC (permalink / raw)
  To: Dave Chinner, linux-xfs

On 25.01.2018 06:51 Dave Chinner wrote:
> The video from my talk at LCA 2018 yesterday about the XFS subvolume and
> snapshot support I'm working on has been uploaded and can be found
> here:
>
> https://www.youtube.com/watch?v=wG8FUvSGROw
>
> I don't have the code in a reviewable form yet - there's still quite
> a bit of work before I get to that point, but this is a good
> introduction to how all the pieces will fit together....

Great talk!

I'm pretty sure it's hard to add, but cross-subvolume reflinks would be
something useful for my use case.


* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-27 15:56     ` Amir Goldstein
@ 2018-01-28  1:57       ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2018-01-28  1:57 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-xfs

On Sat, Jan 27, 2018 at 05:56:53PM +0200, Amir Goldstein wrote:
> On Sat, Jan 27, 2018 at 1:28 PM, Dave Chinner <david@fromorbit.com> wrote:
> > On Sat, Jan 27, 2018 at 10:34:25AM +0200, Amir Goldstein wrote:
> >> On Thu, Jan 25, 2018 at 7:51 AM, Dave Chinner <david@fromorbit.com> wrote:
> >> >
> >> > The video from my talk at LCA 2018 yesterday about the XFS subvolume and
> >> > snapshot support I'm working on has been uploaded and can be found
> >> > here:
> >> >
> >> > https://www.youtube.com/watch?v=wG8FUvSGROw
> >> >
> >> > I don't have the code in a reviewable form yet - there's still quite
> >> > a bit of work before I get to that point, but this is a good
> >> > introduction to how all the pieces will fit together....
> >> >
> >>
> >> Very cool!
> >>
> >> Got any paper napkin design photo to share?
> >
> > No. I have some arch docs I wrote after the initial PoC on loopback
> > devices and a bunch of bash, sed, awk and xfs_io hacks....
> >
> [...]
> >
> >> I suppose all subvolumes use the host fs journal?
> >
> > No. A subvolume is a "fully functioning filesystem" and so - by
> > definition - they each have their own internal journal. The journal
> > IO remapping and COW functionality all works as seen in that demo...
> 
> So is FUA from a subvolume going to be handled the same as with loop
> (fsync of the entire image file) or more efficiently? For example, by
> flushing only dirty pages that are already mapped?

FUA from the subvolume is mapped directly to the underlying device,
just like all other IO. i.e. we never need to "fsync" the underlying
file. We only need to make sure the underlying extent map for the
subvolume is flushed when necessary. This is exactly the same
constraint as the pNFS file layout offload case, handled by the
->commit_metadata() export operation (i.e.
xfs_fs_nfs_commit_metadata()).
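
A toy sketch of that constraint, not XFS code: data IO goes straight to
the device with its FUA flag intact, and a metadata commit is only needed
when the write changed the extent map. The class, its allocator and the
rule "commit when a FUA write has dirtied the mapping" are all
assumptions made for illustration; the thread only says the map is
flushed "when necessary".

```python
class ToySubvolume:
    """Toy model of a subvolume backed by an image file's extent map."""

    def __init__(self):
        self.mapping = {}           # virtual block -> physical block
        self.mapping_dirty = False  # extent map changed since last commit
        self.next_free = 1000       # hypothetical allocator cursor
        self.device_log = []        # (physical block, fua) IOs sent to the device
        self.commits = 0            # number of extent map commits performed

    def write(self, vblock, fua=False):
        # Allocate a physical block on first write; this dirties the extent map.
        if vblock not in self.mapping:
            self.mapping[vblock] = self.next_free
            self.next_free += 1
            self.mapping_dirty = True
        # The data IO is issued directly to the device; no fsync of the image file.
        self.device_log.append((self.mapping[vblock], fua))
        # Assumed rule: a FUA write must not complete before its mapping is stable.
        if fua and self.mapping_dirty:
            self.commit_metadata()

    def commit_metadata(self):
        """Stand-in for flushing the extent map (assumed analogue of ->commit_metadata())."""
        self.commits += 1
        self.mapping_dirty = False
```

In this model an overwrite of an already-mapped block with FUA set costs
one device IO and no metadata commit.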

(I did mention in the talk that the pNFS model was instructive,
because it already handles issues like this.... :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-27 17:05 ` Martin Raiber
@ 2018-01-28  1:59   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2018-01-28  1:59 UTC (permalink / raw)
  To: Martin Raiber; +Cc: linux-xfs

On Sat, Jan 27, 2018 at 05:05:31PM +0000, Martin Raiber wrote:
> On 25.01.2018 06:51 Dave Chinner wrote:
> > The video from my talk at LCA 2018 yesterday about the XFS subvolume and
> > snapshot support I'm working on has been uploaded and can be found
> > here:
> >
> > https://www.youtube.com/watch?v=wG8FUvSGROw
> >
> > I don't have the code in a reviewable form yet - there's still quite
> > a bit of work before I get to that point, but this is a good
> > introduction to how all the pieces will fit together....
> 
> Great talk!
> 
> I'm pretty sure it's hard to add, but cross-subvolume reflinks would be
> something useful for my use case.

That's not going to happen. reflinks across subvolumes (i.e. across
independent filesystems) will return -EXDEV. If you want subvolumes
to share data blocks of independent ancestry, then you can run
deduplication operations on the image files.
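
From user space, a tool that wants to reflink across subvolumes would
have to treat EXDEV as "cannot share blocks here" and fall back to a
plain copy. A minimal sketch, assuming Linux and the FICLONE ioctl; the
function names are hypothetical, and the request number is the value
_IOW(0x94, 9, int) takes on common architectures such as x86 and arm64:

```python
import errno
import fcntl
import shutil

# FICLONE from <linux/fs.h>: _IOW(0x94, 9, int) on x86/arm64 (assumed here).
FICLONE = 0x40049409

# Errors meaning "blocks cannot be shared", as opposed to a real IO failure.
_NO_CLONE = {errno.EXDEV, errno.EOPNOTSUPP, errno.ENOTTY, errno.EINVAL}


def ficlone(src_fd, dst_fd):
    """Ask the kernel to make dst share all of src's data blocks."""
    fcntl.ioctl(dst_fd, FICLONE, src_fd)


def clone_or_copy(src_path, dst_path, clone=ficlone):
    """Reflink src to dst, falling back to a byte copy (e.g. on EXDEV).

    Returns "reflink" or "copy" so the caller can see which path was taken.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        try:
            clone(src.fileno(), dst.fileno())
            return "reflink"
        except OSError as err:
            if err.errno not in _NO_CLONE:
                raise
            shutil.copyfileobj(src, dst)
            return "copy"
```

The clone parameter exists only so the fallback path can be exercised
without needing two separate filesystems.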

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-25  5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner
  2018-01-27  8:34 ` Amir Goldstein
  2018-01-27 17:05 ` Martin Raiber
@ 2018-01-28 12:59 ` Martin Steigerwald
  2018-01-29  1:50   ` Dave Chinner
  2021-08-23  4:57 ` Chris Dunlop
  3 siblings, 1 reply; 11+ messages in thread
From: Martin Steigerwald @ 2018-01-28 12:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Dave Chinner - 25.01.18, 06:51:
> The video from my talk at LCA 2018 yesterday about the XFS subvolume and
> snapshot support I'm working on has been uploaded and can be found
> here:
> 
> https://www.youtube.com/watch?v=wG8FUvSGROw

I somehow knew that something about snapshots would be coming for XFS after 
seeing the reflink / COW and online scrub/repair work by Darrick. But I am 
highly surprised on the how. I also did not really expect pNFS file layout of 
Christoph to play a role here.

It totally makes sense to me right now, but on the other hand I found myself 
thinking "It can't be that easy, can it?" after watching your talk.

Easy not in amount of coding work needed and some complexities you mentioned, 
so I totally get that it is a lot of work needed to pull this off, but easy in 
terms of the concept behind it. Yet, if a concept is easy that is quite a hint 
that it might actually be a good one. And if you really can get away with it… 
then by all means, have a go at it!

I am looking forward to this new "extraordinary way to eat your data" 
(Darrick) or create "blammo" and "kaboom" (Dave). :)

From what I understand it is also way less of a "layering violation" than the 
approach taken in BTRFS or ZFS. Actually it might not be a "layering 
violation" at all, since the different layers are still there and 
communicating with each other. Which opens a lot of potential on applying this 
to other filesystems and storage subsystems of the kernel.

I see benefit in having more than one concept and learn from each other. Maybe 
even a new dog like BTRFS can learn a trick from an old dog at some point in 
time. It sounds crazy to me to think like this at the moment… but for a long 
time it sounded crazy to try to implement snapshots or subvolumes to 
traditional filesystems.

Kudos to thinking out of the box!

-- 
Martin

* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-28 12:59 ` Martin Steigerwald
@ 2018-01-29  1:50   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2018-01-29  1:50 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-xfs

On Sun, Jan 28, 2018 at 01:59:56PM +0100, Martin Steigerwald wrote:
> Dave Chinner - 25.01.18, 06:51:
> > The video from my talk at LCA 2018 yesterday about the XFS
> > subvolume and snapshot support I'm working on has been uploaded
> > and can be found here:
> > 
> > https://www.youtube.com/watch?v=wG8FUvSGROw
> 
> I somehow knew that something about snapshots would be coming for
> XFS after seeing the reflink / COW and online scrub/repair work by
> Darrick. But I am highly surprised on the how. I also did not
> really expect pNFS file layout of Christoph to play a role here.

Once I realised the similarities in isolation between a pNFS client
and subvolumes, it started to make sense. i.e. they are a 3rd party fs
on top of XFS that needs XFS to tell it how it can access the file
data on the underlying block device directly...

> It totally makes sense to me right now, but on the other hand I
> found myself thinking "It can't be that easy, can it?" after
> watching your talk.

You're not the only one who's asked that question. I have, too, many
times :P

> Easy not in amount of coding work needed and some complexities you
> mentioned, so I totally get that it is a lot of work needed to
> pull this off, but easy in terms of the concept behind it. Yet, if
> a concept is easy that is quite a hint that it might actually be a
> good one. And if you really can get away with it… then by all
> means, have a go at it!
> 
> I am looking forward to this new "extraordinary way to eat your
> data" (Darrick) or create "blammo" and "kaboom" (Dave). :)
> 
> From what I understand it is also way less of a "layering
> violation" than the approach taken in BTRFS or ZFS. Actually it
> might not be a "layering violation" at all, since the different
> layers are still there and communicating with each other. Which
> opens a lot of potential on applying this to other filesystems and
> storage subsystems of the kernel.

I had a bonus slide in anticipation of the first question being
about "layering violations". :)

I, personally, don't think there are any layering violations because
what I've actually done is add a *new layer to the stack*. The
architectural layer I've added is a virtual block address space
layer - it's a similar concept to ZFS's virtual device layer(*) -
and I avoided re-implementing the wheel (again) by realising that we
could just use a file to provide that virtual block address space
mapping layer.

Old stack                       New stack
vfs                             vfs
                                subvolume (fs)
                                virtual address space (file)
filesystem                      filesystem
block device                    block device
IO remapping (DM, MD, etc)      IO remapping
storage drivers                 storage drivers

I've chosen to implement that new layer as a filesystem image in a
file because that directly provides a virtual-to-physical
translation layer without having to implement one. There is no need
to make this more complex than it needs to be by re-inventing the
wheel unnecessarily.
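
To make the virtual-to-physical translation concrete, here is a toy
sketch of how a file's extent list maps a block inside the image file
(the virtual address) to a block on the underlying device. The Extent
type, the translate() helper and the block numbers are made up for
illustration; this is not the XFS extent map format.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    file_block: int  # first block within the image file (virtual address)
    disk_block: int  # first block on the underlying device (physical address)
    length: int      # number of contiguous blocks


def translate(extents: Sequence[Extent], virtual_block: int) -> Optional[int]:
    """Map a virtual block to a physical block, or None if it falls in a hole.

    `extents` must be sorted by file_block and must not overlap.
    """
    starts = [e.file_block for e in extents]
    i = bisect_right(starts, virtual_block) - 1
    if i < 0:
        return None  # before the first extent
    extent = extents[i]
    offset = virtual_block - extent.file_block
    if offset >= extent.length:
        return None  # in the gap after this extent
    return extent.disk_block + offset


image_file = [Extent(0, 1000, 8), Extent(16, 5000, 4)]
print(translate(image_file, 3))   # inside the first extent
print(translate(image_file, 10))  # hole between the two extents
```

The point of the design is that the host filesystem already maintains
this mapping for every file, so the subvolume layer gets it for free.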

As it is, the kernel itself doesn't care what type of device the
filesystem sits on - filesystems make that choice themselves by
using FS_REQUIRES_DEV in their fstype definition.  Removing that
flag means XFS is free to parse the "source device" string however
it wants.

Indeed, mount(2) says:

	"mount()  attaches the filesystem specified by source (which
	is often a pathname referring to a device, but can also be
	the pathname of a directory or file, or a dummy string) to
	the location (a directory or  file) specified by the
	pathname in target."

So the mount syscall documentation specifically documents that a
file can be passed to the kernel as a source.

Not only that, users are now accustomed to passing mount(8) image
files directly. i.e.

# mount /path/to/image/file /mntpt

Will automatically mount the image file on the mount point. The
mount(8) binary will quietly create a loopback device behind the
scenes and mount the fs on that loopback device. So from a
management POV, this "mount image files directly" management model
already has widespread acceptance.
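
For comparison, the explicit two-step form of what mount(8) does behind
the scenes looks roughly like the commands built by this sketch. The
helper name and the loop device path are placeholders (losetup prints
the device it actually picked); nothing is executed here, since running
the commands needs root.

```python
import shlex


def loop_mount_commands(image, mountpoint, loopdev="/dev/loop0"):
    """Return commands roughly equivalent to `mount image mountpoint`.

    `loopdev` is a placeholder for whatever device losetup reports.
    """
    return [
        # Attach the image file to the first free loop device and print its name.
        ["losetup", "--find", "--show", image],
        # Mount the filesystem found on that loop device.
        ["mount", loopdev, mountpoint],
    ]


for cmd in loop_mount_commands("/path/to/image/file", "/mntpt"):
    print("#", shlex.join(cmd))
```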

Cheers,

Dave.

(*) Despite what most people claim, ZFS has a very well thought
out, strongly layered architecture - they are just *different
layers* when compared to the traditional filesystem and IO stack.
Maybe I see it differently because I think mostly at the
architectural level, but that's the level at which layering really
matters....

-- 
Dave Chinner
david@fromorbit.com

* Re: [RFD] XFS: Subvolumes and snapshots....
  2018-01-25  5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner
                   ` (2 preceding siblings ...)
  2018-01-28 12:59 ` Martin Steigerwald
@ 2021-08-23  4:57 ` Chris Dunlop
  2021-08-23 23:12   ` Dave Chinner
  3 siblings, 1 reply; 11+ messages in thread
From: Chris Dunlop @ 2021-08-23  4:57 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs

Hi,

On Thu, Jan 25, 2018 at 04:51:44PM +1100, Dave Chinner wrote:
> The video from my talk at LCA 2018 yesterday about the XFS subvolume and
> snapshot support I'm working on has been uploaded and can be found
> here:
>
> https://www.youtube.com/watch?v=wG8FUvSGROw

Just out of curiosity... is anything still happening in this area, and if 
so, is there anywhere we can look to get a feel for the current state of 
affairs?

Cheers,

Chris

* Re: [RFD] XFS: Subvolumes and snapshots....
  2021-08-23  4:57 ` Chris Dunlop
@ 2021-08-23 23:12   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2021-08-23 23:12 UTC (permalink / raw)
  To: Chris Dunlop; +Cc: linux-xfs

On Mon, Aug 23, 2021 at 02:57:01PM +1000, Chris Dunlop wrote:
> Hi,
> 
> On Thu, Jan 25, 2018 at 04:51:44PM +1100, Dave Chinner wrote:
> > The video from my talk at LCA 2018 yesterday about the XFS subvolume and
> > snapshot support I'm working on has been uploaded and can be found
> > here:
> > 
> > https://www.youtube.com/watch?v=wG8FUvSGROw
> 
> Just out of curiosity... is anything still happening in this area, and if
> so, is there anywhere we can look to get a feel for the current state of
> affairs?

It's at the back of the queue at the moment. There's not enough
time and resources available to do everything we want to do - just
look at the review backlog we already have...

That said, this was largely an experiment to see how easily we could
retrofit subvolumes to XFS, and whether there was a compelling
reason for adding them. While there are some management benefits to
integrating reflink based subvolumes into XFS, the performance and
scalability just isn't there compared to production usage of things
like dm-snapshot.

O(1) snapshot time makes a huge difference to system performance,
but reflink-based snapshots are O(N), not O(1). Hence snapshots run
at about 100k extents/sec so a subvolume with a few million extents
will take 10s of seconds to run a snapshot. During this time, the
subvolume is completely frozen and you can't read from or write to
it....
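
The arithmetic behind that estimate, as a small sketch. The rate is the
approximate figure quoted above; the extent counts are arbitrary
examples, not measurements.

```python
EXTENTS_PER_SECOND = 100_000  # approximate snapshot rate quoted above


def snapshot_freeze_seconds(extent_count, rate=EXTENTS_PER_SECOND):
    """Estimated time the subvolume stays frozen while a snapshot runs."""
    return extent_count / rate


for extents in (100_000, 3_000_000, 50_000_000):
    seconds = snapshot_freeze_seconds(extents)
    print(f"{extents:>12,} extents -> ~{seconds:.0f} s frozen")
```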

And that's really the unsolvable problem with a reflink based
snapshot mechanism. Unless there is some other versioning mechanism
in the filesystem metadata, we have to mark all the extents in the
subvolume as shared so the next write will COW them correctly. XFS
does not have that "some other mechanism" like btrfs (COW metadata)
or bcachefs (snapshot epoch in btree keys), so it will never be able
to solve this problem effectively.
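
A toy model of the difference, under loud assumptions: neither class
reflects the real on-disk structures of XFS, btrfs or bcachefs. The
first marks every extent shared at snapshot time, so the work grows with
the extent count. The second only bumps a counter and decides lazily, at
write time, whether an extent predates the latest snapshot.

```python
class ReflinkVolume:
    """Snapshot must visit every extent to flag it shared: O(N)."""

    def __init__(self, extent_count):
        self.shared = [False] * extent_count
        self.extents_touched = 0  # total per-extent work done by snapshots

    def snapshot(self):
        for i in range(len(self.shared)):
            self.shared[i] = True
            self.extents_touched += 1

    def write(self, extent):
        """Return True if this write had to copy-on-write the extent."""
        needs_cow = self.shared[extent]
        self.shared[extent] = False  # the new private copy is unshared
        return needs_cow


class EpochVolume:
    """Snapshot bumps an epoch counter: O(1), independent of extent count."""

    def __init__(self, extent_count):
        self.epoch = 0
        self.written_in = [0] * extent_count  # epoch of each extent's last write
        self.extents_touched = 0

    def snapshot(self):
        self.epoch += 1

    def write(self, extent):
        """Return True if this write had to copy-on-write the extent."""
        needs_cow = self.written_in[extent] < self.epoch
        self.written_in[extent] = self.epoch
        return needs_cow
```

Both models give the same answer to "does this write need COW?"; they
differ only in when the cost of a snapshot is paid.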

That's not to say we'll never add subvolumes and snapshots to XFS,
but because it isn't compellingly better than existing mechanisms
for snapshotting XFS filesystems it really isn't a priority.

As such, if you want a performant, scalable, robust snapshotting
subvolume capable filesystem, bcachefs is the direction you should
be looking. All of the benefits of integrated subvolume snapshots,
yet none of the fundamental architectural deficiencies and design
flaws that limit the practical usability of btrfs for many important
workloads.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 11+ messages
2018-01-25  5:51 [RFD] XFS: Subvolumes and snapshots Dave Chinner
2018-01-27  8:34 ` Amir Goldstein
2018-01-27 11:28   ` Dave Chinner
2018-01-27 15:56     ` Amir Goldstein
2018-01-28  1:57       ` Dave Chinner
2018-01-27 17:05 ` Martin Raiber
2018-01-28  1:59   ` Dave Chinner
2018-01-28 12:59 ` Martin Steigerwald
2018-01-29  1:50   ` Dave Chinner
2021-08-23  4:57 ` Chris Dunlop
2021-08-23 23:12   ` Dave Chinner
