* Reading about CoW architecture / Performance Limits
@ 2017-01-10  7:07 Christian Theune
  2017-01-10  7:45 ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Christian Theune @ 2017-01-10  7:07 UTC (permalink / raw)
  To: linux-xfs

Hi,

As XFS is gaining CoW support I’d like to understand the implementation on a specific aspect: we’re using CoW for making disk image backups as image files in btrfs. This has proven prohibitive once the chain of CoW reflinks grows too long and everything becomes too fragmented. btrfs has improved in some places but the issue still persists.

We’re currently considering moving away from CoW filesystems for our use case and implementing a higher-level strategy. I now wonder whether XFS will have the same issue or whether the architecture is different in a significant way that will avoid prohibitive performance regressions on long CoW chains (think: hundreds to a few thousand).

I would appreciate a pointer on where to look - I’m a coder but following kernel code to understand architecture hasn’t been successful/efficient for me in the past …

Kind regards,
Christian

--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick



* Re: Reading about CoW architecture / Performance Limits
  2017-01-10  7:07 Reading about CoW architecture / Performance Limits Christian Theune
@ 2017-01-10  7:45 ` Darrick J. Wong
       [not found]   ` <C97BB12C-3C30-493B-BE9A-9E8C7CB5D1A4@flyingcircus.io>
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2017-01-10  7:45 UTC (permalink / raw)
  To: Christian Theune; +Cc: linux-xfs

On Tue, Jan 10, 2017 at 08:07:39AM +0100, Christian Theune wrote:
> Hi,
> 
> As XFS is gaining CoW support I’d like to understand the
> implementation on a specific aspect: we’re using CoW for making disk
> image backups as image files in btrfs. This has proven prohibitive
> once the chain of CoW reflinks grows too long and everything becomes
> too fragmented. btrfs has improved in some places but the issue still
> persists.

As in making snapshots of a disk image via something like
"cp --reflink=always a.img a.img.20170110" ?

> We’re currently considering moving away from CoW filesystems for our
> use case and implementing a higher-level strategy. I now wonder whether
> XFS will have the same issue or whether the architecture is different
> in a significant way that will avoid prohibitive performance
> regressions on long CoW chains (think: hundreds to a few thousand).

The primary strategies XFS uses to combat fragmentation are a
combination of reusing the delayed allocation mechanism to defer CoW
block allocation as long as possible in the hopes of being able to make
larger requests; and implementing the "CoW extent size hint" (default 32
blocks or 128K) which rounds the start and end of an allocation request
to the nearest $cowextsize boundary.  So for example if you write to 32
adjacent shared blocks in random order, they'll end up on disk with a
single 128K extent, if possible.
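
As a rough illustration of that rounding, here is a minimal Python sketch. It
assumes the start of a request is rounded down and the end rounded up to a
cowextsize boundary, and that blocks are 4K. The function name
align_to_cowextsize is made up for this example; it is not the kernel's code.

```python
def align_to_cowextsize(start_block, block_count, cowextsize=32):
    """Widen the block range [start, start + count) to cowextsize boundaries.

    Returns (aligned_start, aligned_count), both in filesystem blocks.
    Assumption: start rounds down and end rounds up.
    """
    if block_count <= 0 or cowextsize <= 0:
        raise ValueError("block_count and cowextsize must be positive")
    end = start_block + block_count
    aligned_start = (start_block // cowextsize) * cowextsize
    # Ceiling division, so a partially covered hint-sized chunk is included.
    aligned_end = -(-end // cowextsize) * cowextsize
    return aligned_start, aligned_end - aligned_start


if __name__ == "__main__":
    # A one-block write at block 37 is widened to blocks 32..63.
    print(align_to_cowextsize(37, 1))
    # A write that straddles a boundary is widened to two hint-sized chunks.
    print(align_to_cowextsize(30, 4))
```

With the default hint of 32 blocks, every single-block write inside blocks
32..63 maps to the same 32-block (128K with 4K blocks) request, which is why
random-order writes to adjacent shared blocks can still end up in one extent.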

Note also that XFS only performs CoW if the block is shared, so if you
write the same shared block in a file 20 times, the first write goes to
a new block and the next 19 overwrite that new block.  There will not be
another CoW unless you reflink the file again.
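
The "CoW only while shared" rule can be modelled with a toy reference count
per block. This is only a simulation of the behaviour described above, not
how XFS stores anything; ToyFS and its methods are hypothetical names.

```python
class ToyFS:
    """Toy model: each physical block has a count of files referencing it."""

    def __init__(self):
        self.refcount = {}    # physical block number -> number of references
        self.next_block = 0
        self.cow_count = 0    # how many writes had to allocate a new block

    def _alloc(self):
        block = self.next_block
        self.next_block += 1
        self.refcount[block] = 1
        return block

    def create(self, nblocks):
        """Create a file as a list mapping file offset -> physical block."""
        return [self._alloc() for _ in range(nblocks)]

    def reflink(self, file_map):
        """Share every block of file_map with a new file."""
        for block in file_map:
            self.refcount[block] += 1
        return list(file_map)

    def write(self, file_map, index):
        """Write one block; copy only if the block is currently shared."""
        old = file_map[index]
        if self.refcount[old] > 1:
            self.refcount[old] -= 1
            file_map[index] = self._alloc()
            self.cow_count += 1
        # Otherwise the block is private and is overwritten in place.
        return file_map[index]


if __name__ == "__main__":
    fs = ToyFS()
    image = fs.create(4)
    snapshot = fs.reflink(image)
    for _ in range(20):
        fs.write(image, 2)
    print(fs.cow_count)       # one copy for 20 writes to the same block
    snapshot2 = fs.reflink(image)
    fs.write(image, 2)
    print(fs.cow_count)       # reflinking again makes the next write copy
```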


> I would appreciate a pointer on where to look - I’m a coder but
> following kernel code to understand architecture hasn’t been
> successful/efficient for me in the past …

You might try reading the huge comment blocks in fs/xfs/xfs_reflink.c.

--D


* Re: Reading about CoW architecture / Performance Limits
       [not found]   ` <C97BB12C-3C30-493B-BE9A-9E8C7CB5D1A4@flyingcircus.io>
@ 2017-01-11  7:59     ` Darrick J. Wong
       [not found]     ` <9713E613-6953-4AD3-89B1-C0EF639E771C@flyingcircus.io>
  1 sibling, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2017-01-11  7:59 UTC (permalink / raw)
  To: Christian Theune; +Cc: linux-xfs

On Tue, Jan 10, 2017 at 11:54:23AM +0100, Christian Theune wrote:
> Hi,
> 
> > On 10 Jan 2017, at 08:45, Darrick J. Wong <darrick.wong@oracle.com> wrote:
> > 
> > As in making snapshots of a disk image via something like
> > "cp --reflink=always a.img a.img.20170110" ?
> 
> Yes. Or rather in our case:
> 
> cp --reflink=always a-20170109.img a-20170110.img
> 
> and then go to the live storage and retrieve the changes from its
> 20170109 snapshot to the 20170110 snapshot and write them into the
> reflink-copied a-20170110.img
> 
> Once a backup expires we just delete the file. This perpetuates based
> on the backup schema.

<nod>
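
For reference, the workflow quoted above could be sketched in Python roughly
as follows. clone_image and apply_changes are hypothetical helper names. How
the changed regions are fetched from the live storage is not described in
this thread, so they are simply passed in as (offset, data) pairs.

```python
import os
import subprocess


def clone_image(src, dst):
    """Reflink-copy src to dst, using the same command as in the thread.

    Raises CalledProcessError if the filesystem cannot reflink.
    """
    subprocess.run(["cp", "--reflink=always", src, dst], check=True)


def apply_changes(path, changes):
    """Write (offset, data) pairs into path in ascending offset order.

    Unchanged areas are skipped, so they stay shared with the previous image.
    Returns the total number of bytes written.
    """
    total = 0
    fd = os.open(path, os.O_WRONLY)
    try:
        for offset, data in sorted(changes, key=lambda change: change[0]):
            view = memoryview(data)
            written = 0
            # pwrite may write fewer bytes than asked, so loop until done.
            while written < len(view):
                written += os.pwrite(fd, view[written:], offset + written)
            total += written
    finally:
        os.close(fd)
    return total
```

A daily run would then be clone_image("a-20170109.img", "a-20170110.img")
followed by apply_changes("a-20170110.img", changes), and expiring a backup
is just deleting its file.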

> >> We’re currently considering moving away from CoW filesystems for our
> >> use case and implementing a higher-level strategy. I now wonder whether
> >> XFS will have the same issue or whether the architecture is different
> >> in a significant way that will avoid prohibitive performance
> >> regressions on long CoW chains (think: hundreds to a few thousand).
> > 
> > The primary strategies XFS uses to combat fragmentation are a
> > combination of reusing the delayed allocation mechanism to defer CoW
> > block allocation as long as possible in the hopes of being able to make
> > larger requests; and implementing the "CoW extent size hint" (default 32
> > blocks or 128K) which rounds the start and end of an allocation request
> > to the nearest $cowextsize boundary.  So for example if you write to 32
> > adjacent shared blocks in random order, they'll end up on disk with a
> > single 128K extent, if possible.
> 
> Ah. In our case even larger extents might make sense, like 4MiB or such.

Perhaps.  You're only likely to see benefits if you actually write
4MB chunks.

> > Note also that XFS only performs CoW if the block is shared, so if you
> > write the same shared block in a file 20 times, the first write goes to
> > a new block and the next 19 overwrite that new block.  There will not be
> > another CoW unless you reflink the file again.
> 
> Actually every snapshot will be written exactly once, so depending on
> the workload larger extents might cause higher overhead if the
> overwrite ratio is small. Or will the hint plus deferred allocation
> still make smaller extents if only a small piece was changed?

It'll make smaller extents if only a small piece gets changed.  We don't
try any tricks like preemptively CoWing non-dirty data to reduce
fragmentation.

> We definitely write all changes that exist sequentially (and skip the
> non-changed areas).
> 
> In our schema a new reflink would be created either every hour or
> every day. For hourly backups that’s a bit less than 9k “reflink
> generations” per year. For long running instances this can be in the
> range of 5-6 years for us easily.

~60,000, that will be interesting.  Haven't gotten that high in normal
usage, though a couple of the xfstests shoot for sharing the same block
1 million times to see how well the FS responds.
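
The arithmetic behind those numbers, as a small Python sketch. It assumes
365-day years and one reflink per hour or per day, as quoted above; the
function name is made up for this example.

```python
HOURS_PER_DAY = 24
DAYS_PER_YEAR = 365


def reflink_generations(years, per_day=HOURS_PER_DAY):
    """Number of reflink copies created over the given number of years."""
    return years * DAYS_PER_YEAR * per_day


if __name__ == "__main__":
    print(reflink_generations(1))             # hourly, one year
    print(reflink_generations(5))             # hourly, five years
    print(reflink_generations(6))             # hourly, six years
    print(reflink_generations(6, per_day=1))  # daily, six years
```

Hourly backups give 8,760 generations per year and 43,800 to 52,560 over
five to six years, which is the same order of magnitude as the ~60,000
figure.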

--D

> >> I would appreciate a pointer on where to look - I’m a coder but
> >> following kernel code to understand architecture hasn’t been
> >> successful/efficient for me in the past …
> > 
> > You might try reading the huge comment blocks in fs/xfs/xfs_reflink.c.
> 
> Great, thanks! I admit not having looked there myself as I didn’t
> expect it. Lesson learned!
> 
> Christian

* Re: Reading about CoW architecture / Performance Limits
       [not found]     ` <9713E613-6953-4AD3-89B1-C0EF639E771C@flyingcircus.io>
@ 2017-01-11  8:05       ` Darrick J. Wong
  2017-01-11  9:10         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Darrick J. Wong @ 2017-01-11  8:05 UTC (permalink / raw)
  To: Christian Theune; +Cc: linux-xfs

On Tue, Jan 10, 2017 at 12:08:39PM +0100, Christian Theune wrote:
> Hi,
> 
> > On 10 Jan 2017, at 11:54, Christian Theune <ct@flyingcircus.io> wrote:
> > 
> > Hi,
> > 
> >> On 10 Jan 2017, at 08:45, Darrick J. Wong <darrick.wong@oracle.com <mailto:darrick.wong@oracle.com>> wrote:
> >> 
> > 
> >> You might try reading the huge comment blocks in fs/xfs/xfs_reflink.c.
> > 
> > Great, thanks! I admit not having looked there myself as I didn’t expect it. Lesson learned!
> 
> Having read the code, I think I understood three things:
> 
> 1. As you said, fragmentation may become an issue if the block
> allocation doesn’t manage to keep things together. This depends on the
> actual traffic patterns. (And probably free space?) I guess for long
> running environments regular/continuous reorganization would make
> sense?

It could, but keep in mind that xfs_fsr will break reflinks.

> 2. There is no data structure that introduces an additional
> “generational” penalty for CoW upon CoW upon CoW … ?

Nothing in reflink itself should do that.  The (also experimental)
reverse mapping feature will slowly consume space storing all the
backrefs, though there isn't any special overhead for shared stuff.
Reverse mappings are (at the moment) only useful for online metadata
reconstruction.

> 3. There appears to be code that allows retroactively creating CoW for
> two files that existed separately before. Is that (planned to be)
> exposed to userland?

No plans for that at the moment.  There used to be a debugging knob, but
it was ripped out before upstreaming the code.

--D

> 
> Cheers,
> Christian

* Re: Reading about CoW architecture / Performance Limits
  2017-01-11  8:05       ` Darrick J. Wong
@ 2017-01-11  9:10         ` Christoph Hellwig
  2017-01-11 17:52           ` Darrick J. Wong
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2017-01-11  9:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christian Theune, linux-xfs

On Wed, Jan 11, 2017 at 12:05:44AM -0800, Darrick J. Wong wrote:
> > 3. There appears to be code that allows retroactively creating CoW for
> > two files that existed separately before. Is that (planned to be)
> > exposed to userland?
> 
> No plans for that at the moment.  There used to be a debugging knob, but
> it was ripped out before upstreaming the code.

I think he's asking for the dedup ioctl, which is supported.

* Re: Reading about CoW architecture / Performance Limits
  2017-01-11  9:10         ` Christoph Hellwig
@ 2017-01-11 17:52           ` Darrick J. Wong
  0 siblings, 0 replies; 6+ messages in thread
From: Darrick J. Wong @ 2017-01-11 17:52 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Christian Theune, linux-xfs

On Wed, Jan 11, 2017 at 01:10:11AM -0800, Christoph Hellwig wrote:
> On Wed, Jan 11, 2017 at 12:05:44AM -0800, Darrick J. Wong wrote:
> > > 3. There appears to be code that allows retroactively creating CoW for
> > > two files that existed separately before. Is that (planned to be)
> > > exposed to userland?
> > 
> > No plans for that at the moment.  There used to be a debugging knob, but
> > it was ripped out before upstreaming the code.
> 
> I think he's asking for the dedup ioctl, which is supported.

Ah, that could be the case.  If you were asking about deduplication,
then yes, XFS supports that too.  duperemove has supported XFS since August.
Some of the newer dedup tools that also want the space map (bees) might
support XFS once we get GETFSMAP working.
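
For anyone who wants to call the dedup ioctl directly, here is a minimal
Python sketch for a single destination range. Assumptions: the ioctl number
is the x86-64 encoding of FIDEDUPERANGE from linux/fs.h (some architectures
encode ioctl numbers differently), the struct layouts follow struct
file_dedupe_range and struct file_dedupe_range_info, and the helper names
are made up for this example.

```python
import fcntl
import struct

# _IOWR(0x94, 54, struct file_dedupe_range) as encoded on x86-64 (assumption).
FIDEDUPERANGE = 0xC0189436

# struct file_dedupe_range header:
# src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info:
# dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset, length, dst_fd, dst_offset):
    """Pack a dedupe request with exactly one destination range."""
    header = HEADER.pack(src_offset, length, 1, 0, 0)
    info = INFO.pack(dst_fd, dst_offset, 0, 0, 0)
    return bytearray(header + info)


def dedupe_range(src_fd, src_offset, length, dst_fd, dst_offset):
    """Ask the kernel to share the range if the contents are identical.

    Returns (bytes_deduped, status). Status 0 means the ranges matched,
    1 means the data differed, and a negative value is a negated errno.
    """
    buf = build_request(src_offset, length, dst_fd, dst_offset)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)   # kernel fills in the results
    _fd, _off, bytes_deduped, status, _rsv = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status
```

The kernel compares the two ranges itself and only shares blocks when they
are byte-for-byte identical, so a wrong guess about duplicate data does not
corrupt the destination file.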

--D
