linux-fsdevel.vger.kernel.org archive mirror
From: Amir Goldstein <amir73il@gmail.com>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Christoph Hellwig <hch@lst.de>, Jan Kara <jack@suse.cz>
Subject: [LSF/MM TOPIC] Lazy file reflink
Date: Fri, 25 Jan 2019 16:27:52 +0200
Message-ID: <CAOQ4uxgqm-m1Zj073o_vSnwkTbGObJiQ-CdWV2ESd_P-29=jZw@mail.gmail.com>

Hi,

I would like to discuss the concept of lazy file reflink.
The use case is backup of a very large read-mostly file.
The backup application would like to read consistent content from the
file, an "atomic read", so to speak.

With a filesystem that supports reflink, that can be done by:
- Create O_TMPFILE
- Reflink origin to temp file
- Backup from temp file
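
The three steps above can be sketched in C roughly as follows. This is
only an illustration under stated assumptions: the helper name
open_reflink_snapshot() and its dir_path argument are made up for this
example, dir_path must be a directory on the same filesystem as the
origin file, and the FICLONE ioctl fails (typically with EOPNOTSUPP) on
filesystems without reflink support.

```c
#define _GNU_SOURCE		/* for O_TMPFILE */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>		/* FICLONE */
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of any existing API: returns an fd
 * referring to an unnamed reflink clone of origin_path, or -1 with
 * errno set. dir_path is assumed to be a directory on the same
 * filesystem as origin_path.
 */
int open_reflink_snapshot(const char *origin_path, const char *dir_path)
{
	int src, tmp, saved;

	assert(origin_path != NULL && dir_path != NULL);

	src = open(origin_path, O_RDONLY | O_CLOEXEC);
	if (src < 0)
		return -1;

	/* Step 1: unnamed temp file; FICLONE needs a writable destination. */
	tmp = open(dir_path, O_TMPFILE | O_RDWR | O_CLOEXEC, 0600);
	if (tmp < 0) {
		saved = errno;
		close(src);
		errno = saved;
		return -1;
	}

	/* Step 2: make the temp file share all extents of the origin. */
	if (ioctl(tmp, FICLONE, src) < 0) {
		saved = errno;	/* e.g. EOPNOTSUPP without reflink support */
		close(tmp);
		close(src);
		errno = saved;
		return -1;
	}

	close(src);

	/* Step 3: the caller backs up by reading from the returned fd. */
	return tmp;
}
```

The temp file has no name, so it goes away when the returned fd is
closed and nothing needs to be cleaned up after a crash. The metadata
cost discussed below is paid in step 2, whether or not the origin file
is ever modified.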

However, since the origin file is very likely not to be modified,
the reflink step, which may incur lots of metadata updates, is a waste.
Instead, if the filesystem could be notified that atomic content was
requested (O_ATOMIC|O_RDONLY or O_CLONE|O_RDONLY), it could defer
the reflink to an O_TMPFILE until the origin file is opened for write
or actually modified.

What I just described is actually already implemented with
Overlayfs snapshots [1], but for many applications overlayfs snapshots
are not a practical solution.

I have based my assumption that reflink of a large file may incur
lots of metadata updates on my limited knowledge of the xfs reflink
implementation, but perhaps that is not the case for other filesystems
(btrfs?), and perhaps the current metadata overhead of reflinking a
large file is an implementation detail that could be optimized in the
future?

The point of the matter is that there is no API to make an explicit
request for a "volatile reflink" that does not need to survive power
failure, and that limits the ability of filesystems to optimize this case.

I realize the "atomic read" requirement is somewhat adjacent to
the "atomic write" [2] requirement, if only by name, but I am
not sure how much they really have in common.

A somewhat different approach to the problem is for the application
to use fanotify to register for a pre-modify callback and implement the
lazy reflink by itself. This could work, but it would require extending
the semantics of fanotify, and the application currently needs to have
CAP_SYS_ADMIN, because it can block access to the file indefinitely.

Would love to get some feedback about the concept from filesystem
developers.

Thanks,
Amir.

[1] https://lwn.net/Articles/719772/
[2] https://lwn.net/Articles/715918/
