From: Andreas Dilger <adilger@dilger.ca>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Dave Chinner <david@fromorbit.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	fdmanana@kernel.org, fstests@vger.kernel.org,
	linux-btrfs@vger.kernel.org, Filipe Manana <fdmanana@suse.com>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [patch] file dedupe (and maybe clone) data corruption (was Re: [PATCH] generic: test for deduplication between different files)
Date: Mon, 1 Oct 2018 14:34:08 -0600
Message-ID: <D2D8FCF8-165B-44DE-ABC6-39C22C6E9A10@dilger.ca>
In-Reply-To: <20180921044013.GD11392@hungrycats.org>

On Sep 20, 2018, at 10:40 PM, Zygo Blaxell <ce3g8jdj@umail.furryterror.org> wrote:
> 
> On Fri, Sep 21, 2018 at 12:59:31PM +1000, Dave Chinner wrote:
>> On Wed, Sep 19, 2018 at 12:12:03AM -0400, Zygo Blaxell wrote:
> [...]
>> With no DMAPI in the future, people with custom HSM-like interfaces
>> based on dmapi are starting to turn to fanotify and friends to
>> provide them with the change notifications they require....
> 
> I had a fanotify-based scanner once, before I noticed btrfs effectively
> had timestamps all over its metadata.
> 
> fanotify won't tell me which parts of a file were modified (unless it
> got that feature in the last few years?).  fanotify was pretty useless
> when the only file on the system that was being modified was a 13TB
> VM image.  Or even a little 16GB one.  Has to scan the whole file to
> find the one new byte.  Even on desktops the poor thing spends most of
> its time looping over /var/log/messages.  It was sad.
> 
> If fanotify gave me (inode, offset, length) tuples of dirty pages in
> cache, I could look them up and use a dedupe_file_range call to replace
> the dirty pages with a reference to an existing disk block.  If my
> listener can do that fast enough, it's in-band dedupe; if it doesn't,
> the data gets flushed to disk as normal, and I fall back to a scan of
> the filesystem to clean it up later.
> 
>>>> e.g. a soft requirement is that we need to scan the entire fs at
>>>> least once a month.
>>> 
>>> I have to scan and dedupe multiple times per hour.  OK, the first-ever
>>> scan of a non-empty filesystem is allowed to take much longer, but after
>>> that, if you have enough spare iops for continuous autodefrag you should
>>> also have spare iops for continuous dedupe.
>> 
>> Yup, but using notifications avoids the need for even these scans - you'd
>> know exactly what data has changed, when it changed, and know
>> exactly what you needed to read to calculate the new hashes.
> 
> ...if the scanner can keep up with the notifications; otherwise, the
> notification receiver has to log them somewhere for the scanner to
> catch up.  If there are missed or dropped notifications--or 23 hours a
> day we're not listening for notifications because we only have an hour
> a day maintenance window--some kind of filesystem scan has to be done
> after the fact anyway.

It is worthwhile to mention that Lustre has a persistent Changelog record
that is generated atomically with the filesystem transaction that the event
happened in.

Once a Changelog consumer registers itself with the filesystem, along with a
mask of the event types that it is interested in, the Changelog begins
recording all such events to disk (e.g. create, mkdir, setattr, etc.).  The
consumer periodically notifies the filesystem when it has processed events up
to record X, so that the filesystem can purge old records from the log.
Multiple consumers can be registered at once, and the log is only purged up to
the slowest consumer.
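
To make the register/acknowledge/purge cycle concrete, here is a minimal
in-memory sketch in Python.  The class and method names (Changelog, register,
log, read, ack) and the event names are hypothetical; this is not Lustre's
actual interface or on-disk format, and it leaves out persistence and
transactional atomicity entirely.

```python
from dataclasses import dataclass


@dataclass
class Record:
    index: int   # monotonically increasing record number
    event: str   # event type, e.g. "CREAT" or "MKDIR" (illustrative names)
    target: str  # path or object the event applies to


class Changelog:
    """Toy model of a log shared by several registered consumers.

    Hypothetical API; persistence and transaction atomicity are omitted.
    """

    def __init__(self):
        self.records = []    # retained records, oldest first
        self.next_index = 1
        self.consumers = {}  # consumer id -> [event mask, last acked index]

    def register(self, cid, mask):
        # A new consumer only sees events logged after it registers.
        self.consumers[cid] = [set(mask), self.next_index - 1]

    def log(self, event, target):
        # Nothing is recorded unless some registered consumer wants this type.
        if not any(event in mask for mask, _ in self.consumers.values()):
            return None
        rec = Record(self.next_index, event, target)
        self.next_index += 1
        self.records.append(rec)
        return rec.index

    def read(self, cid):
        # Return unprocessed records matching this consumer's mask, plus the
        # highest index examined, which the consumer acknowledges afterwards.
        mask, acked = self.consumers[cid]
        pending = [r for r in self.records if r.index > acked and r.event in mask]
        upto = max([acked] + [r.index for r in self.records])
        return pending, upto

    def ack(self, cid, upto):
        # The consumer reports it has processed everything up to `upto`.
        entry = self.consumers[cid]
        entry[1] = max(entry[1], upto)
        self._purge()

    def _purge(self):
        # Records are dropped only once the slowest consumer has acked them.
        floor = min(acked for _, acked in self.consumers.values())
        self.records = [r for r in self.records if r.index > floor]


# Example: two consumers with different event masks.
cl = Changelog()
cl.register("scanner", {"CREAT"})
cl.register("backup", {"CREAT", "MKDIR"})
cl.log("CREAT", "/a")        # index 1
cl.log("MKDIR", "/d")        # index 2
pending, upto = cl.read("scanner")
cl.ack("scanner", upto)      # "backup" has not acked yet, so both records stay
print(len(cl.records))       # 2
pending, upto = cl.read("backup")
cl.ack("backup", upto)       # every consumer is now past index 2
print(len(cl.records))       # 0
```

Because the purge floor is always the minimum acknowledged index, a single
stalled consumer pins every record logged after its last acknowledgement,
which is why stale consumers eventually have to be deregistered.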

If a consumer hasn't processed logs in some (relatively long) time (e.g. many
days or weeks), or if the filesystem is otherwise going to run out of space,
then the consumer is deregistered and the old log records are cleaned up.  This
also notifies the consumer that it is no longer active, and it has to do a
full scan to update its state for the events that it missed.
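
The expiry rule can be sketched the same way.  This is a hypothetical policy
function, assuming each consumer's last acknowledgement time is tracked and
that filesystem space pressure is reported as a simple boolean space_low;
Consumer, expire_consumers, and the 7-day WEEK limit are illustrations, not
Lustre tunables or defaults.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    cid: str
    acked: int            # highest record index this consumer has processed
    last_ack_time: float  # time of its most recent acknowledgement (seconds)
    active: bool = True   # False once deregistered: consumer must full-scan


def expire_consumers(consumers, record_indices, now, max_idle, space_low):
    """Deregister idle consumers, and under space pressure also the slowest
    remaining one, then drop records that no active consumer still needs.

    Returns (kept_indices, deregistered_ids).  Hypothetical names and policy.
    """
    dropped = []
    for c in consumers:
        if c.active and now - c.last_ack_time > max_idle:
            c.active = False
            dropped.append(c.cid)

    live = [c for c in consumers if c.active]
    if space_low and live:
        # Sacrifice the consumer that is holding back the most log records.
        slowest = min(live, key=lambda c: c.acked)
        slowest.active = False
        dropped.append(slowest.cid)
        live = [c for c in live if c is not slowest]

    if not live:
        return [], dropped  # nobody is left to read the log
    floor = min(c.acked for c in live)
    return [i for i in record_indices if i > floor], dropped


WEEK = 7 * 24 * 3600  # illustrative idle limit
consumers = [Consumer("dedupe", acked=10, last_ack_time=699_000.0),
             Consumer("indexer", acked=3, last_ack_time=0.0)]
kept, dropped = expire_consumers(consumers, list(range(1, 11)),
                                 now=700_000.0, max_idle=WEEK, space_low=False)
for c in consumers:
    if not c.active:
        # The deregistered consumer has a gap and falls back to a full scan.
        print(f"{c.cid}: deregistered, full filesystem scan required")
```

Deregistering the consumer, rather than silently dropping its records, is the
key point: the consumer learns that it has missed events and rescans, instead
of quietly working from an incomplete log.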

Having a persistent changelog is useful for all kinds of event processing,
and avoids the need to do real-time processing.  If the userspace daemon fails
or the system is restarted, there is no need to rescan the whole filesystem,
which is important when there are many billions of files therein.


Cheers, Andreas