From: Amir Goldstein <amir73il@gmail.com>
To: Jan Kara <jack@suse.cz>
Cc: linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Dave Chinner <david@fromorbit.com>
Subject: Re: thoughts about fanotify and HSM
Date: Wed, 23 Nov 2022 17:16:23 +0200
Message-ID: <CAOQ4uxg0hfuyQk3dBXfF2VTtfyOg5bD_NPrz0VOSFuVoA4vpuw@mail.gmail.com>
In-Reply-To: <20221123101021.7a65fgjop3d45ryq@quack3>

On Wed, Nov 23, 2022 at 12:10 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 16-11-22 18:24:06, Amir Goldstein wrote:
> > > > Why then give up on the POST_WRITE events idea?
> > > > Don't you think it could work?
> > >
> > > So as we are discussing, the POST_WRITE event is not useful when we want to
> > > handle crash safety. And if we have some other mechanism (like SRCU) which
> > > is able to guarantee crash safety, then what is the benefit of POST_WRITE?
> > > I'm not against POST_WRITE, I just don't see much value in it if we have
> > > another mechanism to deal with events straddling checkpoint.
> > >
> >
> > Not sure I follow.
> >
> > I think that crash safety can be achieved also with PRE/POST_WRITE:
> > - PRE_WRITE records an intent to write in persistent snapshot T
> >   and add to in-memory map of in-progress writes of period T
> > - When "checkpoint T" starts, new PRE_WRITES are recorded in both
> >   T and T+1 persistent snapshots, but event is added only to
> >   in-memory map of in-progress writes of period T+1
> > - "checkpoint T" ends when all in-progress writes of T are completed
>
> So maybe I am missing something, but consider the situation I mentioned a
> few emails earlier:
>
> PRE_WRITE for F                 -> F recorded as modified in T
> modify F
> POST_WRITE for F
>
> PRE_WRITE for F                 -> ignored because F is already marked as
>                                    modified
>
>                                 -> checkpoint T requested, modified files
>                                    reported, process modified files
> modify F
> --------- crash
>
> Now unless filesystem freeze or SRCU is part of checkpoint, we will never
> notify about the last modification to F. So I don't see how PRE +
> POST_WRITE alone can achieve crash safety...
>
> And if we use filesystem freeze or SRCU as part of checkpoint, then
> processing of POST_WRITE events does not give us anything new. E.g.
> synchronize_srcu() during checkpoint before handing out list of modified
> files makes sure all modifications to files for which PRE_MODIFY events
> were generated (and thus are listed as modified in checkpoint T) are
> visible for userspace.
>
> So am I missing some case where POST_WRITE would be more useful than SRCU?
> Because at this point I'd rather implement SRCU than POST_WRITE.
>

I tend to agree. Even if POST_WRITE can be done,
SRCU will be far better.
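
To make the grace-period argument concrete, here is a toy userspace
model of it. This is only a sketch under assumptions, not kernel code
and not a proposed API: the class WriteEpochs and its methods are
hypothetical names, with pre_write()/write_done() standing in for
srcu_read_lock()/srcu_read_unlock() around a modification and
checkpoint() standing in for synchronize_srcu().

```python
import threading


class WriteEpochs:
    """Toy model of an SRCU-like grace period (hypothetical, userspace only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._epoch = 0
        self._active = {0: 0}        # epoch -> writes still in flight
        self._modified = {0: set()}  # epoch -> paths recorded as modified

    def pre_write(self, path):
        """Record the intent to write; returns a token for write_done()."""
        with self._cond:
            epoch = self._epoch
            self._active[epoch] += 1
            self._modified[epoch].add(path)
            return epoch

    def write_done(self, epoch):
        """Mark the write that started in `epoch` as completed."""
        with self._cond:
            self._active[epoch] -= 1
            self._cond.notify_all()

    def checkpoint(self):
        """Open a new epoch, then wait for all writes of the old one."""
        with self._cond:
            old = self._epoch
            self._epoch = old + 1
            self._active[self._epoch] = 0
            self._modified[self._epoch] = set()
            # Grace period: blocks until every write that began before
            # the checkpoint has finished. New writes land in old + 1.
            self._cond.wait_for(lambda: self._active[old] == 0)
            del self._active[old]
            return self._modified.pop(old)
```

In the scenario quoted above, the second write to F has called
pre_write() but not write_done() when the checkpoint is requested, so
checkpoint() does not hand out the list containing F until that
modification has completed.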

> > The trick with alternating snapshots "handover" is this
> > (perhaps I never explained it and I need to elaborate on the wiki [1]):
> >
> > [1] https://github.com/amir73il/fsnotify-utils/wiki/Hierarchical-Storage-Management-API#Modified_files_query
> >
> > The changed files query results need to include recorded changes in both
> > "finalizing" snapshot T and the new snapshot T+1 that was started in
> > the beginning of the query.
> >
> > Snapshot T MUST NOT be discarded until checkpoint/handover
> > is complete AND the query results that contain changes recorded
> > in T and T+1 snapshots have been consumed.
> >
> > When the consumer ACKs that the query results have been safely stored
> > or acted upon (I called this operation "bless" snapshot T+1) then and
> > only then can snapshot T be discarded.
> >
> > After snapshot T is discarded a new query will start snapshot T+2.
> > A changed files query result includes the id of the last blessed snapshot.
> >
> > I think this is more or less equivalent to the SRCU that you suggested,
> > but all the work is done in userspace at application level.
> >
> > If you see any problem with this scheme or don't understand it
> > please let me know and I will try to explain better.
>
> So until now I was imagining that query results will be returned like a one
> big memcpy. I.e. one off event where the "persistent log daemon" hands over
> the whole contents of checkpoint T to the client. Whatever happens with the
> returned data is the business of the client, whatever happens with the
> checkpoint T records in the daemon is the daemon's business. The model you
> seem to speak about here is somewhat different - more like readdir() kind
> of approach where client asks for access to checkpoint T data, daemon
> provides the data record by record (probably serving the data from its
> files on disk), and when the client is done and "closes" checkpoint T,
> daemon's records about checkpoint T can be erased. Am I getting it right?
>

Yes, something like that.
The query result (which is actually a recursive readdir) could be huge.
So it cannot really be returned as a blob, it must be streamed to consumers.
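
As a rough illustration of the handover, here is a minimal in-memory
sketch. It is not the actual daemon: the names ChangeLog, record(),
query() and bless() are made up for this example, and a real
implementation would keep the snapshots persistent and serve records
from disk.

```python
class ChangeLog:
    """Toy model of alternating change snapshots (hypothetical names)."""

    def __init__(self):
        self.blessed = 0             # id of the last blessed snapshot
        self.current = 0             # snapshot that records new changes
        self.snapshots = {0: set()}  # snapshot id -> changed paths

    def record(self, path):
        """Record a change (e.g. on PRE_WRITE) in the current snapshot."""
        self.snapshots[self.current].add(path)

    def query(self):
        """Start snapshot T+1 if needed and stream changes from T and T+1."""
        if self.current == self.blessed:
            self.current += 1
            self.snapshots[self.current] = set()
        ids = range(self.blessed, self.current + 1)

        def stream():
            seen = set()
            for sid in ids:
                # Copy one snapshot at a time; records are yielded one by one.
                for path in sorted(self.snapshots[sid]):
                    if path not in seen:
                        seen.add(path)
                        yield path

        # The result carries the id of the last blessed snapshot.
        return self.blessed, stream()

    def bless(self):
        """Consumer ACK: only now may older snapshots be discarded."""
        for sid in [s for s in self.snapshots if s < self.current]:
            del self.snapshots[sid]
        self.blessed = self.current
```

The consumer iterates over the stream, stores or acts upon the results,
and only then calls bless(). Until that point snapshot T stays around,
so a consumer that crashes mid-stream can simply repeat the query.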

> This however seems somewhat orthogonal to the SRCU idea. SRCU essentially
> serves a single purpose - make sure that modifications to all files for
> which we have received PRE_WRITE event are visible in respective files.
>

Absolutely right.
Sorry for the noise, but at least you've learned one more thing
about my persistent change snapshots architecture ;-)

Thanks,
Amir.
