Re: robinhood, fanotify name info events and lustre changelog

From: "Quentin.BOUGET@cea.fr" <Quentin.BOUGET@cea.fr>
To: Amir Goldstein <amir73il@gmail.com>,
	Dominique Martinet <asmadeus@codewreck.org>
Cc: Jan Kara <jack@suse.cz>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-api@vger.kernel.org" <linux-api@vger.kernel.org>,
	"robinhood-devel@lists.sf.net" <robinhood-devel@lists.sf.net>
Subject: Re: robinhood, fanotify name info events and lustre changelog
Date: Mon, 1 Jun 2020 19:46:15 +0000
Message-ID: <1591040775412.28640@cea.fr>
In-Reply-To: <CAOQ4uxiE9R4gRGwQQETvWK7SLm4J60SvfrSAOZxYJdRHquAwtA@mail.gmail.com>

> > > > I am guessing the most interesting bits for this discussion should be found
> > > > here:
> > > > https://github.com/cea-hpc/robinhood/blob/v4/include/robinhood/fsevent.h
> > > >
> > > 
> > > That is a very well documented API and a valuable resource for me.

Thank you!

> > > Notes for API choices that are aligned with current fanotify plans:
> > > - The combination of parent fid + object fid without name is never expected
> > > 
> > > Notes for API choices that are NOT aligned with current fanotify plans:
> > > - LINK/UNLINK events carry the linked/unlinked object fid
> > > - XATTR events for inode (not namespace) do not carry parent fid/name
> > > 
> > > This doesn't mean that fanotify -> rbh_fsevent translation is not going to
> > > be possible.
> > > 
> > > With fanotify FAN_CREATE event, for example, the parent fid + name
> > > information should be used by the rbh adapter code to call
> > > name_to_handle_at(2) and get the created object's file handle.
> > > 
> > > The reason we made this API choice is because fanotify events should
> > > not be perceived as a sequence of changes that can be followed to
> > > describe the current state of the filesystem.
> > > fanotify events should be perceived as a "poll" on the namespace.
> > > Whenever notified of a change, applications should read the current state
> > > of the filesystem. fanotify events provide "just enough" information, so
> > > reading the current state of the filesystem is not too expensive.

I am a little worried about objects that move around constantly and could thus
"evade" name_to_handle_at(). A bad actor could try to hide a setuid binary this
way... Of course, they could also just copy and delete the file repeatedly, in
which case having the fid becomes useless, but that seems harder to pull off
and would likely take more time than a simple rename.
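For reference, the FAN_CREATE lookup suggested above could look roughly like
the sketch below. It assumes the adapter already holds an fd on the parent
directory (e.g. obtained with open_by_handle_at(2) on the parent fid, which
needs CAP_DAC_READ_SEARCH); child_handle() is a hypothetical helper name. The
race I am worried about shows up here as a failure with ENOENT when the name
has already been renamed away, or worse, as a successful lookup of whatever
object took that name in the meantime.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Hypothetical helper: resolve `name` inside the directory open on `dirfd`
 * to a file handle. Returns a malloc'ed handle, or NULL with errno set
 * (ENOENT if the entry was renamed or unlinked before we got there). */
static struct file_handle *child_handle(int dirfd, const char *name,
                                        int *mount_id)
{
    struct file_handle *fh = malloc(sizeof(*fh) + MAX_HANDLE_SZ);

    if (fh == NULL)
        return NULL;
    fh->handle_bytes = MAX_HANDLE_SZ;

    /* flags = 0: a symlink at `name` is not followed. */
    if (name_to_handle_at(dirfd, name, fh, mount_id, 0) == -1) {
        int saved = errno;

        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}
```

Nothing ties the returned handle to the object the event was generated for:
it is only the object that owns the name at lookup time.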

> > > When fanotify event FAN_MODIFY reports a change of file size,
> > > along with parent fid + name, that do not match the parent/name robinhood
> > > knows about (i.e. because the event is received out of order with rename),
> > > you may use that information to create rbh_fsevent_ns_xattr event to
> > > update the path or you may wait for the FAN_MOVE_SELF event that
> > > should arrive later.
> > > Up to you.

This makes me think: if I receive such a FAN_MODIFY event, and another object
is moved to parent_fid + name before I query the FS, how can I know which file
the event was originally meant for?
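A partial mitigation, assuming the adapter keeps the file handle of every object
it tracks: re-resolve parent fid + name and compare the resulting handle with
the stored one. A mismatch at least reveals that the name now refers to a
different object, even though it still cannot say which object the event was
about. Comparing handles byte for byte is itself an assumption (handles are
formally opaque), and mount ids must be compared separately; same_handle() and
handle_alloc() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate a handle able to hold `bytes` bytes of
 * opaque data (pass MAX_HANDLE_SZ before calling name_to_handle_at()). */
static struct file_handle *handle_alloc(unsigned int bytes)
{
    struct file_handle *fh = malloc(sizeof(*fh) + bytes);

    if (fh != NULL)
        fh->handle_bytes = bytes;
    return fh;
}

/* Hypothetical helper: treat two handles from the same mount as naming the
 * same object when their type, length and opaque bytes are all identical. */
static bool same_handle(const struct file_handle *a,
                        const struct file_handle *b)
{
    return a->handle_type == b->handle_type &&
           a->handle_bytes == b->handle_bytes &&
           memcmp(a->f_handle, b->f_handle, a->handle_bytes) == 0;
}
```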

> > > > So, to be fair, full paths _are_ computed solely from information in the
> > > > changelog records, even though it requires a bit of processing on the side.
> > > > No additional query to the filesystem for that.
> > > 
> > > As I wrote, that fact that robinhood trusts the information in changelog
> > > records doesn't mean that information needs to arrive from the kernel.
> > > The adapter code should use information provided by fanotify events
> > > then use open_by_handle_at(2) on the directory fid to find its current
> > > path in the filesystem then feed that information to a robinhood change
> > > record.
> > 
> > I can agree with that - it's not because for lustre we made the decision
> > to be able to run without querying the filesystem at all that it has to
> > hold true for all types of inputs.

I agree as well. The issue I mention above is a special case. In general, I am
fine with the "just enough information" approach.
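The path lookup Amir describes (open_by_handle_at(2) on the directory fid, then
reading back its current path) could be sketched as follows. This assumes
CAP_DAC_READ_SEARCH, a `mount_fd` on the same filesystem, and a mounted /proc;
the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write the path the kernel currently associates with
 * `fd` into buf (NUL-terminated). Returns its length, or -1 on error. */
static ssize_t fd_path(int fd, char *buf, size_t len)
{
    char link[64];
    ssize_t n;

    if (len == 0)
        return -1;
    snprintf(link, sizeof(link), "/proc/self/fd/%d", fd);
    n = readlink(link, buf, len - 1);  /* readlink() does not NUL-terminate */
    if (n == -1)
        return -1;
    buf[n] = '\0';
    return n;
}

/* Hypothetical helper: open the directory described by `dir_fid` and return
 * its current path. Needs CAP_DAC_READ_SEARCH for open_by_handle_at(). */
static ssize_t dir_fid_to_path(int mount_fd, struct file_handle *dir_fid,
                               char *buf, size_t len)
{
    int dfd = open_by_handle_at(mount_fd, dir_fid, O_RDONLY | O_DIRECTORY);
    ssize_t n;

    if (dfd == -1)
        return -1;
    n = fd_path(dfd, buf, len);
    close(dfd);
    return n;
}
```

The returned path is only a snapshot: it may already be stale when robinhood
processes the record, and it can carry a " (deleted)" suffix if the directory
was removed in the meantime.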

> > > May I ask, what is the reason for embarking on the project to decouple
> > > robinhood v4 API from Lustre changelog API?

There is an impedance mismatch between what Lustre emits and what robinhood
needs for its updates: even with Lustre's changelog, we still need to query
the filesystem to get additional information. I could have extended Lustre's
structures, but then I would have depended on them too much for my taste. It
just seemed cleaner to have a clear separation between the two.

> Looking at robinhood (especially v4), it seems like it could fit
> very well into the vacuum in Linux and act as "fsnotifyd".
> Unprivileged applications and services could register to event streams
> and get fed from the db, so applications that are not running will not lose
> events. Events delivered to unprivileged applications need to be filtered by
> the subtrees those applications are interested in (something fanotify does
> not do and is not likely to do) and by the application's access permissions
> to the path of the reported object.

The plan is to use a dedicated message queue for the streaming part (such as
Kafka or RabbitMQ); robinhood would only really deal with serializing events
into a standard communication format (the current target is YAML) and dumping
them into the message queues.

From there, it's definitely possible to write a program that will filter
events and route them to unprivileged applications... But it is unlikely I will
write it myself. =)
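As an illustration only, the subtree part of such a filter could be as simple
as the hypothetical helper below. It assumes events have already been resolved
to absolute, normalized paths (no "..", no duplicate slashes). Filtering on the
application's access permissions is a separate problem that it does not
address.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if `path` is `root` itself or lies below it. */
static bool path_in_subtree(const char *path, const char *root)
{
    size_t len = strlen(root);

    /* Ignore trailing slashes on root, except for "/" itself. */
    while (len > 1 && root[len - 1] == '/')
        len--;

    if (len == 1 && root[0] == '/')
        return path[0] == '/';  /* every absolute path is under "/" */

    if (strncmp(path, root, len) != 0)
        return false;

    /* "/data" must match "/data" and "/data/x", but not "/database". */
    return path[len] == '\0' || path[len] == '/';
}
```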

Cheers,
Quentin