Re: fanotify feature request FAN_MARK_PID

From: Amir Goldstein <amir73il@gmail.com>
To: Tycho Kirchner <tychokirchner@mail.de>
Cc: Matthew Bobrowski <mbobrowski@mbobrowski.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: fanotify feature request FAN_MARK_PID
Date: Mon, 17 Aug 2020 20:02:31 +0300
Message-ID: <CAOQ4uxjEm=vj5Be5VoUyB9Q+YVq=+aO_4PfXp-iEYZA7qzO1Gw@mail.gmail.com>
In-Reply-To: <dde082eb-b3eb-859e-b442-a65846cff6fa@mail.de>

On Mon, Aug 17, 2020 at 7:08 PM Tycho Kirchner <tychokirchner@mail.de> wrote:
>
> Dear Amir Goldstein,
>

Hi Tycho,

> Dear Matthew Bobrowski,
>
> Dear developers of the kernel filesystem,
>
> First of all, thanks for your effort in improving Linux, especially your
> work regarding fanotify, which I heavily use in one of my projects:
>
> https://github.com/tycho-kirchner/shournal
>

Nice project!

> For a more scientific introduction please take a look at
> Bashing irreproducibility with shournal
> https://doi.org/10.1101/2020.08.03.232843
>
> I wanted to kindly ask you, whether it is possible for you to add
> another feature to fanotify, that is reporting only events of a PID or
> any of its children.
> This would be very useful, because especially in the world of
> bioinformatics there is a huge need to automatically and efficiently
> track file events on the shell, that is, you enter a command on the
> shell (bash) and then track which files were modified by the
> shell or any of its child processes.

I am not sure if fanotify is the right tool for the job.
fanotify is a *system* monitoring tool and its functionality is very limited.
If you want to watch what file operations a process and its children are doing,
you can use more powerful tracing tools like strace, seccomp, and eBPF.
For starters, did you look at bcc tools, for example:
https://github.com/iovisor/bcc/blob/master/tools/opensnoop.py

[...]

> I imagine e.g. the following syscalls:
>
> 1.
> Use fanotify_mark to restrict the fanotify notification group to a
> specific PID, optionally marking forked children as well.
> fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_PID, FAN_EVENT_ON_CHILD,
> pid, NULL);
> // FAN_EVENT_ON_CHILD -> additional meaning: also forked child processes.
>

Technically, it is quite easy to filter out events generated by processes
outside the listener's pid namespace (which would be reported with pid 0),
but I doubt that the use case you presented justifies it. Maybe there are
other use cases...
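
For illustration, pid filtering can already be approximated in userspace,
because every event carries the pid of the process that generated it in
struct fanotify_event_metadata. The following is only a minimal sketch: the
helper name event_from_tracked_pid and the idea of a caller-maintained
"tracked" array are assumptions, not part of fanotify, and the listener
would have to discover forked children by some other means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/fanotify.h>

/*
 * Hypothetical helper: return true if the event was generated by one of
 * the pids in the caller-maintained `tracked` array.
 * An event pid of 0 means the generating process is not visible in the
 * listener's pid namespace, so such events never match.
 */
bool event_from_tracked_pid(const struct fanotify_event_metadata *ev,
                            const pid_t *tracked, size_t n)
{
    assert(tracked != NULL || n == 0);

    if (ev->pid == 0)
        return false;

    for (size_t i = 0; i < n; i++) {
        if (tracked[i] == (pid_t)ev->pid)
            return true;
    }
    return false;
}
```

This only drops events after they have been queued and read, so unlike a
kernel-side filter it does not reduce the event load on the listener.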

> 2.
> Use fanotify_mark to remove a PID from the notification group.
> fanotify_mark(fan_fd, FAN_MARK_REMOVE | FAN_MARK_PID, 0, pid, NULL);
>
> 3.
> When reading from a fan_fd, which is marked for PID's which have all
> ended or were removed, return e.g. ENOENT.
>
>
> Independent of that it would be also useful, to be able to track
> applications, which unshare their mount namespace as well (e.g.
> flatpak). So in case a process, whose mount points are observed,
> unshares, the new mount id's should also be added to the same fanotify
> notification group. To preserve backwards compatibility I suggest
> introducing a new flag FAN_MARK_MOUNT_REC:
> fanotify_mark(fan_fd, FAN_MARK_ADD | FAN_MARK_MOUNT |
> FAN_MARK_MOUNT_REC, mask, AT_FDCWD, path);
>

The inherited mark concept sounds useful.
I also thought of a similar flag for directories.
The question is if and how you clean up all the inherited marks when the
program removes the original mark. It's an API question. Not a trivial one IMO.

The thing is, with FAN_MARK_FILESYSTEM (v5.1), you can sort of implement
what you want in userspace with the opposite approach:
1. Watch events on the filesystem, regardless of mount
2. When getting an event with an open fd, resolve the mount
3. If you are NOT interested in that mount, add a mark with
    FAN_MARK_IGNORED_MASK on that mount
4. Soon, you will be left with only the events you care about
5. When a mount is unshared, you will get the events generated on that mount

But that will only work if the unshared mount is visible in the mount namespace
of the listener, so it is not a complete solution, but maybe it works for some
of your use cases.

Thanks,
Amir.