Linux-Fsdevel Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 0/2] fanotify: Adding pidfd support to the fanotify API
@ 2021-04-15 23:21 Matthew Bobrowski
  2021-04-15 23:22 ` [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h Matthew Bobrowski
                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-15 23:21 UTC (permalink / raw)
  To: jack, amir73il, christian.brauner; +Cc: linux-api, linux-fsdevel

Hey Jan/Amir/Christian,

This is the RFC patch series for adding pidfd support to the fanotify
API.

tl;dr rather than returning a pid within struct
fanotify_event_metadata, a pidfd is returned instead when fanotify has
been explicitly told to do so via the new FAN_REPORT_PIDFD flag.

The main driver behind returning a pidfd within an event instead of a
pid is that it permits userspace applications to better detect if
they've possibly lost a TOCTOU race. A common paradigm among userspace
applications that make use of the fanotify API is to crawl /proc/<pid>
upon receiving an event. Userspace applications do this in order to
further ascertain contextual meta-information about the process that
was responsible for generating the filesystem event. On high pressure
systems, where pid recycling isn't uncommon, it's no longer considered
as a reliable approach to directly sift through /proc/<pid> and have
userspace applications use the information contained within
/proc/<pid> as it could, and does, lead to program execution
inaccuracies.

Now when a pidfd is returned in an event, a userspace application can:

    a) Obtain the pid responsible for generating the filesystem event
       from the pidfds fdinfo.

    b) Detect whether the userspace application lost the procfs access
       race by sending a 0 signal on the pidfd and checking the return
       value. A -ESRCH is indicative of the userspace application
       losing the race, meaning that the pid has been recycled.

Matthew Bobrowski (2):
  pidfd_create(): remove static qualifier and declare pidfd_create() in
    linux/pid.h
  fanotify: Add pidfd support to the fanotify API

 fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
 include/linux/fanotify.h           |  2 +-
 include/linux/pid.h                |  1 +
 include/uapi/linux/fanotify.h      |  2 ++
 kernel/pid.c                       |  2 +-
 5 files changed, 35 insertions(+), 5 deletions(-)

-- 
2.31.1.368.gbe11c130af-goog

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h
  2021-04-15 23:21 [PATCH 0/2] fanotify: Adding pidfd support to the fanotify API Matthew Bobrowski
@ 2021-04-15 23:22 ` Matthew Bobrowski
  2021-04-19 10:13   ` Jan Kara
  2021-04-19 12:50   ` Christian Brauner
  2021-04-15 23:22 ` [PATCH 2/2] fanotify: Add pidfd support to the fanotify API Matthew Bobrowski
  2021-04-19 12:34 ` [PATCH 0/2] fanotify: Adding " Christian Brauner
  2 siblings, 2 replies; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-15 23:22 UTC (permalink / raw)
  To: jack, amir73il, christian.brauner; +Cc: linux-api, linux-fsdevel

With the idea to have fanotify(7) return pidfds within a `struct
fanotify_event_metadata`, pidfd_create()'s scope is to increased so
that it can be called from other subsystems within the Linux
kernel. The current `static` qualifier from its definition is to be
removed and a new function declaration for pidfd_create() is to be
added to the linux/pid.h header file.

Signed-off-by: Matthew Bobrowski <repnop@google.com>
---
 include/linux/pid.h | 1 +
 kernel/pid.c        | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/pid.h b/include/linux/pid.h
index fa10acb8d6a4..af308e15f174 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -78,6 +78,7 @@ struct file;
 
 extern struct pid *pidfd_pid(const struct file *file);
 struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
+int pidfd_create(struct pid *pid, unsigned int flags);
 
 static inline struct pid *get_pid(struct pid *pid)
 {
diff --git a/kernel/pid.c b/kernel/pid.c
index ebdf9c60cd0b..91c4b6891c15 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -553,7 +553,7 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
  * Return: On success, a cloexec pidfd is returned.
  *         On error, a negative errno number will be returned.
  */
-static int pidfd_create(struct pid *pid, unsigned int flags)
+int pidfd_create(struct pid *pid, unsigned int flags)
 {
 	int fd;
 
-- 
2.31.1.368.gbe11c130af-goog

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-15 23:21 [PATCH 0/2] fanotify: Adding pidfd support to the fanotify API Matthew Bobrowski
  2021-04-15 23:22 ` [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h Matthew Bobrowski
@ 2021-04-15 23:22 ` Matthew Bobrowski
  2021-04-16  6:27   ` Amir Goldstein
                     ` (2 more replies)
  2021-04-19 12:34 ` [PATCH 0/2] fanotify: Adding " Christian Brauner
  2 siblings, 3 replies; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-15 23:22 UTC (permalink / raw)
  To: jack, amir73il, christian.brauner; +Cc: linux-api, linux-fsdevel

Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
allows userspace applications to control whether a pidfd is to be
returned instead of a pid for `struct fanotify_event_metadata.pid`.

FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
pidfd API is currently restricted to only support pidfd generation for
thread-group leaders. Attempting to set them both when calling
fanotify_init(2) will result in -EINVAL being returned to the
caller. As the pidfd API evolves and support is added for tids, this
is something that could be relaxed in the future.

If pidfd creation fails, the pid in struct fanotify_event_metadata is
set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
pidfd on pidfd creation failures was considered, although this could
possibly lead to confusion and unpredictability within userspace
applications as distinguishing between whether an actual pidfd or pid
was returned could be difficult, so it's best to be explicit.

Signed-off-by: Matthew Bobrowski <repnop@google.com>
---
 fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
 include/linux/fanotify.h           |  2 +-
 include/uapi/linux/fanotify.h      |  2 ++
 3 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9e0c1afac8bd..fd8ae88796a8 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	struct fanotify_info *info = fanotify_event_info(event);
 	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
 	struct file *f = NULL;
-	int ret, fd = FAN_NOFD;
+	int ret, pidfd, fd = FAN_NOFD;
 	int info_type = 0;
 
 	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
@@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
 	metadata.vers = FANOTIFY_METADATA_VERSION;
 	metadata.reserved = 0;
 	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
-	metadata.pid = pid_vnr(event->pid);
+
+	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
+		pid_has_task(event->pid, PIDTYPE_TGID)) {
+		/*
+		 * Given FAN_REPORT_PIDFD is to be mutually exclusive with
+		 * FAN_REPORT_TID, panic here if the mutual exclusion is ever
+		 * blindly lifted without pidfds for threads actually being
+		 * supported.
+		 */
+		WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
+
+		pidfd = pidfd_create(event->pid, 0);
+		if (unlikely(pidfd < 0))
+			metadata.pid = FAN_NOPIDFD;
+		else
+			metadata.pid = pidfd;
+	} else {
+		metadata.pid = pid_vnr(event->pid);
+	}
 
 	if (path && path->mnt && path->dentry) {
 		fd = create_fd(group, path, &f);
@@ -941,6 +959,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 #endif
 		return -EINVAL;
 
+	/*
+	 * A pidfd can only be returned for a thread-group leader; thus
+	 * FAN_REPORT_TID and FAN_REPORT_PIDFD need to be mutually
+	 * exclusive. Once the pidfd API supports the creation of pidfds on
+	 * individual threads, then we can look at removing this conditional.
+	 */
+	if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID))
+		return -EINVAL;
+
 	if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS)
 		return -EINVAL;
 
@@ -1312,7 +1339,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
  */
 static int __init fanotify_user_setup(void)
 {
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
 
 	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 3e9c56ee651f..894740a6f4e0 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -21,7 +21,7 @@
 #define FANOTIFY_FID_BITS	(FAN_REPORT_FID | FAN_REPORT_DFID_NAME)
 
 #define FANOTIFY_INIT_FLAGS	(FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
-				 FAN_REPORT_TID | \
+				 FAN_REPORT_TID | FAN_REPORT_PIDFD | \
 				 FAN_CLOEXEC | FAN_NONBLOCK | \
 				 FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
 
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index fbf9c5c7dd59..369392644d4e 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -55,6 +55,7 @@
 #define FAN_REPORT_FID		0x00000200	/* Report unique file id */
 #define FAN_REPORT_DIR_FID	0x00000400	/* Report unique directory id */
 #define FAN_REPORT_NAME		0x00000800	/* Report events with name */
+#define FAN_REPORT_PIDFD	0x00001000	/* Return a pidfd for event->pid */
 
 /* Convenience macro - FAN_REPORT_NAME requires FAN_REPORT_DIR_FID */
 #define FAN_REPORT_DFID_NAME	(FAN_REPORT_DIR_FID | FAN_REPORT_NAME)
@@ -160,6 +161,7 @@ struct fanotify_response {
 
 /* No fd set in event */
 #define FAN_NOFD	-1
+#define FAN_NOPIDFD	FAN_NOFD
 
 /* Helper functions to deal with fanotify_event_metadata buffers */
 #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
-- 
2.31.1.368.gbe11c130af-goog

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-15 23:22 ` [PATCH 2/2] fanotify: Add pidfd support to the fanotify API Matthew Bobrowski
@ 2021-04-16  6:27   ` Amir Goldstein
  2021-04-16  7:05     ` Matthew Bobrowski
  2021-04-19 13:02     ` Christian Brauner
  2021-04-19 10:21   ` Jan Kara
  2021-04-19 13:20   ` Christian Brauner
  2 siblings, 2 replies; 35+ messages in thread
From: Amir Goldstein @ 2021-04-16  6:27 UTC (permalink / raw)
  To: Matthew Bobrowski; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Fri, Apr 16, 2021 at 2:22 AM Matthew Bobrowski <repnop@google.com> wrote:
>
> Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> allows userspace applications to control whether a pidfd is to be
> returned instead of a pid for `struct fanotify_event_metadata.pid`.
>
> FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> pidfd API is currently restricted to only support pidfd generation for
> thread-group leaders. Attempting to set them both when calling
> fanotify_init(2) will result in -EINVAL being returned to the
> caller. As the pidfd API evolves and support is added for tids, this
> is something that could be relaxed in the future.
>
> If pidfd creation fails, the pid in struct fanotify_event_metadata is
> set to FAN_NOPIDFD(-1).

Hi Matthew,

All in all looks good, just a few small nits.

> Falling back and providing a pid instead of a
> pidfd on pidfd creation failures was considered, although this could
> possibly lead to confusion and unpredictability within userspace
> applications as distinguishing between whether an actual pidfd or pid
> was returned could be difficult, so it's best to be explicit.

I don't think this should have been even "considered" so I see little
value in this paragraph in commit message.

>
> Signed-off-by: Matthew Bobrowski <repnop@google.com>
> ---
>  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
>  include/linux/fanotify.h           |  2 +-
>  include/uapi/linux/fanotify.h      |  2 ++
>  3 files changed, 33 insertions(+), 4 deletions(-)
>
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9e0c1afac8bd..fd8ae88796a8 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>         struct fanotify_info *info = fanotify_event_info(event);
>         unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
>         struct file *f = NULL;
> -       int ret, fd = FAN_NOFD;
> +       int ret, pidfd, fd = FAN_NOFD;
>         int info_type = 0;
>
>         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>         metadata.vers = FANOTIFY_METADATA_VERSION;
>         metadata.reserved = 0;
>         metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> -       metadata.pid = pid_vnr(event->pid);
> +
> +       if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> +               pid_has_task(event->pid, PIDTYPE_TGID)) {
> +               /*
> +                * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> +                * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> +                * blindly lifted without pidfds for threads actually being
> +                * supported.
> +                */
> +               WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));

Better WARN_ON_ONCE() the outcome of this error is not terrible.
Also in the comment above I would not refer to this warning as "panic".

> +
> +               pidfd = pidfd_create(event->pid, 0);
> +               if (unlikely(pidfd < 0))
> +                       metadata.pid = FAN_NOPIDFD;
> +               else
> +                       metadata.pid = pidfd;
> +       } else {
> +               metadata.pid = pid_vnr(event->pid);
> +       }

You should rebase your work on:
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
and resolve conflicts with "unprivileged listener" code.

Need to make sure that pidfd is not reported to an unprivileged
listener even if group was initialized by a privileged process.
This is a conscious conservative choice that we made for reporting
pid info to unprivileged listener that can be revisited in the future.

>
>         if (path && path->mnt && path->dentry) {
>                 fd = create_fd(group, path, &f);
> @@ -941,6 +959,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
>  #endif
>                 return -EINVAL;
>
> +       /*
> +        * A pidfd can only be returned for a thread-group leader; thus
> +        * FAN_REPORT_TID and FAN_REPORT_PIDFD need to be mutually
> +        * exclusive. Once the pidfd API supports the creation of pidfds on
> +        * individual threads, then we can look at removing this conditional.
> +        */
> +       if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID))
> +               return -EINVAL;
> +
>         if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS)
>                 return -EINVAL;
>
> @@ -1312,7 +1339,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
>   */
>  static int __init fanotify_user_setup(void)
>  {
> -       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
> +       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
>         BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
>
>         fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 3e9c56ee651f..894740a6f4e0 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -21,7 +21,7 @@
>  #define FANOTIFY_FID_BITS      (FAN_REPORT_FID | FAN_REPORT_DFID_NAME)
>
>  #define FANOTIFY_INIT_FLAGS    (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
> -                                FAN_REPORT_TID | \
> +                                FAN_REPORT_TID | FAN_REPORT_PIDFD | \
>                                  FAN_CLOEXEC | FAN_NONBLOCK | \
>                                  FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
>

FAN_REPORT_PIDFD should be added to FANOTIFY_ADMIN_INIT_FLAGS
from fsnotify branch.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-16  6:27   ` Amir Goldstein
@ 2021-04-16  7:05     ` Matthew Bobrowski
  2021-04-16  7:53       ` Amir Goldstein
  2021-04-19 13:02     ` Christian Brauner
  1 sibling, 1 reply; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-16  7:05 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Fri, Apr 16, 2021 at 09:27:03AM +0300, Amir Goldstein wrote:
> On Fri, Apr 16, 2021 at 2:22 AM Matthew Bobrowski <repnop@google.com> wrote:
> >
> > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > allows userspace applications to control whether a pidfd is to be
> > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> >
> > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > pidfd API is currently restricted to only support pidfd generation for
> > thread-group leaders. Attempting to set them both when calling
> > fanotify_init(2) will result in -EINVAL being returned to the
> > caller. As the pidfd API evolves and support is added for tids, this
> > is something that could be relaxed in the future.
> >
> > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > set to FAN_NOPIDFD(-1).
> 
> Hi Matthew,
> 
> All in all looks good, just a few small nits.

Thanks for feedback Amir! :)

> > Falling back and providing a pid instead of a
> > pidfd on pidfd creation failures was considered, although this could
> > possibly lead to confusion and unpredictability within userspace
> > applications as distinguishing between whether an actual pidfd or pid
> > was returned could be difficult, so it's best to be explicit.
> 
> I don't think this should have been even "considered" so I see little
> value in this paragraph in commit message.

Fair point. I will discard this sentence for all subsequent iterations
of this patch series. I guess the idea was that this patch series was
meant to be labeled as being "RFC", so some extra thoughts had been
noted.
 
> > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > ---
> >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> >  include/linux/fanotify.h           |  2 +-
> >  include/uapi/linux/fanotify.h      |  2 ++
> >  3 files changed, 33 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 9e0c1afac8bd..fd8ae88796a8 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >         struct fanotify_info *info = fanotify_event_info(event);
> >         unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> >         struct file *f = NULL;
> > -       int ret, fd = FAN_NOFD;
> > +       int ret, pidfd, fd = FAN_NOFD;
> >         int info_type = 0;
> >
> >         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >         metadata.vers = FANOTIFY_METADATA_VERSION;
> >         metadata.reserved = 0;
> >         metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > -       metadata.pid = pid_vnr(event->pid);
> > +
> > +       if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > +               pid_has_task(event->pid, PIDTYPE_TGID)) {
> > +               /*
> > +                * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> > +                * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> > +                * blindly lifted without pidfds for threads actually being
> > +                * supported.
> > +                */
> > +               WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> 
> Better WARN_ON_ONCE() the outcome of this error is not terrible.
> Also in the comment above I would not refer to this warning as "panic".

ACK.

> > +
> > +               pidfd = pidfd_create(event->pid, 0);
> > +               if (unlikely(pidfd < 0))
> > +                       metadata.pid = FAN_NOPIDFD;
> > +               else
> > +                       metadata.pid = pidfd;
> > +       } else {
> > +               metadata.pid = pid_vnr(event->pid);
> > +       }
> 
> You should rebase your work on:
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
> and resolve conflicts with "unprivileged listener" code.

ACK.

> Need to make sure that pidfd is not reported to an unprivileged
> listener even if group was initialized by a privileged process.
> This is a conscious conservative choice that we made for reporting
> pid info to unprivileged listener that can be revisited in the future.

OK, I see. In that case, I guess I can add the FAN_REPORT_PIDFD check
above the current conditional [0]:

...
if (!capable(CAP_SYS_ADMIN) && task_tgid(current) != event->pid)
        metadata.pid = 0;
...

That way, AFAIK even if it is an unprivileged listener the pid info
will be overwritten as intended.

> >
> >         if (path && path->mnt && path->dentry) {
> >                 fd = create_fd(group, path, &f);
> > @@ -941,6 +959,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> >  #endif
> >                 return -EINVAL;
> >
> > +       /*
> > +        * A pidfd can only be returned for a thread-group leader; thus
> > +        * FAN_REPORT_TID and FAN_REPORT_PIDFD need to be mutually
> > +        * exclusive. Once the pidfd API supports the creation of pidfds on
> > +        * individual threads, then we can look at removing this conditional.
> > +        */
> > +       if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID))
> > +               return -EINVAL;
> > +
> >         if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS)
> >                 return -EINVAL;
> >
> > @@ -1312,7 +1339,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
> >   */
> >  static int __init fanotify_user_setup(void)
> >  {
> > -       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
> > +       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
> >         BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
> >
> >         fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
> > diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> > index 3e9c56ee651f..894740a6f4e0 100644
> > --- a/include/linux/fanotify.h
> > +++ b/include/linux/fanotify.h
> > @@ -21,7 +21,7 @@
> >  #define FANOTIFY_FID_BITS      (FAN_REPORT_FID | FAN_REPORT_DFID_NAME)
> >
> >  #define FANOTIFY_INIT_FLAGS    (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
> > -                                FAN_REPORT_TID | \
> > +                                FAN_REPORT_TID | FAN_REPORT_PIDFD | \
> >                                  FAN_CLOEXEC | FAN_NONBLOCK | \
> >                                  FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
> >
> 
> FAN_REPORT_PIDFD should be added to FANOTIFY_ADMIN_INIT_FLAGS
> from fsnotify branch.

ACK.

Before sending any other version of this patch series through I will
see what Jan and Christian have to say.

[0]
https://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git/tree/fs/notify/fanotify/fanotify_user.c?h=fsnotify&id=7cea2a3c505e87a9d6afc78be4a7f7be636a73a7#n427  

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-16  7:05     ` Matthew Bobrowski
@ 2021-04-16  7:53       ` Amir Goldstein
  2021-04-16  8:08         ` Matthew Bobrowski
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2021-04-16  7:53 UTC (permalink / raw)
  To: Matthew Bobrowski; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Fri, Apr 16, 2021 at 10:06 AM Matthew Bobrowski <repnop@google.com> wrote:
>
> On Fri, Apr 16, 2021 at 09:27:03AM +0300, Amir Goldstein wrote:
> > On Fri, Apr 16, 2021 at 2:22 AM Matthew Bobrowski <repnop@google.com> wrote:
> > >
> > > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > > allows userspace applications to control whether a pidfd is to be
> > > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> > >
> > > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > > pidfd API is currently restricted to only support pidfd generation for
> > > thread-group leaders. Attempting to set them both when calling
> > > fanotify_init(2) will result in -EINVAL being returned to the
> > > caller. As the pidfd API evolves and support is added for tids, this
> > > is something that could be relaxed in the future.
> > >
> > > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > > set to FAN_NOPIDFD(-1).
> >
> > Hi Matthew,
> >
> > All in all looks good, just a few small nits.
>
> Thanks for feedback Amir! :)
>
> > > Falling back and providing a pid instead of a
> > > pidfd on pidfd creation failures was considered, although this could
> > > possibly lead to confusion and unpredictability within userspace
> > > applications as distinguishing between whether an actual pidfd or pid
> > > was returned could be difficult, so it's best to be explicit.
> >
> > I don't think this should have been even "considered" so I see little
> > value in this paragraph in commit message.
>
> Fair point. I will discard this sentence for all subsequent iterations
> of this patch series. I guess the idea was that this patch series was
> meant to be labeled as being "RFC", so some extra thoughts had been
> noted.
>
> > > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > > ---
> > >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> > >  include/linux/fanotify.h           |  2 +-
> > >  include/uapi/linux/fanotify.h      |  2 ++
> > >  3 files changed, 33 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > index 9e0c1afac8bd..fd8ae88796a8 100644
> > > --- a/fs/notify/fanotify/fanotify_user.c
> > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> > >         struct fanotify_info *info = fanotify_event_info(event);
> > >         unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> > >         struct file *f = NULL;
> > > -       int ret, fd = FAN_NOFD;
> > > +       int ret, pidfd, fd = FAN_NOFD;
> > >         int info_type = 0;
> > >
> > >         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> > >         metadata.vers = FANOTIFY_METADATA_VERSION;
> > >         metadata.reserved = 0;
> > >         metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > > -       metadata.pid = pid_vnr(event->pid);
> > > +
> > > +       if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > > +               pid_has_task(event->pid, PIDTYPE_TGID)) {
> > > +               /*
> > > +                * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> > > +                * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> > > +                * blindly lifted without pidfds for threads actually being
> > > +                * supported.
> > > +                */
> > > +               WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> >
> > Better WARN_ON_ONCE() the outcome of this error is not terrible.
> > Also in the comment above I would not refer to this warning as "panic".
>
> ACK.
>
> > > +
> > > +               pidfd = pidfd_create(event->pid, 0);
> > > +               if (unlikely(pidfd < 0))
> > > +                       metadata.pid = FAN_NOPIDFD;
> > > +               else
> > > +                       metadata.pid = pidfd;
> > > +       } else {
> > > +               metadata.pid = pid_vnr(event->pid);
> > > +       }
> >
> > You should rebase your work on:
> > git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
> > and resolve conflicts with "unprivileged listener" code.
>
> ACK.
>
> > Need to make sure that pidfd is not reported to an unprivileged
> > listener even if group was initialized by a privileged process.
> > This is a conscious conservative choice that we made for reporting
> > pid info to unprivileged listener that can be revisited in the future.
>
> OK, I see. In that case, I guess I can add the FAN_REPORT_PIDFD check
> above the current conditional [0]:
>
> ...
> if (!capable(CAP_SYS_ADMIN) && task_tgid(current) != event->pid)
>         metadata.pid = 0;
> ...
>
> That way, AFAIK even if it is an unprivileged listener the pid info
> will be overwritten as intended.
>

Situation is a bit more subtle than that.
If you override event->pid with zero and zero is interpreted as pidfd
that would not be consistent with uapi documentation.
You need to make sure that event->pid is FAN_NOPIDFD in case
(!capable(CAP_SYS_ADMIN) &&
 FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD))
Hopefully, you can do that while keeping the special cases to minimum...


> > >
> > >         if (path && path->mnt && path->dentry) {
> > >                 fd = create_fd(group, path, &f);
> > > @@ -941,6 +959,15 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
> > >  #endif
> > >                 return -EINVAL;
> > >
> > > +       /*
> > > +        * A pidfd can only be returned for a thread-group leader; thus
> > > +        * FAN_REPORT_TID and FAN_REPORT_PIDFD need to be mutually
> > > +        * exclusive. Once the pidfd API supports the creation of pidfds on
> > > +        * individual threads, then we can look at removing this conditional.
> > > +        */
> > > +       if ((flags & FAN_REPORT_PIDFD) && (flags & FAN_REPORT_TID))
> > > +               return -EINVAL;
> > > +
> > >         if (event_f_flags & ~FANOTIFY_INIT_ALL_EVENT_F_BITS)
> > >                 return -EINVAL;
> > >
> > > @@ -1312,7 +1339,7 @@ SYSCALL32_DEFINE6(fanotify_mark,
> > >   */
> > >  static int __init fanotify_user_setup(void)
> > >  {
> > > -       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 10);
> > > +       BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 11);
> > >         BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
> > >
> > >         fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
> > > diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> > > index 3e9c56ee651f..894740a6f4e0 100644
> > > --- a/include/linux/fanotify.h
> > > +++ b/include/linux/fanotify.h
> > > @@ -21,7 +21,7 @@
> > >  #define FANOTIFY_FID_BITS      (FAN_REPORT_FID | FAN_REPORT_DFID_NAME)
> > >
> > >  #define FANOTIFY_INIT_FLAGS    (FANOTIFY_CLASS_BITS | FANOTIFY_FID_BITS | \
> > > -                                FAN_REPORT_TID | \
> > > +                                FAN_REPORT_TID | FAN_REPORT_PIDFD | \
> > >                                  FAN_CLOEXEC | FAN_NONBLOCK | \
> > >                                  FAN_UNLIMITED_QUEUE | FAN_UNLIMITED_MARKS)
> > >
> >
> > FAN_REPORT_PIDFD should be added to FANOTIFY_ADMIN_INIT_FLAGS
> > from fsnotify branch.
>
> ACK.
>
> Before sending any other version of this patch series through I will
> see what Jan and Christian have to say.
>

That makes sense.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-16  7:53       ` Amir Goldstein
@ 2021-04-16  8:08         ` Matthew Bobrowski
  0 siblings, 0 replies; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-16  8:08 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Fri, Apr 16, 2021 at 10:53:48AM +0300, Amir Goldstein wrote:
> On Fri, Apr 16, 2021 at 10:06 AM Matthew Bobrowski <repnop@google.com> wrote:
> > > > +               pidfd = pidfd_create(event->pid, 0);
> > > > +               if (unlikely(pidfd < 0))
> > > > +                       metadata.pid = FAN_NOPIDFD;
> > > > +               else
> > > > +                       metadata.pid = pidfd;
> > > > +       } else {
> > > > +               metadata.pid = pid_vnr(event->pid);
> > > > +       }
> > >
> > > You should rebase your work on:
> > > git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
> > > and resolve conflicts with "unprivileged listener" code.
> >
> > ACK.
> >
> > > Need to make sure that pidfd is not reported to an unprivileged
> > > listener even if group was initialized by a privileged process.
> > > This is a conscious conservative choice that we made for reporting
> > > pid info to unprivileged listener that can be revisited in the future.
> >
> > OK, I see. In that case, I guess I can add the FAN_REPORT_PIDFD check
> > above the current conditional [0]:
> >
> > ...
> > if (!capable(CAP_SYS_ADMIN) && task_tgid(current) != event->pid)
> >         metadata.pid = 0;
> > ...
> >
> > That way, AFAIK even if it is an unprivileged listener the pid info
> > will be overwritten as intended.
> >
> 
> Situation is a bit more subtle than that.
> If you override event->pid with zero and zero is interpreted as pidfd
> that would not be consistent with uapi documentation.

Ah, yes, of course! I had totally overlooked this. Also, speaking of
UAPI documentation, I'll have it prepared along with the LTP tests
once I get the ACK for this particular concept from Jan and Christian.

> You need to make sure that event->pid is FAN_NOPIDFD in case
> (!capable(CAP_SYS_ADMIN) &&
>  FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD))
> Hopefully, you can do that while keeping the special cases to minimum...

I don't foresee any issues with doing this at all.

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h
  2021-04-15 23:22 ` [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h Matthew Bobrowski
@ 2021-04-19 10:13   ` Jan Kara
  2021-04-19 12:50   ` Christian Brauner
  1 sibling, 0 replies; 35+ messages in thread
From: Jan Kara @ 2021-04-19 10:13 UTC (permalink / raw)
  To: Matthew Bobrowski
  Cc: jack, amir73il, christian.brauner, linux-api, linux-fsdevel

On Fri 16-04-21 09:22:09, Matthew Bobrowski wrote:
> With the idea to have fanotify(7) return pidfds within a `struct
> fanotify_event_metadata`, pidfd_create()'s scope is to increased so
> that it can be called from other subsystems within the Linux
> kernel. The current `static` qualifier from its definition is to be
> removed and a new function declaration for pidfd_create() is to be
> added to the linux/pid.h header file.
> 
> Signed-off-by: Matthew Bobrowski <repnop@google.com>

I'm fine with this but it would be good to get explicit Christian's ack on
this patch (I know he's already agreed with the intent in general).

								Honza

> ---
>  include/linux/pid.h | 1 +
>  kernel/pid.c        | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index fa10acb8d6a4..af308e15f174 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -78,6 +78,7 @@ struct file;
>  
>  extern struct pid *pidfd_pid(const struct file *file);
>  struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
> +int pidfd_create(struct pid *pid, unsigned int flags);
>  
>  static inline struct pid *get_pid(struct pid *pid)
>  {
> diff --git a/kernel/pid.c b/kernel/pid.c
> index ebdf9c60cd0b..91c4b6891c15 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -553,7 +553,7 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
>   * Return: On success, a cloexec pidfd is returned.
>   *         On error, a negative errno number will be returned.
>   */
> -static int pidfd_create(struct pid *pid, unsigned int flags)
> +int pidfd_create(struct pid *pid, unsigned int flags)
>  {
>  	int fd;
>  
> -- 
> 2.31.1.368.gbe11c130af-goog
> 
> /M
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-15 23:22 ` [PATCH 2/2] fanotify: Add pidfd support to the fanotify API Matthew Bobrowski
  2021-04-16  6:27   ` Amir Goldstein
@ 2021-04-19 10:21   ` Jan Kara
  2021-04-20  1:35     ` Matthew Bobrowski
  2021-04-19 13:20   ` Christian Brauner
  2 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2021-04-19 10:21 UTC (permalink / raw)
  To: Matthew Bobrowski
  Cc: jack, amir73il, christian.brauner, linux-api, linux-fsdevel

On Fri 16-04-21 09:22:25, Matthew Bobrowski wrote:
> Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> allows userspace applications to control whether a pidfd is to be
> returned instead of a pid for `struct fanotify_event_metadata.pid`.
> 
> FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> pidfd API is currently restricted to only support pidfd generation for
> thread-group leaders. Attempting to set them both when calling
> fanotify_init(2) will result in -EINVAL being returned to the
> caller. As the pidfd API evolves and support is added for tids, this
> is something that could be relaxed in the future.
> 
> If pidfd creation fails, the pid in struct fanotify_event_metadata is
> set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
> pidfd on pidfd creation failures was considered, although this could
> possibly lead to confusion and unpredictability within userspace
> applications as distinguishing between whether an actual pidfd or pid
> was returned could be difficult, so it's best to be explicit.
> 
> Signed-off-by: Matthew Bobrowski <repnop@google.com>

Overall this looks OK to me. Just one style nit & one question below in
addition to what Amir wrote.

> ---
>  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
>  include/linux/fanotify.h           |  2 +-
>  include/uapi/linux/fanotify.h      |  2 ++
>  3 files changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9e0c1afac8bd..fd8ae88796a8 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  	struct fanotify_info *info = fanotify_event_info(event);
>  	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
>  	struct file *f = NULL;
> -	int ret, fd = FAN_NOFD;
> +	int ret, pidfd, fd = FAN_NOFD;
>  	int info_type = 0;
>  
>  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  	metadata.vers = FANOTIFY_METADATA_VERSION;
>  	metadata.reserved = 0;
>  	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> -	metadata.pid = pid_vnr(event->pid);
> +
> +	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> +		pid_has_task(event->pid, PIDTYPE_TGID)) {

Please align the rest of the condition to the opening brace. I.e., like:

	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
	    pid_has_task(event->pid, PIDTYPE_TGID)) {

BTW, why is the pid_has_task() check here? And why is it OK to fall back to
returning pid if pid_has_task() is false?

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/2] fanotify: Adding pidfd support to the fanotify API
  2021-04-15 23:21 [PATCH 0/2] fanotify: Adding pidfd support to the fanotify API Matthew Bobrowski
  2021-04-15 23:22 ` [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h Matthew Bobrowski
  2021-04-15 23:22 ` [PATCH 2/2] fanotify: Add pidfd support to the fanotify API Matthew Bobrowski
@ 2021-04-19 12:34 ` Christian Brauner
  2 siblings, 0 replies; 35+ messages in thread
From: Christian Brauner @ 2021-04-19 12:34 UTC (permalink / raw)
  To: Matthew Bobrowski, Jann Horn, Oleg Nesterov
  Cc: jack, amir73il, linux-api, linux-fsdevel

On Fri, Apr 16, 2021 at 09:21:53AM +1000, Matthew Bobrowski wrote:
> Hey Jan/Amir/Christian,
> 
> This is the RFC patch series for adding pidfd support to the fanotify
> API.
> 
> tl;dr rather than returning a pid within struct
> fanotify_event_metadata, a pidfd is returned instead when fanotify has
> been explicitly told to do so via the new FAN_REPORT_PIDFD flag.
> 
> The main driver behind returning a pidfd within an event instead of a
> pid is that it permits userspace applications to better detect if
> they've possibly lost a TOCTOU race. A common paradigm among userspace
> applications that make use of the fanotify API is to crawl /proc/<pid>
> upon receiving an event. Userspace applications do this in order to
> further ascertain contextual meta-information about the process that
> was responsible for generating the filesystem event. On high pressure
> systems, where pid recycling isn't uncommon, it's no longer considered
> as a reliable approach to directly sift through /proc/<pid> and have
> userspace applications use the information contained within
> /proc/<pid> as it could, and does, lead to program execution
> inaccuracies.
> 
> Now when a pidfd is returned in an event, a userspace application can:
> 
>     a) Obtain the pid responsible for generating the filesystem event
>        from the pidfds fdinfo.
> 
>     b) Detect whether the userspace application lost the procfs access
>        race by sending a 0 signal on the pidfd and checking the return
>        value. A -ESRCH is indicative of the userspace application
>        losing the race, meaning that the pid has been recycled.

Thank you for working on this Matthew.

I'm explicitly adding Jann and Oleg to the thread since both have been
involved in the original design.

Oleg, Jann, I know this isn't necessarily your area of expertise but if
you could share your thoughts about returning pidfds that haven't been
opened via pidfd_open() or CLONE_PIDFD that would be great.
I've been conservative about this so far but I find it hard to resist
use-cases such as this.

Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h
  2021-04-15 23:22 ` [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h Matthew Bobrowski
  2021-04-19 10:13   ` Jan Kara
@ 2021-04-19 12:50   ` Christian Brauner
  2021-04-20  0:17     ` Matthew Bobrowski
  1 sibling, 1 reply; 35+ messages in thread
From: Christian Brauner @ 2021-04-19 12:50 UTC (permalink / raw)
  To: Matthew Bobrowski; +Cc: jack, amir73il, linux-api, linux-fsdevel

On Fri, Apr 16, 2021 at 09:22:09AM +1000, Matthew Bobrowski wrote:
> With the idea to have fanotify(7) return pidfds within a `struct
> fanotify_event_metadata`, pidfd_create()'s scope is to increased so
> that it can be called from other subsystems within the Linux
> kernel. The current `static` qualifier from its definition is to be
> removed and a new function declaration for pidfd_create() is to be
> added to the linux/pid.h header file.
> 
> Signed-off-by: Matthew Bobrowski <repnop@google.com>
> ---
>  include/linux/pid.h | 1 +
>  kernel/pid.c        | 2 +-
>  2 files changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/pid.h b/include/linux/pid.h
> index fa10acb8d6a4..af308e15f174 100644
> --- a/include/linux/pid.h
> +++ b/include/linux/pid.h
> @@ -78,6 +78,7 @@ struct file;
>  
>  extern struct pid *pidfd_pid(const struct file *file);
>  struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
> +int pidfd_create(struct pid *pid, unsigned int flags);
>  
>  static inline struct pid *get_pid(struct pid *pid)
>  {
> diff --git a/kernel/pid.c b/kernel/pid.c
> index ebdf9c60cd0b..91c4b6891c15 100644
> --- a/kernel/pid.c
> +++ b/kernel/pid.c
> @@ -553,7 +553,7 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
>   * Return: On success, a cloexec pidfd is returned.
>   *         On error, a negative errno number will be returned.
>   */
> -static int pidfd_create(struct pid *pid, unsigned int flags)

> +int pidfd_create(struct pid *pid, unsigned int flags)

Can you please add a comment to the kernel doc mentioning that this
helper is _not_ intended to be exported to modules? I don't want drivers
to get the idea that it's ok to start returning pidfds from everywhere
just yet.

And I think we should add sm like

	if (flags & ~(O_NONBLOCK | O_CLOEXEC | O_RDWR))
		return -EINVAL;

in pidfd_open() so future callers don't accidently create pidfds with
random flags we don't support.

Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-16  6:27   ` Amir Goldstein
  2021-04-16  7:05     ` Matthew Bobrowski
@ 2021-04-19 13:02     ` Christian Brauner
  1 sibling, 0 replies; 35+ messages in thread
From: Christian Brauner @ 2021-04-19 13:02 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Matthew Bobrowski, Jan Kara, Linux API, linux-fsdevel

On Fri, Apr 16, 2021 at 09:27:03AM +0300, Amir Goldstein wrote:
> On Fri, Apr 16, 2021 at 2:22 AM Matthew Bobrowski <repnop@google.com> wrote:
> >
> > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > allows userspace applications to control whether a pidfd is to be
> > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> >
> > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > pidfd API is currently restricted to only support pidfd generation for
> > thread-group leaders. Attempting to set them both when calling
> > fanotify_init(2) will result in -EINVAL being returned to the
> > caller. As the pidfd API evolves and support is added for tids, this
> > is something that could be relaxed in the future.
> >
> > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > set to FAN_NOPIDFD(-1).
> 
> Hi Matthew,
> 
> All in all looks good, just a few small nits.
> 
> > Falling back and providing a pid instead of a
> > pidfd on pidfd creation failures was considered, although this could
> > possibly lead to confusion and unpredictability within userspace
> > applications as distinguishing between whether an actual pidfd or pid
> > was returned could be difficult, so it's best to be explicit.
> 
> I don't think this should have been even "considered" so I see little
> value in this paragraph in commit message.
> 
> >
> > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > ---
> >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> >  include/linux/fanotify.h           |  2 +-
> >  include/uapi/linux/fanotify.h      |  2 ++
> >  3 files changed, 33 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 9e0c1afac8bd..fd8ae88796a8 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >         struct fanotify_info *info = fanotify_event_info(event);
> >         unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> >         struct file *f = NULL;
> > -       int ret, fd = FAN_NOFD;
> > +       int ret, pidfd, fd = FAN_NOFD;
> >         int info_type = 0;
> >
> >         pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >         metadata.vers = FANOTIFY_METADATA_VERSION;
> >         metadata.reserved = 0;
> >         metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > -       metadata.pid = pid_vnr(event->pid);
> > +
> > +       if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > +               pid_has_task(event->pid, PIDTYPE_TGID)) {
> > +               /*
> > +                * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> > +                * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> > +                * blindly lifted without pidfds for threads actually being
> > +                * supported.
> > +                */
> > +               WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> 
> Better WARN_ON_ONCE() the outcome of this error is not terrible.
> Also in the comment above I would not refer to this warning as "panic".
> 
> > +
> > +               pidfd = pidfd_create(event->pid, 0);
> > +               if (unlikely(pidfd < 0))
> > +                       metadata.pid = FAN_NOPIDFD;
> > +               else
> > +                       metadata.pid = pidfd;
> > +       } else {
> > +               metadata.pid = pid_vnr(event->pid);
> > +       }
> 
> You should rebase your work on:
> git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git fsnotify
> and resolve conflicts with "unprivileged listener" code.
> 
> Need to make sure that pidfd is not reported to an unprivileged
> listener even if group was initialized by a privileged process.

I agree.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-15 23:22 ` [PATCH 2/2] fanotify: Add pidfd support to the fanotify API Matthew Bobrowski
  2021-04-16  6:27   ` Amir Goldstein
  2021-04-19 10:21   ` Jan Kara
@ 2021-04-19 13:20   ` Christian Brauner
  2021-04-19 13:53     ` Amir Goldstein
  2021-04-19 13:55     ` Jan Kara
  2 siblings, 2 replies; 35+ messages in thread
From: Christian Brauner @ 2021-04-19 13:20 UTC (permalink / raw)
  To: Matthew Bobrowski; +Cc: jack, amir73il, linux-api, linux-fsdevel

On Fri, Apr 16, 2021 at 09:22:25AM +1000, Matthew Bobrowski wrote:
> Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> allows userspace applications to control whether a pidfd is to be
> returned instead of a pid for `struct fanotify_event_metadata.pid`.
> 
> FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> pidfd API is currently restricted to only support pidfd generation for
> thread-group leaders. Attempting to set them both when calling
> fanotify_init(2) will result in -EINVAL being returned to the
> caller. As the pidfd API evolves and support is added for tids, this
> is something that could be relaxed in the future.
> 
> If pidfd creation fails, the pid in struct fanotify_event_metadata is
> set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
> pidfd on pidfd creation failures was considered, although this could
> possibly lead to confusion and unpredictability within userspace
> applications as distinguishing between whether an actual pidfd or pid
> was returned could be difficult, so it's best to be explicit.
> 
> Signed-off-by: Matthew Bobrowski <repnop@google.com>
> ---
>  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
>  include/linux/fanotify.h           |  2 +-
>  include/uapi/linux/fanotify.h      |  2 ++
>  3 files changed, 33 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9e0c1afac8bd..fd8ae88796a8 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  	struct fanotify_info *info = fanotify_event_info(event);
>  	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
>  	struct file *f = NULL;
> -	int ret, fd = FAN_NOFD;
> +	int ret, pidfd, fd = FAN_NOFD;
>  	int info_type = 0;
>  
>  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
>  	metadata.vers = FANOTIFY_METADATA_VERSION;
>  	metadata.reserved = 0;
>  	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> -	metadata.pid = pid_vnr(event->pid);
> +
> +	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> +		pid_has_task(event->pid, PIDTYPE_TGID)) {
> +		/*
> +		 * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> +		 * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> +		 * blindly lifted without pidfds for threads actually being
> +		 * supported.
> +		 */
> +		WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> +
> +		pidfd = pidfd_create(event->pid, 0);
> +		if (unlikely(pidfd < 0))
> +			metadata.pid = FAN_NOPIDFD;
> +		else
> +			metadata.pid = pidfd;

I'm not a fan of overloading fields (Yes, we did this for the _legacy_
clone() syscall for CLONE_PIDFD/CLONE_PARENT_SETTID but in general it's
never a good idea if there are other options, imho.).
Could/should we consider the possibility of adding a new pidfd field to
struct fanotify_event_metadata?

Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-19 13:20   ` Christian Brauner
@ 2021-04-19 13:53     ` Amir Goldstein
  2021-04-19 14:44       ` Christian Brauner
  2021-04-19 13:55     ` Jan Kara
  1 sibling, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2021-04-19 13:53 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Matthew Bobrowski, Jan Kara, Linux API, linux-fsdevel

On Mon, Apr 19, 2021 at 4:20 PM Christian Brauner
<christian.brauner@ubuntu.com> wrote:
>
> On Fri, Apr 16, 2021 at 09:22:25AM +1000, Matthew Bobrowski wrote:
> > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > allows userspace applications to control whether a pidfd is to be
> > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> >
> > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > pidfd API is currently restricted to only support pidfd generation for
> > thread-group leaders. Attempting to set them both when calling
> > fanotify_init(2) will result in -EINVAL being returned to the
> > caller. As the pidfd API evolves and support is added for tids, this
> > is something that could be relaxed in the future.
> >
> > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
> > pidfd on pidfd creation failures was considered, although this could
> > possibly lead to confusion and unpredictability within userspace
> > applications as distinguishing between whether an actual pidfd or pid
> > was returned could be difficult, so it's best to be explicit.
> >
> > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > ---
> >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> >  include/linux/fanotify.h           |  2 +-
> >  include/uapi/linux/fanotify.h      |  2 ++
> >  3 files changed, 33 insertions(+), 4 deletions(-)
> >
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 9e0c1afac8bd..fd8ae88796a8 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >       struct fanotify_info *info = fanotify_event_info(event);
> >       unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> >       struct file *f = NULL;
> > -     int ret, fd = FAN_NOFD;
> > +     int ret, pidfd, fd = FAN_NOFD;
> >       int info_type = 0;
> >
> >       pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >       metadata.vers = FANOTIFY_METADATA_VERSION;
> >       metadata.reserved = 0;
> >       metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > -     metadata.pid = pid_vnr(event->pid);
> > +
> > +     if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > +             pid_has_task(event->pid, PIDTYPE_TGID)) {
> > +             /*
> > +              * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> > +              * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> > +              * blindly lifted without pidfds for threads actually being
> > +              * supported.
> > +              */
> > +             WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> > +
> > +             pidfd = pidfd_create(event->pid, 0);
> > +             if (unlikely(pidfd < 0))
> > +                     metadata.pid = FAN_NOPIDFD;
> > +             else
> > +                     metadata.pid = pidfd;
>
> I'm not a fan of overloading fields (Yes, we did this for the _legacy_
> clone() syscall for CLONE_PIDFD/CLONE_PARENT_SETTID but in general it's
> never a good idea if there are other options, imho.).
> Could/should we consider the possibility of adding a new pidfd field to
> struct fanotify_event_metadata?

struct fanotify_event_metadata is fully booked.
We could use a variable length record, but IMO that's an overkill for pidfd.

If you are concerned about users wrongly interpreting pidfd as pid and
getting the the wrong process we could use invalid negative values for
pidfd, e.g. metadata.pid = ~pidfd.
This has the quality that 0 can mean both FAN_NOPIDFD and no pid.
UAPI to this would be abstracted with
#define FAN_EVENT_PIDFD(event) (~((event)->pid))

I am not convinced that this helps more than it hurts users, but abstracting
the accessor to pidfd could be a good idea anyway.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-19 13:20   ` Christian Brauner
  2021-04-19 13:53     ` Amir Goldstein
@ 2021-04-19 13:55     ` Jan Kara
  2021-04-19 15:02       ` Christian Brauner
  1 sibling, 1 reply; 35+ messages in thread
From: Jan Kara @ 2021-04-19 13:55 UTC (permalink / raw)
  To: Christian Brauner
  Cc: Matthew Bobrowski, jack, amir73il, linux-api, linux-fsdevel

On Mon 19-04-21 15:20:20, Christian Brauner wrote:
> On Fri, Apr 16, 2021 at 09:22:25AM +1000, Matthew Bobrowski wrote:
> > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > allows userspace applications to control whether a pidfd is to be
> > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> > 
> > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > pidfd API is currently restricted to only support pidfd generation for
> > thread-group leaders. Attempting to set them both when calling
> > fanotify_init(2) will result in -EINVAL being returned to the
> > caller. As the pidfd API evolves and support is added for tids, this
> > is something that could be relaxed in the future.
> > 
> > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
> > pidfd on pidfd creation failures was considered, although this could
> > possibly lead to confusion and unpredictability within userspace
> > applications as distinguishing between whether an actual pidfd or pid
> > was returned could be difficult, so it's best to be explicit.
> > 
> > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > ---
> >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> >  include/linux/fanotify.h           |  2 +-
> >  include/uapi/linux/fanotify.h      |  2 ++
> >  3 files changed, 33 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 9e0c1afac8bd..fd8ae88796a8 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >  	struct fanotify_info *info = fanotify_event_info(event);
> >  	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> >  	struct file *f = NULL;
> > -	int ret, fd = FAN_NOFD;
> > +	int ret, pidfd, fd = FAN_NOFD;
> >  	int info_type = 0;
> >  
> >  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >  	metadata.vers = FANOTIFY_METADATA_VERSION;
> >  	metadata.reserved = 0;
> >  	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > -	metadata.pid = pid_vnr(event->pid);
> > +
> > +	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > +		pid_has_task(event->pid, PIDTYPE_TGID)) {
> > +		/*
> > +		 * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> > +		 * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> > +		 * blindly lifted without pidfds for threads actually being
> > +		 * supported.
> > +		 */
> > +		WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> > +
> > +		pidfd = pidfd_create(event->pid, 0);
> > +		if (unlikely(pidfd < 0))
> > +			metadata.pid = FAN_NOPIDFD;
> > +		else
> > +			metadata.pid = pidfd;
> 
> I'm not a fan of overloading fields (Yes, we did this for the _legacy_
> clone() syscall for CLONE_PIDFD/CLONE_PARENT_SETTID but in general it's
> never a good idea if there are other options, imho.).
> Could/should we consider the possibility of adding a new pidfd field to
> struct fanotify_event_metadata?

I'm not a huge fan of overloading fields either but in this particular case
I'm fine with that because:

a) storage size & type matches
b) it describes exactly the same information, just in a different way

It is not possible to store the pidfd elsewhere in fanotify_event_metadata.
But it is certainly possible to use extended event info to return pidfd
instead - similarly to how we return e.g. handle + fsid for some
notification groups. It just means somewhat longer events and more
complicated parsing of structured events in userspace. But as I write
above, in this case I don't think it is worth it - only if we think that
returning both pid and pidfd could ever be useful.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-19 13:53     ` Amir Goldstein
@ 2021-04-19 14:44       ` Christian Brauner
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Brauner @ 2021-04-19 14:44 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Matthew Bobrowski, Jan Kara, Linux API, linux-fsdevel

On Mon, Apr 19, 2021 at 04:53:32PM +0300, Amir Goldstein wrote:
> On Mon, Apr 19, 2021 at 4:20 PM Christian Brauner
> <christian.brauner@ubuntu.com> wrote:
> >
> > On Fri, Apr 16, 2021 at 09:22:25AM +1000, Matthew Bobrowski wrote:
> > > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > > allows userspace applications to control whether a pidfd is to be
> > > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> > >
> > > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > > pidfd API is currently restricted to only support pidfd generation for
> > > thread-group leaders. Attempting to set them both when calling
> > > fanotify_init(2) will result in -EINVAL being returned to the
> > > caller. As the pidfd API evolves and support is added for tids, this
> > > is something that could be relaxed in the future.
> > >
> > > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > > set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
> > > pidfd on pidfd creation failures was considered, although this could
> > > possibly lead to confusion and unpredictability within userspace
> > > applications as distinguishing between whether an actual pidfd or pid
> > > was returned could be difficult, so it's best to be explicit.
> > >
> > > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > > ---
> > >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> > >  include/linux/fanotify.h           |  2 +-
> > >  include/uapi/linux/fanotify.h      |  2 ++
> > >  3 files changed, 33 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > index 9e0c1afac8bd..fd8ae88796a8 100644
> > > --- a/fs/notify/fanotify/fanotify_user.c
> > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> > >       struct fanotify_info *info = fanotify_event_info(event);
> > >       unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> > >       struct file *f = NULL;
> > > -     int ret, fd = FAN_NOFD;
> > > +     int ret, pidfd, fd = FAN_NOFD;
> > >       int info_type = 0;
> > >
> > >       pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> > >       metadata.vers = FANOTIFY_METADATA_VERSION;
> > >       metadata.reserved = 0;
> > >       metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > > -     metadata.pid = pid_vnr(event->pid);
> > > +
> > > +     if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > > +             pid_has_task(event->pid, PIDTYPE_TGID)) {
> > > +             /*
> > > +              * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> > > +              * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> > > +              * blindly lifted without pidfds for threads actually being
> > > +              * supported.
> > > +              */
> > > +             WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> > > +
> > > +             pidfd = pidfd_create(event->pid, 0);
> > > +             if (unlikely(pidfd < 0))
> > > +                     metadata.pid = FAN_NOPIDFD;
> > > +             else
> > > +                     metadata.pid = pidfd;
> >
> > I'm not a fan of overloading fields (Yes, we did this for the _legacy_
> > clone() syscall for CLONE_PIDFD/CLONE_PARENT_SETTID but in general it's
> > never a good idea if there are other options, imho.).
> > Could/should we consider the possibility of adding a new pidfd field to
> > struct fanotify_event_metadata?
> 
> struct fanotify_event_metadata is fully booked.
> We could use a variable length record, but IMO that's an overkill for pidfd.
> 
> If you are concerned about users wrongly interpreting pidfd as pid and
> getting the the wrong process we could use invalid negative values for
> pidfd, e.g. metadata.pid = ~pidfd.
> This has the quality that 0 can mean both FAN_NOPIDFD and no pid.
> UAPI to this would be abstracted with
> #define FAN_EVENT_PIDFD(event) (~((event)->pid))
> 
> I am not convinced that this helps more than it hurts users, but abstracting
> the accessor to pidfd could be a good idea anyway.

Abstracting the accessor may be a good idea but let's not do the
negative values thing that seems clunky to me.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-19 13:55     ` Jan Kara
@ 2021-04-19 15:02       ` Christian Brauner
  2021-04-20  2:36         ` Matthew Bobrowski
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Brauner @ 2021-04-19 15:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, amir73il, linux-api, linux-fsdevel

On Mon, Apr 19, 2021 at 03:55:50PM +0200, Jan Kara wrote:
> On Mon 19-04-21 15:20:20, Christian Brauner wrote:
> > On Fri, Apr 16, 2021 at 09:22:25AM +1000, Matthew Bobrowski wrote:
> > > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > > allows userspace applications to control whether a pidfd is to be
> > > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> > > 
> > > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > > pidfd API is currently restricted to only support pidfd generation for
> > > thread-group leaders. Attempting to set them both when calling
> > > fanotify_init(2) will result in -EINVAL being returned to the
> > > caller. As the pidfd API evolves and support is added for tids, this
> > > is something that could be relaxed in the future.
> > > 
> > > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > > set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
> > > pidfd on pidfd creation failures was considered, although this could
> > > possibly lead to confusion and unpredictability within userspace
> > > applications as distinguishing between whether an actual pidfd or pid
> > > was returned could be difficult, so it's best to be explicit.
> > > 
> > > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > > ---
> > >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> > >  include/linux/fanotify.h           |  2 +-
> > >  include/uapi/linux/fanotify.h      |  2 ++
> > >  3 files changed, 33 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > index 9e0c1afac8bd..fd8ae88796a8 100644
> > > --- a/fs/notify/fanotify/fanotify_user.c
> > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> > >  	struct fanotify_info *info = fanotify_event_info(event);
> > >  	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> > >  	struct file *f = NULL;
> > > -	int ret, fd = FAN_NOFD;
> > > +	int ret, pidfd, fd = FAN_NOFD;
> > >  	int info_type = 0;
> > >  
> > >  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> > >  	metadata.vers = FANOTIFY_METADATA_VERSION;
> > >  	metadata.reserved = 0;
> > >  	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > > -	metadata.pid = pid_vnr(event->pid);
> > > +
> > > +	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > > +		pid_has_task(event->pid, PIDTYPE_TGID)) {
> > > +		/*
> > > +		 * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> > > +		 * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> > > +		 * blindly lifted without pidfds for threads actually being
> > > +		 * supported.
> > > +		 */
> > > +		WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> > > +
> > > +		pidfd = pidfd_create(event->pid, 0);
> > > +		if (unlikely(pidfd < 0))
> > > +			metadata.pid = FAN_NOPIDFD;
> > > +		else
> > > +			metadata.pid = pidfd;
> > 
> > I'm not a fan of overloading fields (Yes, we did this for the _legacy_
> > clone() syscall for CLONE_PIDFD/CLONE_PARENT_SETTID but in general it's
> > never a good idea if there are other options, imho.).
> > Could/should we consider the possibility of adding a new pidfd field to
> > struct fanotify_event_metadata?
> 
> I'm not a huge fan of overloading fields either but in this particular case
> I'm fine with that because:
> 
> a) storage size & type matches
> b) it describes exactly the same information, just in a different way
> 
> It is not possible to store the pidfd elsewhere in fanotify_event_metadata.
> But it is certainly possible to use extended event info to return pidfd
> instead - similarly to how we return e.g. handle + fsid for some
> notification groups. It just means somewhat longer events and more
> complicated parsing of structured events in userspace. But as I write
> above, in this case I don't think it is worth it - only if we think that
> returning both pid and pidfd could ever be useful.

Yeah, I don't hink users need to do that. After all they can parse the
PID out of the /proc/self/fdinfo/<pidfd> file.

A general question about struct fanotify_event_metadata and its
extensibility model:
looking through the code it seems that this struct is read via
fanotify_rad(). So the user is expected to supply a buffer with at least

#define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))

bytes. In addition you can return the info to the user about how many
bytes the kernel has written from fanotify_read().

So afaict extending fanotify_event_metadata should be _fairly_
straightforward, right? It would essentially the complement to
copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
which deals with user->kernel and you're dealing with kernel->user:
- If the user supplied a buffer smaller than the minimum known struct
  size -> reject.
- If the user supplied a buffer < smaller than what the current kernel
  supports -> copy only what userspace knows about, and return the size
  userspace knows about.
- If the user supplied a buffer that is larger than what the current
  kernel knows about -> copy only what the kernel knows about, zero the
  rest, and return the kernel size.

Extension should then be fairly straightforward (64bit aligned
increments)?

Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h
  2021-04-19 12:50   ` Christian Brauner
@ 2021-04-20  0:17     ` Matthew Bobrowski
  0 siblings, 0 replies; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-20  0:17 UTC (permalink / raw)
  To: Christian Brauner; +Cc: jack, amir73il, linux-api, linux-fsdevel

On Mon, Apr 19, 2021 at 02:50:33PM +0200, Christian Brauner wrote:
> On Fri, Apr 16, 2021 at 09:22:09AM +1000, Matthew Bobrowski wrote:
> > With the idea to have fanotify(7) return pidfds within a `struct
> > fanotify_event_metadata`, pidfd_create()'s scope is to increased so
> > that it can be called from other subsystems within the Linux
> > kernel. The current `static` qualifier from its definition is to be
> > removed and a new function declaration for pidfd_create() is to be
> > added to the linux/pid.h header file.
> > 
> > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > ---
> >  include/linux/pid.h | 1 +
> >  kernel/pid.c        | 2 +-
> >  2 files changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/linux/pid.h b/include/linux/pid.h
> > index fa10acb8d6a4..af308e15f174 100644
> > --- a/include/linux/pid.h
> > +++ b/include/linux/pid.h
> > @@ -78,6 +78,7 @@ struct file;
> >  
> >  extern struct pid *pidfd_pid(const struct file *file);
> >  struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags);
> > +int pidfd_create(struct pid *pid, unsigned int flags);
> >  
> >  static inline struct pid *get_pid(struct pid *pid)
> >  {
> > diff --git a/kernel/pid.c b/kernel/pid.c
> > index ebdf9c60cd0b..91c4b6891c15 100644
> > --- a/kernel/pid.c
> > +++ b/kernel/pid.c
> > @@ -553,7 +553,7 @@ struct pid *pidfd_get_pid(unsigned int fd, unsigned int *flags)
> >   * Return: On success, a cloexec pidfd is returned.
> >   *         On error, a negative errno number will be returned.
> >   */
> > -static int pidfd_create(struct pid *pid, unsigned int flags)
> 
> > +int pidfd_create(struct pid *pid, unsigned int flags)
> 
> Can you please add a comment to the kernel doc mentioning that this
> helper is _not_ intended to be exported to modules? I don't want drivers
> to get the idea that it's ok to start returning pidfds from everywhere
> just yet.

Sure, this makes sense to me. 
 
> And I think we should add sm like
> 
> 	if (flags & ~(O_NONBLOCK | O_CLOEXEC | O_RDWR))
> 		return -EINVAL;
> 
> in pidfd_open() so future callers don't accidently create pidfds with
> random flags we don't support.

In the context of exporting pidfd_create() to the rest of the kernel,
presumably we should be adding this flag check to pidfd_create() and
not pidfd_open(), right? I gather that's what you actually meant.

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-19 10:21   ` Jan Kara
@ 2021-04-20  1:35     ` Matthew Bobrowski
  0 siblings, 0 replies; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-20  1:35 UTC (permalink / raw)
  To: Jan Kara; +Cc: amir73il, christian.brauner, linux-api, linux-fsdevel

On Mon, Apr 19, 2021 at 12:21:39PM +0200, Jan Kara wrote:
> On Fri 16-04-21 09:22:25, Matthew Bobrowski wrote:
> > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > allows userspace applications to control whether a pidfd is to be
> > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> > 
> > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > pidfd API is currently restricted to only support pidfd generation for
> > thread-group leaders. Attempting to set them both when calling
> > fanotify_init(2) will result in -EINVAL being returned to the
> > caller. As the pidfd API evolves and support is added for tids, this
> > is something that could be relaxed in the future.
> > 
> > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
> > pidfd on pidfd creation failures was considered, although this could
> > possibly lead to confusion and unpredictability within userspace
> > applications as distinguishing between whether an actual pidfd or pid
> > was returned could be difficult, so it's best to be explicit.
> > 
> > Signed-off-by: Matthew Bobrowski <repnop@google.com>

> Overall this looks OK to me. Just one style nit & one question below in
> addition to what Amir wrote.

Thanks for the quick review Jan!

> > ---
> >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> >  include/linux/fanotify.h           |  2 +-
> >  include/uapi/linux/fanotify.h      |  2 ++
> >  3 files changed, 33 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > index 9e0c1afac8bd..fd8ae88796a8 100644
> > --- a/fs/notify/fanotify/fanotify_user.c
> > +++ b/fs/notify/fanotify/fanotify_user.c
> > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >  	struct fanotify_info *info = fanotify_event_info(event);
> >  	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> >  	struct file *f = NULL;
> > -	int ret, fd = FAN_NOFD;
> > +	int ret, pidfd, fd = FAN_NOFD;
> >  	int info_type = 0;
> >  
> >  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> >  	metadata.vers = FANOTIFY_METADATA_VERSION;
> >  	metadata.reserved = 0;
> >  	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > -	metadata.pid = pid_vnr(event->pid);
> > +
> > +	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > +		pid_has_task(event->pid, PIDTYPE_TGID)) {
> 
> Please align the rest of the condition to the opening brace. I.e., like:
> 
> 	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> 	    pid_has_task(event->pid, PIDTYPE_TGID)) {

ACK.

> BTW, why is the pid_has_task() check here?

My thought was that we add a means of ensuring that event->pid holds a
reference to a thread-group leader as pidfds aren't supported for
individual threads just yet. The same check is implemented in
pidfd_open(), so I thought to make the preliminary checks consistent.

Actually, now that I've writeten that, perhaps the pid_has_task()
check can be rolled up into pidfd_create()?

> And why is it OK to fall back to returning pid if pid_has_task() is
> false?

Ah, I see, it's not OK. Good catch Jan! I will need to fix this up in
the follow up series.

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-19 15:02       ` Christian Brauner
@ 2021-04-20  2:36         ` Matthew Bobrowski
  2021-04-21  8:04           ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-20  2:36 UTC (permalink / raw)
  To: Christian Brauner; +Cc: Jan Kara, amir73il, linux-api, linux-fsdevel

On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> On Mon, Apr 19, 2021 at 03:55:50PM +0200, Jan Kara wrote:
> > On Mon 19-04-21 15:20:20, Christian Brauner wrote:
> > > On Fri, Apr 16, 2021 at 09:22:25AM +1000, Matthew Bobrowski wrote:
> > > > Introduce a new flag FAN_REPORT_PIDFD for fanotify_init(2) which
> > > > allows userspace applications to control whether a pidfd is to be
> > > > returned instead of a pid for `struct fanotify_event_metadata.pid`.
> > > > 
> > > > FAN_REPORT_PIDFD is mutually exclusive with FAN_REPORT_TID as the
> > > > pidfd API is currently restricted to only support pidfd generation for
> > > > thread-group leaders. Attempting to set them both when calling
> > > > fanotify_init(2) will result in -EINVAL being returned to the
> > > > caller. As the pidfd API evolves and support is added for tids, this
> > > > is something that could be relaxed in the future.
> > > > 
> > > > If pidfd creation fails, the pid in struct fanotify_event_metadata is
> > > > set to FAN_NOPIDFD(-1). Falling back and providing a pid instead of a
> > > > pidfd on pidfd creation failures was considered, although this could
> > > > possibly lead to confusion and unpredictability within userspace
> > > > applications as distinguishing between whether an actual pidfd or pid
> > > > was returned could be difficult, so it's best to be explicit.
> > > > 
> > > > Signed-off-by: Matthew Bobrowski <repnop@google.com>
> > > > ---
> > > >  fs/notify/fanotify/fanotify_user.c | 33 +++++++++++++++++++++++++++---
> > > >  include/linux/fanotify.h           |  2 +-
> > > >  include/uapi/linux/fanotify.h      |  2 ++
> > > >  3 files changed, 33 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> > > > index 9e0c1afac8bd..fd8ae88796a8 100644
> > > > --- a/fs/notify/fanotify/fanotify_user.c
> > > > +++ b/fs/notify/fanotify/fanotify_user.c
> > > > @@ -329,7 +329,7 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> > > >  	struct fanotify_info *info = fanotify_event_info(event);
> > > >  	unsigned int fid_mode = FAN_GROUP_FLAG(group, FANOTIFY_FID_BITS);
> > > >  	struct file *f = NULL;
> > > > -	int ret, fd = FAN_NOFD;
> > > > +	int ret, pidfd, fd = FAN_NOFD;
> > > >  	int info_type = 0;
> > > >  
> > > >  	pr_debug("%s: group=%p event=%p\n", __func__, group, event);
> > > > @@ -340,7 +340,25 @@ static ssize_t copy_event_to_user(struct fsnotify_group *group,
> > > >  	metadata.vers = FANOTIFY_METADATA_VERSION;
> > > >  	metadata.reserved = 0;
> > > >  	metadata.mask = event->mask & FANOTIFY_OUTGOING_EVENTS;
> > > > -	metadata.pid = pid_vnr(event->pid);
> > > > +
> > > > +	if (FAN_GROUP_FLAG(group, FAN_REPORT_PIDFD) &&
> > > > +		pid_has_task(event->pid, PIDTYPE_TGID)) {
> > > > +		/*
> > > > +		 * Given FAN_REPORT_PIDFD is to be mutually exclusive with
> > > > +		 * FAN_REPORT_TID, panic here if the mutual exclusion is ever
> > > > +		 * blindly lifted without pidfds for threads actually being
> > > > +		 * supported.
> > > > +		 */
> > > > +		WARN_ON(FAN_GROUP_FLAG(group, FAN_REPORT_TID));
> > > > +
> > > > +		pidfd = pidfd_create(event->pid, 0);
> > > > +		if (unlikely(pidfd < 0))
> > > > +			metadata.pid = FAN_NOPIDFD;
> > > > +		else
> > > > +			metadata.pid = pidfd;
> > > 
> > > I'm not a fan of overloading fields (Yes, we did this for the _legacy_
> > > clone() syscall for CLONE_PIDFD/CLONE_PARENT_SETTID but in general it's
> > > never a good idea if there are other options, imho.).
> > > Could/should we consider the possibility of adding a new pidfd field to
> > > struct fanotify_event_metadata?
> > 
> > I'm not a huge fan of overloading fields either but in this particular case
> > I'm fine with that because:
> > 
> > a) storage size & type matches
> > b) it describes exactly the same information, just in a different way
> > 
> > It is not possible to store the pidfd elsewhere in fanotify_event_metadata.
> > But it is certainly possible to use extended event info to return pidfd
> > instead - similarly to how we return e.g. handle + fsid for some
> > notification groups. It just means somewhat longer events and more
> > complicated parsing of structured events in userspace. But as I write
> > above, in this case I don't think it is worth it - only if we think that
> > returning both pid and pidfd could ever be useful.
> 
> Yeah, I don't hink users need to do that. After all they can parse the
> PID out of the /proc/self/fdinfo/<pidfd> file.

I agree. We can do this little PID extraction dance in userspace, it's
not a big deal at this point IMO.

> A general question about struct fanotify_event_metadata and its
> extensibility model:
> looking through the code it seems that this struct is read via
> fanotify_rad(). So the user is expected to supply a buffer with at least
> 
> #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> 
> bytes. In addition you can return the info to the user about how many
> bytes the kernel has written from fanotify_read().
> 
> So afaict extending fanotify_event_metadata should be _fairly_
> straightforward, right? It would essentially the complement to
> copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> which deals with user->kernel and you're dealing with kernel->user:
> - If the user supplied a buffer smaller than the minimum known struct
>   size -> reject.
> - If the user supplied a buffer < smaller than what the current kernel
>   supports -> copy only what userspace knows about, and return the size
>   userspace knows about.
> - If the user supplied a buffer that is larger than what the current
>   kernel knows about -> copy only what the kernel knows about, zero the
>   rest, and return the kernel size.
> 
> Extension should then be fairly straightforward (64bit aligned
> increments)?

You'd think that it's fairly straightforward, but I have a feeling
that the whole fanotify_event_metadata extensibility discussion and
the current limitation to do so revolves around whether it can be
achieved in a way which can guarantee that no userspace applications
would break. I think the answer to this is that there's no guarantee
because of <<reasons>>, so the decision to extend fanotify's feature
set was done via other means i.e. introduction of additional
structures.

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-20  2:36         ` Matthew Bobrowski
@ 2021-04-21  8:04           ` Jan Kara
  2021-04-21  9:29             ` Amir Goldstein
  2021-04-22 23:06             ` Matthew Bobrowski
  0 siblings, 2 replies; 35+ messages in thread
From: Jan Kara @ 2021-04-21  8:04 UTC (permalink / raw)
  To: Matthew Bobrowski
  Cc: Christian Brauner, Jan Kara, amir73il, linux-api, linux-fsdevel

On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > A general question about struct fanotify_event_metadata and its
> > extensibility model:
> > looking through the code it seems that this struct is read via
> > fanotify_rad(). So the user is expected to supply a buffer with at least
> > 
> > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > 
> > bytes. In addition you can return the info to the user about how many
> > bytes the kernel has written from fanotify_read().
> > 
> > So afaict extending fanotify_event_metadata should be _fairly_
> > straightforward, right? It would essentially the complement to
> > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > which deals with user->kernel and you're dealing with kernel->user:
> > - If the user supplied a buffer smaller than the minimum known struct
> >   size -> reject.
> > - If the user supplied a buffer < smaller than what the current kernel
> >   supports -> copy only what userspace knows about, and return the size
> >   userspace knows about.
> > - If the user supplied a buffer that is larger than what the current
> >   kernel knows about -> copy only what the kernel knows about, zero the
> >   rest, and return the kernel size.
> > 
> > Extension should then be fairly straightforward (64bit aligned
> > increments)?
> 
> You'd think that it's fairly straightforward, but I have a feeling
> that the whole fanotify_event_metadata extensibility discussion and
> the current limitation to do so revolves around whether it can be
> achieved in a way which can guarantee that no userspace applications
> would break. I think the answer to this is that there's no guarantee
> because of <<reasons>>, so the decision to extend fanotify's feature
> set was done via other means i.e. introduction of additional
> structures.

There's no real problem extending fanotify_event_metadata. We already have
multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
flag and its effect, extended versions of the structure in
include/uapi/linux/fanotify.h). The key for backward compatibility is to
create extended struct only when explicitely requested by a flag when
creating notification group - and that would be the case here -
FAN_REPORT_PIDFD or how you called it. It is just that extending the
structure means adding 8 bytes to each event and parsing extended structure
is more cumbersome than just fetching s32 from a well known location.

On the other hand extended structure is self-describing (i.e., you can tell
the meaning of all the fields just from the event you receive) while
reusing 'pid' field means that you have to know how the notification group
was created (whether FAN_REPORT_PIDFD was used or not) to be able to
interpret the contents of the event. Actually I think the self-describing
feature of fanotify event stream is useful (e.g. when application manages
multiple fanotify groups or when fanotify group descriptors are passed
among processes) so now I'm more leaning towards using the extended
structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
confusion.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-21  8:04           ` Jan Kara
@ 2021-04-21  9:29             ` Amir Goldstein
  2021-04-21 10:00               ` Jan Kara
  2021-04-22 23:06             ` Matthew Bobrowski
  1 sibling, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2021-04-21  9:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, Linux API, linux-fsdevel

On Wed, Apr 21, 2021 at 11:04 AM Jan Kara <jack@suse.cz> wrote:
>
> On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > A general question about struct fanotify_event_metadata and its
> > > extensibility model:
> > > looking through the code it seems that this struct is read via
> > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > >
> > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > >
> > > bytes. In addition you can return the info to the user about how many
> > > bytes the kernel has written from fanotify_read().
> > >
> > > So afaict extending fanotify_event_metadata should be _fairly_
> > > straightforward, right? It would essentially the complement to
> > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > which deals with user->kernel and you're dealing with kernel->user:
> > > - If the user supplied a buffer smaller than the minimum known struct
> > >   size -> reject.
> > > - If the user supplied a buffer < smaller than what the current kernel
> > >   supports -> copy only what userspace knows about, and return the size
> > >   userspace knows about.
> > > - If the user supplied a buffer that is larger than what the current
> > >   kernel knows about -> copy only what the kernel knows about, zero the
> > >   rest, and return the kernel size.
> > >
> > > Extension should then be fairly straightforward (64bit aligned
> > > increments)?
> >
> > You'd think that it's fairly straightforward, but I have a feeling
> > that the whole fanotify_event_metadata extensibility discussion and
> > the current limitation to do so revolves around whether it can be
> > achieved in a way which can guarantee that no userspace applications
> > would break. I think the answer to this is that there's no guarantee
> > because of <<reasons>>, so the decision to extend fanotify's feature
> > set was done via other means i.e. introduction of additional
> > structures.
>
> There's no real problem extending fanotify_event_metadata. We already have
> multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> flag and its effect, extended versions of the structure in
> include/uapi/linux/fanotify.h). The key for backward compatibility is to
> create extended struct only when explicitely requested by a flag when
> creating notification group - and that would be the case here -
> FAN_REPORT_PIDFD or how you called it. It is just that extending the
> structure means adding 8 bytes to each event and parsing extended structure
> is more cumbersome than just fetching s32 from a well known location.
>
> On the other hand extended structure is self-describing (i.e., you can tell
> the meaning of all the fields just from the event you receive) while
> reusing 'pid' field means that you have to know how the notification group
> was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> interpret the contents of the event. Actually I think the self-describing
> feature of fanotify event stream is useful (e.g. when application manages
> multiple fanotify groups or when fanotify group descriptors are passed
> among processes) so now I'm more leaning towards using the extended
> structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> confusion.
>

But there is a middle path option.
The event metadata can be self described without extending it:

 struct fanotify_event_metadata {
        __u32 event_len;
        __u8 vers;
-       __u8 reserved;
+#define FANOTIFY_METADATA_FLAG_PIDFD   1
+       __u8 flags;
        __u16 metadata_len;
        __aligned_u64 mask;
        __s32 fd;

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-21  9:29             ` Amir Goldstein
@ 2021-04-21 10:00               ` Jan Kara
  2021-04-21 10:12                 ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2021-04-21 10:00 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, Christian Brauner, Linux API, linux-fsdevel

On Wed 21-04-21 12:29:14, Amir Goldstein wrote:
> On Wed, Apr 21, 2021 at 11:04 AM Jan Kara <jack@suse.cz> wrote:
> >
> > On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > > A general question about struct fanotify_event_metadata and its
> > > > extensibility model:
> > > > looking through the code it seems that this struct is read via
> > > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > >
> > > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > >
> > > > bytes. In addition you can return the info to the user about how many
> > > > bytes the kernel has written from fanotify_read().
> > > >
> > > > So afaict extending fanotify_event_metadata should be _fairly_
> > > > straightforward, right? It would essentially the complement to
> > > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > > which deals with user->kernel and you're dealing with kernel->user:
> > > > - If the user supplied a buffer smaller than the minimum known struct
> > > >   size -> reject.
> > > > - If the user supplied a buffer < smaller than what the current kernel
> > > >   supports -> copy only what userspace knows about, and return the size
> > > >   userspace knows about.
> > > > - If the user supplied a buffer that is larger than what the current
> > > >   kernel knows about -> copy only what the kernel knows about, zero the
> > > >   rest, and return the kernel size.
> > > >
> > > > Extension should then be fairly straightforward (64bit aligned
> > > > increments)?
> > >
> > > You'd think that it's fairly straightforward, but I have a feeling
> > > that the whole fanotify_event_metadata extensibility discussion and
> > > the current limitation to do so revolves around whether it can be
> > > achieved in a way which can guarantee that no userspace applications
> > > would break. I think the answer to this is that there's no guarantee
> > > because of <<reasons>>, so the decision to extend fanotify's feature
> > > set was done via other means i.e. introduction of additional
> > > structures.
> >
> > There's no real problem extending fanotify_event_metadata. We already have
> > multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> > flag and its effect, extended versions of the structure in
> > include/uapi/linux/fanotify.h). The key for backward compatibility is to
> > create extended struct only when explicitely requested by a flag when
> > creating notification group - and that would be the case here -
> > FAN_REPORT_PIDFD or how you called it. It is just that extending the
> > structure means adding 8 bytes to each event and parsing extended structure
> > is more cumbersome than just fetching s32 from a well known location.
> >
> > On the other hand extended structure is self-describing (i.e., you can tell
> > the meaning of all the fields just from the event you receive) while
> > reusing 'pid' field means that you have to know how the notification group
> > was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> > interpret the contents of the event. Actually I think the self-describing
> > feature of fanotify event stream is useful (e.g. when application manages
> > multiple fanotify groups or when fanotify group descriptors are passed
> > among processes) so now I'm more leaning towards using the extended
> > structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> > confusion.
> >
> 
> But there is a middle path option.
> The event metadata can be self described without extending it:
> 
>  struct fanotify_event_metadata {
>         __u32 event_len;
>         __u8 vers;
> -       __u8 reserved;
> +#define FANOTIFY_METADATA_FLAG_PIDFD   1
> +       __u8 flags;
>         __u16 metadata_len;
>         __aligned_u64 mask;
>         __s32 fd;

Well, yes, but do we want another way to describe what fanotify_event_metadata
actually contains? I don't think parsing extended event information is that
bad to make changes like this worth it...

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-21 10:00               ` Jan Kara
@ 2021-04-21 10:12                 ` Amir Goldstein
  2021-04-21 13:48                   ` Jan Kara
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2021-04-21 10:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, Linux API, linux-fsdevel

On Wed, Apr 21, 2021 at 1:00 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 21-04-21 12:29:14, Amir Goldstein wrote:
> > On Wed, Apr 21, 2021 at 11:04 AM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > > > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > > > A general question about struct fanotify_event_metadata and its
> > > > > extensibility model:
> > > > > looking through the code it seems that this struct is read via
> > > > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > > >
> > > > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > > >
> > > > > bytes. In addition you can return the info to the user about how many
> > > > > bytes the kernel has written from fanotify_read().
> > > > >
> > > > > So afaict extending fanotify_event_metadata should be _fairly_
> > > > > straightforward, right? It would essentially the complement to
> > > > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > > > which deals with user->kernel and you're dealing with kernel->user:
> > > > > - If the user supplied a buffer smaller than the minimum known struct
> > > > >   size -> reject.
> > > > > - If the user supplied a buffer < smaller than what the current kernel
> > > > >   supports -> copy only what userspace knows about, and return the size
> > > > >   userspace knows about.
> > > > > - If the user supplied a buffer that is larger than what the current
> > > > >   kernel knows about -> copy only what the kernel knows about, zero the
> > > > >   rest, and return the kernel size.
> > > > >
> > > > > Extension should then be fairly straightforward (64bit aligned
> > > > > increments)?
> > > >
> > > > You'd think that it's fairly straightforward, but I have a feeling
> > > > that the whole fanotify_event_metadata extensibility discussion and
> > > > the current limitation to do so revolves around whether it can be
> > > > achieved in a way which can guarantee that no userspace applications
> > > > would break. I think the answer to this is that there's no guarantee
> > > > because of <<reasons>>, so the decision to extend fanotify's feature
> > > > set was done via other means i.e. introduction of additional
> > > > structures.
> > >
> > > There's no real problem extending fanotify_event_metadata. We already have
> > > multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> > > flag and its effect, extended versions of the structure in
> > > include/uapi/linux/fanotify.h). The key for backward compatibility is to
> > > create extended struct only when explicitely requested by a flag when
> > > creating notification group - and that would be the case here -
> > > FAN_REPORT_PIDFD or how you called it. It is just that extending the
> > > structure means adding 8 bytes to each event and parsing extended structure
> > > is more cumbersome than just fetching s32 from a well known location.
> > >
> > > On the other hand extended structure is self-describing (i.e., you can tell
> > > the meaning of all the fields just from the event you receive) while
> > > reusing 'pid' field means that you have to know how the notification group
> > > was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> > > interpret the contents of the event. Actually I think the self-describing
> > > feature of fanotify event stream is useful (e.g. when application manages
> > > multiple fanotify groups or when fanotify group descriptors are passed
> > > among processes) so now I'm more leaning towards using the extended
> > > structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> > > confusion.
> > >
> >
> > But there is a middle path option.
> > The event metadata can be self described without extending it:
> >
> >  struct fanotify_event_metadata {
> >         __u32 event_len;
> >         __u8 vers;
> > -       __u8 reserved;
> > +#define FANOTIFY_METADATA_FLAG_PIDFD   1
> > +       __u8 flags;
> >         __u16 metadata_len;
> >         __aligned_u64 mask;
> >         __s32 fd;
>
> Well, yes, but do we want another way to describe what fanotify_event_metadata
> actually contains? I don't think parsing extended event information is that
> bad to make changes like this worth it...

Fine by me.
But in that case, we should report pidfd in addition to pid.

There is an advantage in doing that -
For the use case of filtering out events generated by self_pid
of the listener, there is no need to follow pidfd in order to check
if event->pid == self_pid.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-21 10:12                 ` Amir Goldstein
@ 2021-04-21 13:48                   ` Jan Kara
  2021-04-21 14:46                     ` Christian Brauner
  0 siblings, 1 reply; 35+ messages in thread
From: Jan Kara @ 2021-04-21 13:48 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Jan Kara, Matthew Bobrowski, Christian Brauner, Linux API, linux-fsdevel

On Wed 21-04-21 13:12:23, Amir Goldstein wrote:
> On Wed, Apr 21, 2021 at 1:00 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Wed 21-04-21 12:29:14, Amir Goldstein wrote:
> > > On Wed, Apr 21, 2021 at 11:04 AM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > > > > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > > > > A general question about struct fanotify_event_metadata and its
> > > > > > extensibility model:
> > > > > > looking through the code it seems that this struct is read via
> > > > > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > > > >
> > > > > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > > > >
> > > > > > bytes. In addition you can return the info to the user about how many
> > > > > > bytes the kernel has written from fanotify_read().
> > > > > >
> > > > > > So afaict extending fanotify_event_metadata should be _fairly_
> > > > > > straightforward, right? It would essentially the complement to
> > > > > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > > > > which deals with user->kernel and you're dealing with kernel->user:
> > > > > > - If the user supplied a buffer smaller than the minimum known struct
> > > > > >   size -> reject.
> > > > > > - If the user supplied a buffer < smaller than what the current kernel
> > > > > >   supports -> copy only what userspace knows about, and return the size
> > > > > >   userspace knows about.
> > > > > > - If the user supplied a buffer that is larger than what the current
> > > > > >   kernel knows about -> copy only what the kernel knows about, zero the
> > > > > >   rest, and return the kernel size.
> > > > > >
> > > > > > Extension should then be fairly straightforward (64bit aligned
> > > > > > increments)?
> > > > >
> > > > > You'd think that it's fairly straightforward, but I have a feeling
> > > > > that the whole fanotify_event_metadata extensibility discussion and
> > > > > the current limitation to do so revolves around whether it can be
> > > > > achieved in a way which can guarantee that no userspace applications
> > > > > would break. I think the answer to this is that there's no guarantee
> > > > > because of <<reasons>>, so the decision to extend fanotify's feature
> > > > > set was done via other means i.e. introduction of additional
> > > > > structures.
> > > >
> > > > There's no real problem extending fanotify_event_metadata. We already have
> > > > multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> > > > flag and its effect, extended versions of the structure in
> > > > include/uapi/linux/fanotify.h). The key for backward compatibility is to
> > > > create extended struct only when explicitely requested by a flag when
> > > > creating notification group - and that would be the case here -
> > > > FAN_REPORT_PIDFD or how you called it. It is just that extending the
> > > > structure means adding 8 bytes to each event and parsing extended structure
> > > > is more cumbersome than just fetching s32 from a well known location.
> > > >
> > > > On the other hand extended structure is self-describing (i.e., you can tell
> > > > the meaning of all the fields just from the event you receive) while
> > > > reusing 'pid' field means that you have to know how the notification group
> > > > was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> > > > interpret the contents of the event. Actually I think the self-describing
> > > > feature of fanotify event stream is useful (e.g. when application manages
> > > > multiple fanotify groups or when fanotify group descriptors are passed
> > > > among processes) so now I'm more leaning towards using the extended
> > > > structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> > > > confusion.
> > > >
> > >
> > > But there is a middle path option.
> > > The event metadata can be self described without extending it:
> > >
> > >  struct fanotify_event_metadata {
> > >         __u32 event_len;
> > >         __u8 vers;
> > > -       __u8 reserved;
> > > +#define FANOTIFY_METADATA_FLAG_PIDFD   1
> > > +       __u8 flags;
> > >         __u16 metadata_len;
> > >         __aligned_u64 mask;
> > >         __s32 fd;
> >
> > Well, yes, but do we want another way to describe what fanotify_event_metadata
> > actually contains? I don't think parsing extended event information is that
> > bad to make changes like this worth it...
> 
> Fine by me.
> But in that case, we should report pidfd in addition to pid.

Agreed.

> There is an advantage in doing that -
> For the use case of filtering out events generated by self_pid
> of the listener, there is no need to follow pidfd in order to check
> if event->pid == self_pid.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-21 13:48                   ` Jan Kara
@ 2021-04-21 14:46                     ` Christian Brauner
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Brauner @ 2021-04-21 14:46 UTC (permalink / raw)
  To: Jan Kara; +Cc: Amir Goldstein, Matthew Bobrowski, Linux API, linux-fsdevel

On Wed, Apr 21, 2021 at 03:48:14PM +0200, Jan Kara wrote:
> On Wed 21-04-21 13:12:23, Amir Goldstein wrote:
> > On Wed, Apr 21, 2021 at 1:00 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Wed 21-04-21 12:29:14, Amir Goldstein wrote:
> > > > On Wed, Apr 21, 2021 at 11:04 AM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > > > > > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > > > > > A general question about struct fanotify_event_metadata and its
> > > > > > > extensibility model:
> > > > > > > looking through the code it seems that this struct is read via
> > > > > > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > > > > >
> > > > > > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > > > > >
> > > > > > > bytes. In addition you can return the info to the user about how many
> > > > > > > bytes the kernel has written from fanotify_read().
> > > > > > >
> > > > > > > So afaict extending fanotify_event_metadata should be _fairly_
> > > > > > > straightforward, right? It would essentially the complement to
> > > > > > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > > > > > which deals with user->kernel and you're dealing with kernel->user:
> > > > > > > - If the user supplied a buffer smaller than the minimum known struct
> > > > > > >   size -> reject.
> > > > > > > - If the user supplied a buffer < smaller than what the current kernel
> > > > > > >   supports -> copy only what userspace knows about, and return the size
> > > > > > >   userspace knows about.
> > > > > > > - If the user supplied a buffer that is larger than what the current
> > > > > > >   kernel knows about -> copy only what the kernel knows about, zero the
> > > > > > >   rest, and return the kernel size.
> > > > > > >
> > > > > > > Extension should then be fairly straightforward (64bit aligned
> > > > > > > increments)?
> > > > > >
> > > > > > You'd think that it's fairly straightforward, but I have a feeling
> > > > > > that the whole fanotify_event_metadata extensibility discussion and
> > > > > > the current limitation to do so revolves around whether it can be
> > > > > > achieved in a way which can guarantee that no userspace applications
> > > > > > would break. I think the answer to this is that there's no guarantee
> > > > > > because of <<reasons>>, so the decision to extend fanotify's feature
> > > > > > set was done via other means i.e. introduction of additional
> > > > > > structures.
> > > > >
> > > > > There's no real problem extending fanotify_event_metadata. We already have
> > > > > multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> > > > > flag and its effect, extended versions of the structure in
> > > > > include/uapi/linux/fanotify.h). The key for backward compatibility is to
> > > > > create extended struct only when explicitely requested by a flag when
> > > > > creating notification group - and that would be the case here -
> > > > > FAN_REPORT_PIDFD or how you called it. It is just that extending the
> > > > > structure means adding 8 bytes to each event and parsing extended structure
> > > > > is more cumbersome than just fetching s32 from a well known location.
> > > > >
> > > > > On the other hand extended structure is self-describing (i.e., you can tell
> > > > > the meaning of all the fields just from the event you receive) while
> > > > > reusing 'pid' field means that you have to know how the notification group
> > > > > was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> > > > > interpret the contents of the event. Actually I think the self-describing
> > > > > feature of fanotify event stream is useful (e.g. when application manages
> > > > > multiple fanotify groups or when fanotify group descriptors are passed
> > > > > among processes) so now I'm more leaning towards using the extended
> > > > > structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> > > > > confusion.
> > > > >
> > > >
> > > > But there is a middle path option.
> > > > The event metadata can be self described without extending it:
> > > >
> > > >  struct fanotify_event_metadata {
> > > >         __u32 event_len;
> > > >         __u8 vers;
> > > > -       __u8 reserved;
> > > > +#define FANOTIFY_METADATA_FLAG_PIDFD   1
> > > > +       __u8 flags;
> > > >         __u16 metadata_len;
> > > >         __aligned_u64 mask;
> > > >         __s32 fd;
> > >
> > > Well, yes, but do we want another way to describe what fanotify_event_metadata
> > > actually contains? I don't think parsing extended event information is that
> > > bad to make changes like this worth it...
> > 
> > Fine by me.
> > But in that case, we should report pidfd in addition to pid.
> 
> Agreed.

Sounds good to me.

Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-21  8:04           ` Jan Kara
  2021-04-21  9:29             ` Amir Goldstein
@ 2021-04-22 23:06             ` Matthew Bobrowski
  2021-04-23  7:39               ` Amir Goldstein
  1 sibling, 1 reply; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-22 23:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Christian Brauner, amir73il, linux-api, linux-fsdevel

On Wed, Apr 21, 2021 at 10:04:49AM +0200, Jan Kara wrote:
> On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > A general question about struct fanotify_event_metadata and its
> > > extensibility model:
> > > looking through the code it seems that this struct is read via
> > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > 
> > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > 
> > > bytes. In addition you can return the info to the user about how many
> > > bytes the kernel has written from fanotify_read().
> > > 
> > > So afaict extending fanotify_event_metadata should be _fairly_
> > > straightforward, right? It would essentially the complement to
> > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > which deals with user->kernel and you're dealing with kernel->user:
> > > - If the user supplied a buffer smaller than the minimum known struct
> > >   size -> reject.
> > > - If the user supplied a buffer < smaller than what the current kernel
> > >   supports -> copy only what userspace knows about, and return the size
> > >   userspace knows about.
> > > - If the user supplied a buffer that is larger than what the current
> > >   kernel knows about -> copy only what the kernel knows about, zero the
> > >   rest, and return the kernel size.
> > > 
> > > Extension should then be fairly straightforward (64bit aligned
> > > increments)?
> > 
> > You'd think that it's fairly straightforward, but I have a feeling
> > that the whole fanotify_event_metadata extensibility discussion and
> > the current limitation to do so revolves around whether it can be
> > achieved in a way which can guarantee that no userspace applications
> > would break. I think the answer to this is that there's no guarantee
> > because of <<reasons>>, so the decision to extend fanotify's feature
> > set was done via other means i.e. introduction of additional
> > structures.
> 
> There's no real problem extending fanotify_event_metadata. We already have
> multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> flag and its effect, extended versions of the structure in
> include/uapi/linux/fanotify.h). The key for backward compatibility is to
> create extended struct only when explicitely requested by a flag when
> creating notification group - and that would be the case here -
> FAN_REPORT_PIDFD or how you called it. It is just that extending the
> structure means adding 8 bytes to each event and parsing extended structure
> is more cumbersome than just fetching s32 from a well known location.
> 
> On the other hand extended structure is self-describing (i.e., you can tell
> the meaning of all the fields just from the event you receive) while
> reusing 'pid' field means that you have to know how the notification group
> was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> interpret the contents of the event. Actually I think the self-describing
> feature of fanotify event stream is useful (e.g. when application manages
> multiple fanotify groups or when fanotify group descriptors are passed
> among processes) so now I'm more leaning towards using the extended
> structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> confusion.

This approach makes sense to me.

Jan/Amir, just to be clear, we've agreed to go ahead with the extended
struct approach whereby specifying the FAN_REPORT_PIDFD flag will
result in an event which includes an additional struct
(i.e. fanotify_event_info_pid) alongside the generic existing
fanotify_event_metadata (also ensuring that pid has been
provided). Events will be provided to userspace applications just like
when specifying FAN_REPORT_FID, correct?

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-22 23:06             ` Matthew Bobrowski
@ 2021-04-23  7:39               ` Amir Goldstein
  2021-04-23  8:02                 ` Matthew Bobrowski
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2021-04-23  7:39 UTC (permalink / raw)
  To: Matthew Bobrowski; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Fri, Apr 23, 2021 at 2:06 AM Matthew Bobrowski <repnop@google.com> wrote:
>
> On Wed, Apr 21, 2021 at 10:04:49AM +0200, Jan Kara wrote:
> > On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > > A general question about struct fanotify_event_metadata and its
> > > > extensibility model:
> > > > looking through the code it seems that this struct is read via
> > > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > >
> > > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > >
> > > > bytes. In addition you can return the info to the user about how many
> > > > bytes the kernel has written from fanotify_read().
> > > >
> > > > So afaict extending fanotify_event_metadata should be _fairly_
> > > > straightforward, right? It would essentially the complement to
> > > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > > which deals with user->kernel and you're dealing with kernel->user:
> > > > - If the user supplied a buffer smaller than the minimum known struct
> > > >   size -> reject.
> > > > - If the user supplied a buffer < smaller than what the current kernel
> > > >   supports -> copy only what userspace knows about, and return the size
> > > >   userspace knows about.
> > > > - If the user supplied a buffer that is larger than what the current
> > > >   kernel knows about -> copy only what the kernel knows about, zero the
> > > >   rest, and return the kernel size.
> > > >
> > > > Extension should then be fairly straightforward (64bit aligned
> > > > increments)?
> > >
> > > You'd think that it's fairly straightforward, but I have a feeling
> > > that the whole fanotify_event_metadata extensibility discussion and
> > > the current limitation to do so revolves around whether it can be
> > > achieved in a way which can guarantee that no userspace applications
> > > would break. I think the answer to this is that there's no guarantee
> > > because of <<reasons>>, so the decision to extend fanotify's feature
> > > set was done via other means i.e. introduction of additional
> > > structures.
> >
> > There's no real problem extending fanotify_event_metadata. We already have
> > multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> > flag and its effect, extended versions of the structure in
> > include/uapi/linux/fanotify.h). The key for backward compatibility is to
> > create extended struct only when explicitely requested by a flag when
> > creating notification group - and that would be the case here -
> > FAN_REPORT_PIDFD or how you called it. It is just that extending the
> > structure means adding 8 bytes to each event and parsing extended structure
> > is more cumbersome than just fetching s32 from a well known location.
> >
> > On the other hand extended structure is self-describing (i.e., you can tell
> > the meaning of all the fields just from the event you receive) while
> > reusing 'pid' field means that you have to know how the notification group
> > was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> > interpret the contents of the event. Actually I think the self-describing
> > feature of fanotify event stream is useful (e.g. when application manages
> > multiple fanotify groups or when fanotify group descriptors are passed
> > among processes) so now I'm more leaning towards using the extended
> > structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> > confusion.
>
> This approach makes sense to me.
>
> Jan/Amir, just to be clear, we've agreed to go ahead with the extended
> struct approach whereby specifying the FAN_REPORT_PIDFD flag will
> result in an event which includes an additional struct
> (i.e. fanotify_event_info_pid) alongside the generic existing

struct fanotify_event_info_pidfd?

> fanotify_event_metadata (also ensuring that pid has been
> provided). Events will be provided to userspace applications just like
> when specifying FAN_REPORT_FID, correct?
>

Yes.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-23  7:39               ` Amir Goldstein
@ 2021-04-23  8:02                 ` Matthew Bobrowski
  2021-04-23  8:14                   ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-23  8:02 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Fri, Apr 23, 2021 at 10:39:46AM +0300, Amir Goldstein wrote:
> On Fri, Apr 23, 2021 at 2:06 AM Matthew Bobrowski <repnop@google.com> wrote:
> >
> > On Wed, Apr 21, 2021 at 10:04:49AM +0200, Jan Kara wrote:
> > > On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > > > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > > > A general question about struct fanotify_event_metadata and its
> > > > > extensibility model:
> > > > > looking through the code it seems that this struct is read via
> > > > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > > >
> > > > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > > >
> > > > > bytes. In addition you can return the info to the user about how many
> > > > > bytes the kernel has written from fanotify_read().
> > > > >
> > > > > So afaict extending fanotify_event_metadata should be _fairly_
> > > > > straightforward, right? It would essentially the complement to
> > > > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > > > which deals with user->kernel and you're dealing with kernel->user:
> > > > > - If the user supplied a buffer smaller than the minimum known struct
> > > > >   size -> reject.
> > > > > - If the user supplied a buffer < smaller than what the current kernel
> > > > >   supports -> copy only what userspace knows about, and return the size
> > > > >   userspace knows about.
> > > > > - If the user supplied a buffer that is larger than what the current
> > > > >   kernel knows about -> copy only what the kernel knows about, zero the
> > > > >   rest, and return the kernel size.
> > > > >
> > > > > Extension should then be fairly straightforward (64bit aligned
> > > > > increments)?
> > > >
> > > > You'd think that it's fairly straightforward, but I have a feeling
> > > > that the whole fanotify_event_metadata extensibility discussion and
> > > > the current limitation to do so revolves around whether it can be
> > > > achieved in a way which can guarantee that no userspace applications
> > > > would break. I think the answer to this is that there's no guarantee
> > > > because of <<reasons>>, so the decision to extend fanotify's feature
> > > > set was done via other means i.e. introduction of additional
> > > > structures.
> > >
> > > There's no real problem extending fanotify_event_metadata. We already have
> > > multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> > > flag and its effect, extended versions of the structure in
> > > include/uapi/linux/fanotify.h). The key for backward compatibility is to
> > > create extended struct only when explicitely requested by a flag when
> > > creating notification group - and that would be the case here -
> > > FAN_REPORT_PIDFD or how you called it. It is just that extending the
> > > structure means adding 8 bytes to each event and parsing extended structure
> > > is more cumbersome than just fetching s32 from a well known location.
> > >
> > > On the other hand extended structure is self-describing (i.e., you can tell
> > > the meaning of all the fields just from the event you receive) while
> > > reusing 'pid' field means that you have to know how the notification group
> > > was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> > > interpret the contents of the event. Actually I think the self-describing
> > > feature of fanotify event stream is useful (e.g. when application manages
> > > multiple fanotify groups or when fanotify group descriptors are passed
> > > among processes) so now I'm more leaning towards using the extended
> > > structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> > > confusion.
> >
> > This approach makes sense to me.
> >
> > Jan/Amir, just to be clear, we've agreed to go ahead with the extended
> > struct approach whereby specifying the FAN_REPORT_PIDFD flag will
> > result in an event which includes an additional struct
> > (i.e. fanotify_event_info_pid) alongside the generic existing
> 
> struct fanotify_event_info_pidfd?

Well, yeah? I mean, my line of thought was that we'd also need to
include struct fanotify_event_info_header alongside the event to
provide more meta-information about the additional event you'd expect
to receive when FAN_REPORT_PIDFD is provided, so we'd end up with
something like:

struct fanotify_event_info_pidfd {
       struct fanotify_event_info_header hdr;
       __s32 pidfd;
}

Unless this of course is overbaking it and there's no need to do this?

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-23  8:02                 ` Matthew Bobrowski
@ 2021-04-23  8:14                   ` Amir Goldstein
  2021-04-26 10:26                     ` Matthew Bobrowski
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2021-04-23  8:14 UTC (permalink / raw)
  To: Matthew Bobrowski; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Fri, Apr 23, 2021 at 11:02 AM Matthew Bobrowski <repnop@google.com> wrote:
>
> On Fri, Apr 23, 2021 at 10:39:46AM +0300, Amir Goldstein wrote:
> > On Fri, Apr 23, 2021 at 2:06 AM Matthew Bobrowski <repnop@google.com> wrote:
> > >
> > > On Wed, Apr 21, 2021 at 10:04:49AM +0200, Jan Kara wrote:
> > > > On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > > > > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > > > > A general question about struct fanotify_event_metadata and its
> > > > > > extensibility model:
> > > > > > looking through the code it seems that this struct is read via
> > > > > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > > > >
> > > > > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > > > >
> > > > > > bytes. In addition you can return the info to the user about how many
> > > > > > bytes the kernel has written from fanotify_read().
> > > > > >
> > > > > > So afaict extending fanotify_event_metadata should be _fairly_
> > > > > > straightforward, right? It would essentially the complement to
> > > > > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > > > > which deals with user->kernel and you're dealing with kernel->user:
> > > > > > - If the user supplied a buffer smaller than the minimum known struct
> > > > > >   size -> reject.
> > > > > > - If the user supplied a buffer < smaller than what the current kernel
> > > > > >   supports -> copy only what userspace knows about, and return the size
> > > > > >   userspace knows about.
> > > > > > - If the user supplied a buffer that is larger than what the current
> > > > > >   kernel knows about -> copy only what the kernel knows about, zero the
> > > > > >   rest, and return the kernel size.
> > > > > >
> > > > > > Extension should then be fairly straightforward (64bit aligned
> > > > > > increments)?
> > > > >
> > > > > You'd think that it's fairly straightforward, but I have a feeling
> > > > > that the whole fanotify_event_metadata extensibility discussion and
> > > > > the current limitation to do so revolves around whether it can be
> > > > > achieved in a way which can guarantee that no userspace applications
> > > > > would break. I think the answer to this is that there's no guarantee
> > > > > because of <<reasons>>, so the decision to extend fanotify's feature
> > > > > set was done via other means i.e. introduction of additional
> > > > > structures.
> > > >
> > > > There's no real problem extending fanotify_event_metadata. We already have
> > > > multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> > > > flag and its effect, extended versions of the structure in
> > > > include/uapi/linux/fanotify.h). The key for backward compatibility is to
> > > > create extended struct only when explicitely requested by a flag when
> > > > creating notification group - and that would be the case here -
> > > > FAN_REPORT_PIDFD or how you called it. It is just that extending the
> > > > structure means adding 8 bytes to each event and parsing extended structure
> > > > is more cumbersome than just fetching s32 from a well known location.
> > > >
> > > > On the other hand extended structure is self-describing (i.e., you can tell
> > > > the meaning of all the fields just from the event you receive) while
> > > > reusing 'pid' field means that you have to know how the notification group
> > > > was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> > > > interpret the contents of the event. Actually I think the self-describing
> > > > feature of fanotify event stream is useful (e.g. when application manages
> > > > multiple fanotify groups or when fanotify group descriptors are passed
> > > > among processes) so now I'm more leaning towards using the extended
> > > > structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> > > > confusion.
> > >
> > > This approach makes sense to me.
> > >
> > > Jan/Amir, just to be clear, we've agreed to go ahead with the extended
> > > struct approach whereby specifying the FAN_REPORT_PIDFD flag will
> > > result in an event which includes an additional struct
> > > (i.e. fanotify_event_info_pid) alongside the generic existing
> >
> > struct fanotify_event_info_pidfd?
>
> Well, yeah? I mean, my line of thought was that we'd also need to
> include struct fanotify_event_info_header alongside the event to
> provide more meta-information about the additional event you'd expect
> to receive when FAN_REPORT_PIDFD is provided, so we'd end up with
> something like:
>
> struct fanotify_event_info_pidfd {
>        struct fanotify_event_info_header hdr;
>        __s32 pidfd;
> }
>
> Unless this of course is overbaking it and there's no need to do this?
>

We need this. I was just pointing out that you wrote fanotify_event_info_pid
must have been a typo.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-23  8:14                   ` Amir Goldstein
@ 2021-04-26 10:26                     ` Matthew Bobrowski
  2021-04-26 11:11                       ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-26 10:26 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Fri, Apr 23, 2021 at 11:14:34AM +0300, Amir Goldstein wrote:
> On Fri, Apr 23, 2021 at 11:02 AM Matthew Bobrowski <repnop@google.com> wrote:
> >
> > On Fri, Apr 23, 2021 at 10:39:46AM +0300, Amir Goldstein wrote:
> > > On Fri, Apr 23, 2021 at 2:06 AM Matthew Bobrowski <repnop@google.com> wrote:
> > > >
> > > > On Wed, Apr 21, 2021 at 10:04:49AM +0200, Jan Kara wrote:
> > > > > On Tue 20-04-21 12:36:59, Matthew Bobrowski wrote:
> > > > > > On Mon, Apr 19, 2021 at 05:02:33PM +0200, Christian Brauner wrote:
> > > > > > > A general question about struct fanotify_event_metadata and its
> > > > > > > extensibility model:
> > > > > > > looking through the code it seems that this struct is read via
> > > > > > > fanotify_rad(). So the user is expected to supply a buffer with at least
> > > > > > >
> > > > > > > #define FAN_EVENT_METADATA_LEN (sizeof(struct fanotify_event_metadata))
> > > > > > >
> > > > > > > bytes. In addition you can return the info to the user about how many
> > > > > > > bytes the kernel has written from fanotify_read().
> > > > > > >
> > > > > > > So afaict extending fanotify_event_metadata should be _fairly_
> > > > > > > straightforward, right? It would essentially the complement to
> > > > > > > copy_struct_from_user() which Aleksa and I added (1 or 2 years ago)
> > > > > > > which deals with user->kernel and you're dealing with kernel->user:
> > > > > > > - If the user supplied a buffer smaller than the minimum known struct
> > > > > > >   size -> reject.
> > > > > > > - If the user supplied a buffer < smaller than what the current kernel
> > > > > > >   supports -> copy only what userspace knows about, and return the size
> > > > > > >   userspace knows about.
> > > > > > > - If the user supplied a buffer that is larger than what the current
> > > > > > >   kernel knows about -> copy only what the kernel knows about, zero the
> > > > > > >   rest, and return the kernel size.
> > > > > > >
> > > > > > > Extension should then be fairly straightforward (64bit aligned
> > > > > > > increments)?
> > > > > >
> > > > > > You'd think that it's fairly straightforward, but I have a feeling
> > > > > > that the whole fanotify_event_metadata extensibility discussion and
> > > > > > the current limitation to do so revolves around whether it can be
> > > > > > achieved in a way which can guarantee that no userspace applications
> > > > > > would break. I think the answer to this is that there's no guarantee
> > > > > > because of <<reasons>>, so the decision to extend fanotify's feature
> > > > > > set was done via other means i.e. introduction of additional
> > > > > > structures.
> > > > >
> > > > > There's no real problem extending fanotify_event_metadata. We already have
> > > > > multiple extended version of that structure in use (see e.g. FAN_REPORT_FID
> > > > > flag and its effect, extended versions of the structure in
> > > > > include/uapi/linux/fanotify.h). The key for backward compatibility is to
> > > > > create extended struct only when explicitely requested by a flag when
> > > > > creating notification group - and that would be the case here -
> > > > > FAN_REPORT_PIDFD or how you called it. It is just that extending the
> > > > > structure means adding 8 bytes to each event and parsing extended structure
> > > > > is more cumbersome than just fetching s32 from a well known location.
> > > > >
> > > > > On the other hand extended structure is self-describing (i.e., you can tell
> > > > > the meaning of all the fields just from the event you receive) while
> > > > > reusing 'pid' field means that you have to know how the notification group
> > > > > was created (whether FAN_REPORT_PIDFD was used or not) to be able to
> > > > > interpret the contents of the event. Actually I think the self-describing
> > > > > feature of fanotify event stream is useful (e.g. when application manages
> > > > > multiple fanotify groups or when fanotify group descriptors are passed
> > > > > among processes) so now I'm more leaning towards using the extended
> > > > > structure instead of reusing 'pid' as Christian suggests. I'm sorry for the
> > > > > confusion.
> > > >
> > > > This approach makes sense to me.
> > > >
> > > > Jan/Amir, just to be clear, we've agreed to go ahead with the extended
> > > > struct approach whereby specifying the FAN_REPORT_PIDFD flag will
> > > > result in an event which includes an additional struct
> > > > (i.e. fanotify_event_info_pid) alongside the generic existing
> > >
> > > struct fanotify_event_info_pidfd?
> >
> > Well, yeah? I mean, my line of thought was that we'd also need to
> > include struct fanotify_event_info_header alongside the event to
> > provide more meta-information about the additional event you'd expect
> > to receive when FAN_REPORT_PIDFD is provided, so we'd end up with
> > something like:
> >
> > struct fanotify_event_info_pidfd {
> >        struct fanotify_event_info_header hdr;
> >        __s32 pidfd;
> > }
> >
> > Unless this of course is overbaking it and there's no need to do this?
> >
> 
> We need this. I was just pointing out that you wrote fanotify_event_info_pid
> must have been a typo.

Oh, right, that sure was a typo! :)

Amir, I was just thinking about this a little over the weekend and I
don't think we discussed how to handle the FAN_REPORT_PIDFD |
FAN_REPORT_FID and friends case? My immediate thought is to make
FAN_REPORT_PIDFD mutually exclusive with FAN_REPORT_FID and friends,
but then again receiving a pidfd along with FID events may be also
useful for some? What are your thoughts on this? If we don't go ahead
with mutual exclusion, then this multiple event types alongside struct
fanotify_event_metadata starts getting a little clunky, don't you
think?

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-26 10:26                     ` Matthew Bobrowski
@ 2021-04-26 11:11                       ` Amir Goldstein
  2021-04-27  3:35                         ` Matthew Bobrowski
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2021-04-26 11:11 UTC (permalink / raw)
  To: Matthew Bobrowski; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

> Amir, I was just thinking about this a little over the weekend and I
> don't think we discussed how to handle the FAN_REPORT_PIDFD |
> FAN_REPORT_FID and friends case? My immediate thought is to make
> FAN_REPORT_PIDFD mutually exclusive with FAN_REPORT_FID and friends,
> but then again receiving a pidfd along with FID events may be also
> useful for some? What are your thoughts on this? If we don't go ahead
> with mutual exclusion, then this multiple event types alongside struct
> fanotify_event_metadata starts getting a little clunky, don't you
> think?
>

The current format of an fanotify event already supports multiple info records:

[fanotify_event_metadata]
[[fanotify_event_info_header][event record #1]]
[[fanotify_event_info_header][event record #2]]...

(meta)->event_len is the total event length including all info records.

For example, FAN_REPORT_FID | FAN_REPORT_DFID_MAME produces
(for some events) two info records, one FAN_EVENT_INFO_TYPE_FID
record and one FAN_EVENT_INFO_TYPE_DFID_NAME record.

So I see no problem with combination of FAN_REPORT_FID
and FAN_REPORT_PIDFD.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-26 11:11                       ` Amir Goldstein
@ 2021-04-27  3:35                         ` Matthew Bobrowski
  2021-04-27  5:14                           ` Amir Goldstein
  0 siblings, 1 reply; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-27  3:35 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Mon, Apr 26, 2021 at 02:11:30PM +0300, Amir Goldstein wrote:
> > Amir, I was just thinking about this a little over the weekend and I
> > don't think we discussed how to handle the FAN_REPORT_PIDFD |
> > FAN_REPORT_FID and friends case? My immediate thought is to make
> > FAN_REPORT_PIDFD mutually exclusive with FAN_REPORT_FID and friends,
> > but then again receiving a pidfd along with FID events may be also
> > useful for some? What are your thoughts on this? If we don't go ahead
> > with mutual exclusion, then this multiple event types alongside struct
> > fanotify_event_metadata starts getting a little clunky, don't you
> > think?
> >
> 
> The current format of an fanotify event already supports multiple info records:
> 
> [fanotify_event_metadata]
> [[fanotify_event_info_header][event record #1]]
> [[fanotify_event_info_header][event record #2]]...
> 
> (meta)->event_len is the total event length including all info records.
> 
> For example, FAN_REPORT_FID | FAN_REPORT_DFID_MAME produces
> (for some events) two info records, one FAN_EVENT_INFO_TYPE_FID
> record and one FAN_EVENT_INFO_TYPE_DFID_NAME record.

Ah, that's right! I now remember reviewing some patches associated
with the FID change series which mentioned the possibility of
receiving multiple FID info records. As the implementation currently
stands, AFAIK there's not possibility for fanotify to ever return more
than two info records, right?

> So I see no problem with combination of FAN_REPORT_FID
> and FAN_REPORT_PIDFD.

OK.

Is there any preference in terms of whether the new FAN_REPORT_PIDFD
info records precede or come after FAN_REPORT_FID/FAN_REPORT_DFID_NAME
info records in FAN_REPORT_FID or FAN_REPORT_FID |
FAN_REPORT_DFID_NAME configurations?

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-27  3:35                         ` Matthew Bobrowski
@ 2021-04-27  5:14                           ` Amir Goldstein
  2021-04-28 22:53                             ` Matthew Bobrowski
  0 siblings, 1 reply; 35+ messages in thread
From: Amir Goldstein @ 2021-04-27  5:14 UTC (permalink / raw)
  To: Matthew Bobrowski; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Tue, Apr 27, 2021 at 6:35 AM Matthew Bobrowski <repnop@google.com> wrote:
>
> On Mon, Apr 26, 2021 at 02:11:30PM +0300, Amir Goldstein wrote:
> > > Amir, I was just thinking about this a little over the weekend and I
> > > don't think we discussed how to handle the FAN_REPORT_PIDFD |
> > > FAN_REPORT_FID and friends case? My immediate thought is to make
> > > FAN_REPORT_PIDFD mutually exclusive with FAN_REPORT_FID and friends,
> > > but then again receiving a pidfd along with FID events may be also
> > > useful for some? What are your thoughts on this? If we don't go ahead
> > > with mutual exclusion, then this multiple event types alongside struct
> > > fanotify_event_metadata starts getting a little clunky, don't you
> > > think?
> > >
> >
> > The current format of an fanotify event already supports multiple info records:
> >
> > [fanotify_event_metadata]
> > [[fanotify_event_info_header][event record #1]]
> > [[fanotify_event_info_header][event record #2]]...
> >
> > (meta)->event_len is the total event length including all info records.
> >
> > For example, FAN_REPORT_FID | FAN_REPORT_DFID_MAME produces
> > (for some events) two info records, one FAN_EVENT_INFO_TYPE_FID
> > record and one FAN_EVENT_INFO_TYPE_DFID_NAME record.
>
> Ah, that's right! I now remember reviewing some patches associated
> with the FID change series which mentioned the possibility of
> receiving multiple FID info records. As the implementation currently
> stands, AFAIK there's not possibility for fanotify to ever return more
> than two info records, right?
>

Right.
Record types FAN_EVENT_INFO_TYPE_DFID_NAME and
FAN_EVENT_INFO_TYPE_DFID are mutually exclusive.

> > So I see no problem with combination of FAN_REPORT_FID
> > and FAN_REPORT_PIDFD.
>
> OK.
>
> Is there any preference in terms of whether the new FAN_REPORT_PIDFD
> info records precede or come after FAN_REPORT_FID/FAN_REPORT_DFID_NAME
> info records in FAN_REPORT_FID or FAN_REPORT_FID |
> FAN_REPORT_DFID_NAME configurations?
>

Doesn't matter.
Your typical application would first filter by pid/pidfd and only if process
matches the filters would it care to examine the event fid info, correct?
So you go first :)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/2] fanotify: Add pidfd support to the fanotify API
  2021-04-27  5:14                           ` Amir Goldstein
@ 2021-04-28 22:53                             ` Matthew Bobrowski
  0 siblings, 0 replies; 35+ messages in thread
From: Matthew Bobrowski @ 2021-04-28 22:53 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Christian Brauner, Linux API, linux-fsdevel

On Tue, Apr 27, 2021 at 08:14:05AM +0300, Amir Goldstein wrote:
> On Tue, Apr 27, 2021 at 6:35 AM Matthew Bobrowski <repnop@google.com> wrote:
> >
> > On Mon, Apr 26, 2021 at 02:11:30PM +0300, Amir Goldstein wrote:
> > > > Amir, I was just thinking about this a little over the weekend and I
> > > > don't think we discussed how to handle the FAN_REPORT_PIDFD |
> > > > FAN_REPORT_FID and friends case? My immediate thought is to make
> > > > FAN_REPORT_PIDFD mutually exclusive with FAN_REPORT_FID and friends,
> > > > but then again receiving a pidfd along with FID events may be also
> > > > useful for some? What are your thoughts on this? If we don't go ahead
> > > > with mutual exclusion, then this multiple event types alongside struct
> > > > fanotify_event_metadata starts getting a little clunky, don't you
> > > > think?
> > > >
> > >
> > > The current format of an fanotify event already supports multiple info records:
> > >
> > > [fanotify_event_metadata]
> > > [[fanotify_event_info_header][event record #1]]
> > > [[fanotify_event_info_header][event record #2]]...
> > >
> > > (meta)->event_len is the total event length including all info records.
> > >
> > > For example, FAN_REPORT_FID | FAN_REPORT_DFID_MAME produces
> > > (for some events) two info records, one FAN_EVENT_INFO_TYPE_FID
> > > record and one FAN_EVENT_INFO_TYPE_DFID_NAME record.
> >
> > Ah, that's right! I now remember reviewing some patches associated
> > with the FID change series which mentioned the possibility of
> > receiving multiple FID info records. As the implementation currently
> > stands, AFAIK there's not possibility for fanotify to ever return more
> > than two info records, right?
> >
> > Is there any preference in terms of whether the new FAN_REPORT_PIDFD
> > info records precede or come after FAN_REPORT_FID/FAN_REPORT_DFID_NAME
> > info records in FAN_REPORT_FID or FAN_REPORT_FID |
> > FAN_REPORT_DFID_NAME configurations?
>
> Doesn't matter.

OK, fair.

> Your typical application would first filter by pid/pidfd and only if process
> matches the filters would it care to examine the event fid info, correct?

Not necessarily, but you're right, the ordering doesn't really matter
too much.

/M

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, back to index

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-15 23:21 [PATCH 0/2] fanotify: Adding pidfd support to the fanotify API Matthew Bobrowski
2021-04-15 23:22 ` [PATCH 1/2] pidfd_create(): remove static qualifier and declare pidfd_create() in linux/pid.h Matthew Bobrowski
2021-04-19 10:13   ` Jan Kara
2021-04-19 12:50   ` Christian Brauner
2021-04-20  0:17     ` Matthew Bobrowski
2021-04-15 23:22 ` [PATCH 2/2] fanotify: Add pidfd support to the fanotify API Matthew Bobrowski
2021-04-16  6:27   ` Amir Goldstein
2021-04-16  7:05     ` Matthew Bobrowski
2021-04-16  7:53       ` Amir Goldstein
2021-04-16  8:08         ` Matthew Bobrowski
2021-04-19 13:02     ` Christian Brauner
2021-04-19 10:21   ` Jan Kara
2021-04-20  1:35     ` Matthew Bobrowski
2021-04-19 13:20   ` Christian Brauner
2021-04-19 13:53     ` Amir Goldstein
2021-04-19 14:44       ` Christian Brauner
2021-04-19 13:55     ` Jan Kara
2021-04-19 15:02       ` Christian Brauner
2021-04-20  2:36         ` Matthew Bobrowski
2021-04-21  8:04           ` Jan Kara
2021-04-21  9:29             ` Amir Goldstein
2021-04-21 10:00               ` Jan Kara
2021-04-21 10:12                 ` Amir Goldstein
2021-04-21 13:48                   ` Jan Kara
2021-04-21 14:46                     ` Christian Brauner
2021-04-22 23:06             ` Matthew Bobrowski
2021-04-23  7:39               ` Amir Goldstein
2021-04-23  8:02                 ` Matthew Bobrowski
2021-04-23  8:14                   ` Amir Goldstein
2021-04-26 10:26                     ` Matthew Bobrowski
2021-04-26 11:11                       ` Amir Goldstein
2021-04-27  3:35                         ` Matthew Bobrowski
2021-04-27  5:14                           ` Amir Goldstein
2021-04-28 22:53                             ` Matthew Bobrowski
2021-04-19 12:34 ` [PATCH 0/2] fanotify: Adding " Christian Brauner

Linux-Fsdevel Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-fsdevel/0 linux-fsdevel/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-fsdevel linux-fsdevel/ https://lore.kernel.org/linux-fsdevel \
		linux-fsdevel@vger.kernel.org
	public-inbox-index linux-fsdevel

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-fsdevel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git