* [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify @ 2023-04-14 18:29 Amir Goldstein 2023-04-14 18:29 ` [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event Amir Goldstein ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Amir Goldstein @ 2023-04-14 18:29 UTC (permalink / raw) To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api Jan, Followup on my quest to close the gap with inotify functionality, here is a proposal for FAN_UNMOUNT event. I have had many design questions about this: 1) Should we also report FAN_UNMOUNT for marked inodes and sb on sb shutdown (same as IN_UNMOUNT)? 2) Should we also report FAN_UNMOUNT on sb mark for any unmounts of that sb? 3) Should we report also the fid of the mount root? and if we do... 4) Should we report/consider FAN_ONDIR filter? All of the questions above I answered "not unless somebody requests" in this first RFC. Specifically, I did get a request for an unmount event for containers use case. I have also had doubts regarding the info records. I decided that reporting fsid and mntid is minimum, but couldn't decide if they were better of in a single MNTID record or seprate records. I went with separate records, because: a) FAN_FS_ERROR has set a precendent of separate fid record with fsid and empty fid, so I followed this precendent b) MNTID record we may want to add later with FAN_REPORT_MNTID to all the path events, so better that it is independent There is test for the proposed API extention [1]. Thoughts? Thanks, Amir. [1] https://github.com/amir73il/ltp/commits/fan_unmount Amir Goldstein (2): fanotify: add support for FAN_UNMOUNT event fanotify: report mntid info record with FAN_UNMOUNT events fs/notify/fanotify/fanotify.c | 45 +++++++++++++++++++------- fs/notify/fanotify/fanotify.h | 26 +++++++++++++-- fs/notify/fanotify/fanotify_user.c | 52 ++++++++++++++++++++++++++++-- include/linux/fanotify.h | 3 +- include/linux/fsnotify.h | 16 +++++++++ include/uapi/linux/fanotify.h | 11 +++++++ 6 files changed, 135 insertions(+), 18 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event 2023-04-14 18:29 [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify Amir Goldstein @ 2023-04-14 18:29 ` Amir Goldstein 2023-04-18 13:43 ` Christian Brauner 2023-04-19 13:14 ` Jan Kara 2023-04-14 18:29 ` [RFC][PATCH 2/2] fanotify: report mntid info record with FAN_UNMOUNT events Amir Goldstein 2023-04-18 13:33 ` [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify Christian Brauner 2 siblings, 2 replies; 16+ messages in thread From: Amir Goldstein @ 2023-04-14 18:29 UTC (permalink / raw) To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api inotify generates unsolicited IN_UNMOUNT events for every inode mark before the filesystem containing the inode is shutdown. Unlike IN_UNMOUNT, FAN_UNMOUNT is an opt-in event that can only be set on a mount mark and is generated when the mount is unmounted. FAN_UNMOUNT requires FAN_REPORT_FID and reports an fid info record with fsid of the filesystem and an empty file handle. Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- fs/notify/fanotify/fanotify.c | 10 +++++++--- fs/notify/fanotify/fanotify.h | 6 ++++-- fs/notify/fanotify/fanotify_user.c | 10 ++++++++++ include/linux/fanotify.h | 3 ++- include/linux/fsnotify.h | 16 ++++++++++++++++ include/uapi/linux/fanotify.h | 1 + 6 files changed, 40 insertions(+), 6 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 90d9210dc0d2..384d2b2e55e7 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -713,7 +713,7 @@ static struct fanotify_event *fanotify_alloc_error_event( inode = report->inode; fh_len = fanotify_encode_fh_len(inode); - /* Bad fh_len. Fallback to using an invalid fh. Should never happen. */ + /* Record empty fh for errors not associated with specific inode */ if (!fh_len && inode) inode = NULL; @@ -745,7 +745,10 @@ static struct fanotify_event *fanotify_alloc_event( bool ondir = mask & FAN_ONDIR; struct pid *pid; - if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { + if (mask & FAN_UNMOUNT && !WARN_ON_ONCE(!path || !fid_mode)) { + /* Record fid event with fsid and empty fh */ + id = NULL; + } else if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* * For certain events and group flags, report the child fid * in addition to reporting the parent fid and maybe child name. @@ -951,10 +954,11 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR); BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC); BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); + BUILD_BUG_ON(FAN_UNMOUNT != FS_UNMOUNT); BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); BUILD_BUG_ON(FAN_RENAME != FS_RENAME); - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21); + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 22); mask = fanotify_group_event_mask(group, iter_info, &match_mask, mask, data, data_type, dir); diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index 7f0bf00a90f0..f98dcf5b7a19 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -382,10 +382,12 @@ static inline int fanotify_event_dir2_fh_len(struct fanotify_event *event) return info ? fanotify_info_dir2_fh_len(info) : 0; } +/* For error and unmount events, fsid with empty fh are reported. */ +#define FANOTIFY_EMPTY_FH_EVENTS (FAN_FS_ERROR | FAN_UNMOUNT) + static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) { - /* For error events, even zeroed fh are reported. */ - if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) + if (event->mask & FANOTIFY_EMPTY_FH_EVENTS) return true; return fanotify_event_object_fh_len(event) > 0; } diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 554b335b1733..0b3de6218c56 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -1766,6 +1766,16 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, (!fid_mode || mark_type == FAN_MARK_MOUNT)) goto fput_and_out; + /* + * inotify sends unsoliciled IN_UNMOUNT per marked inode on sb shutdown. + * FAN_UNMOUNT event is about unmount of a mount, not about sb shutdown, + * so allow setting it only in mount mark mask. + * FAN_UNMOUNT requires FAN_REPORT_FID to report fsid with empty fh. + */ + if (mask & FAN_UNMOUNT && + (!(fid_mode & FAN_REPORT_FID) || mark_type != FAN_MARK_MOUNT)) + goto fput_and_out; + /* * FAN_RENAME uses special info type records to report the old and * new parent+name. Reporting only old and new parent id is less diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 4c6f40a701c2..a64c26d9626f 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -80,7 +80,8 @@ * FSNOTIFY_EVENT_INODE. */ #define FANOTIFY_PATH_EVENTS (FAN_ACCESS | FAN_MODIFY | \ - FAN_CLOSE | FAN_OPEN | FAN_OPEN_EXEC) + FAN_CLOSE | FAN_OPEN | FAN_OPEN_EXEC | \ + FAN_UNMOUNT) /* * Directory entry modification events - reported only to directory diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h index bb8467cd11ae..3898bf858407 100644 --- a/include/linux/fsnotify.h +++ b/include/linux/fsnotify.h @@ -176,11 +176,27 @@ static inline void fsnotify_inode_delete(struct inode *inode) __fsnotify_inode_delete(inode); } +/* + * fsnotify_unmount - mount was unmounted. + */ +static inline int fsnotify_unmount(struct vfsmount *mnt) +{ + struct path path = { .mnt = mnt, .dentry = mnt->mnt_root }; + + if (atomic_long_read(&mnt->mnt_sb->s_fsnotify_connectors) == 0) + return 0; + + return fsnotify(FS_UNMOUNT, &path, FSNOTIFY_EVENT_PATH, NULL, NULL, + d_inode(path.dentry), 0); +} + /* * fsnotify_vfsmount_delete - a vfsmount is being destroyed, clean up is needed */ static inline void fsnotify_vfsmount_delete(struct vfsmount *mnt) { + /* Send FS_UNMOUNT to groups and then clear mount marks */ + fsnotify_unmount(mnt); __fsnotify_vfsmount_delete(mnt); } diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 014e9682bd76..70f2d43e8ba4 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -19,6 +19,7 @@ #define FAN_MOVE_SELF 0x00000800 /* Self was moved */ #define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */ +#define FAN_UNMOUNT 0x00002000 /* Filesystem unmounted */ #define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ #define FAN_FS_ERROR 0x00008000 /* Filesystem error */ -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event 2023-04-14 18:29 ` [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event Amir Goldstein @ 2023-04-18 13:43 ` Christian Brauner 2023-04-19 13:14 ` Jan Kara 1 sibling, 0 replies; 16+ messages in thread From: Christian Brauner @ 2023-04-18 13:43 UTC (permalink / raw) To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api On Fri, Apr 14, 2023 at 09:29:02PM +0300, Amir Goldstein wrote: > inotify generates unsolicited IN_UNMOUNT events for every inode > mark before the filesystem containing the inode is shutdown. > > Unlike IN_UNMOUNT, FAN_UNMOUNT is an opt-in event that can only be > set on a mount mark and is generated when the mount is unmounted. > > FAN_UNMOUNT requires FAN_REPORT_FID and reports an fid info record > with fsid of the filesystem and an empty file handle. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- I think this is a useful addition to the fanotify api and more useful than IN_UMOUNT. Going back to your cover letter. Afaiu, IN_UMOUNT is generated when the superblock in which a watched object resides goes away. That's roughly what I had in mind with FAN_SB_SHUTDOWN (ignoring naming discusions for now). ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event 2023-04-14 18:29 ` [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event Amir Goldstein 2023-04-18 13:43 ` Christian Brauner @ 2023-04-19 13:14 ` Jan Kara 2023-04-19 18:32 ` Amir Goldstein 1 sibling, 1 reply; 16+ messages in thread From: Jan Kara @ 2023-04-19 13:14 UTC (permalink / raw) To: Amir Goldstein Cc: Jan Kara, Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api On Fri 14-04-23 21:29:02, Amir Goldstein wrote: > inotify generates unsolicited IN_UNMOUNT events for every inode > mark before the filesystem containing the inode is shutdown. > > Unlike IN_UNMOUNT, FAN_UNMOUNT is an opt-in event that can only be > set on a mount mark and is generated when the mount is unmounted. > > FAN_UNMOUNT requires FAN_REPORT_FID and reports an fid info record > with fsid of the filesystem and an empty file handle. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> Seeing the discussion further in this thread regarding FAN_IGNORED won't it be more consistent (extensible) to implement the above functionality as FAN_IGNORED delivered to mount mark when it is getting destroyed? I.e., define FAN_IGNORED as an event that gets delivered when a mark is getting destroyed (with the records identifying the mark). For now start supporting it on mount marks, later we can add support to other mark types if there's demand. Thoughts? Honza > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > index 90d9210dc0d2..384d2b2e55e7 100644 > --- a/fs/notify/fanotify/fanotify.c > +++ b/fs/notify/fanotify/fanotify.c > @@ -713,7 +713,7 @@ static struct fanotify_event *fanotify_alloc_error_event( > inode = report->inode; > fh_len = fanotify_encode_fh_len(inode); > > - /* Bad fh_len. Fallback to using an invalid fh. Should never happen. */ > + /* Record empty fh for errors not associated with specific inode */ > if (!fh_len && inode) > inode = NULL; > > @@ -745,7 +745,10 @@ static struct fanotify_event *fanotify_alloc_event( > bool ondir = mask & FAN_ONDIR; > struct pid *pid; > > - if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { > + if (mask & FAN_UNMOUNT && !WARN_ON_ONCE(!path || !fid_mode)) { > + /* Record fid event with fsid and empty fh */ > + id = NULL; > + } else if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { > /* > * For certain events and group flags, report the child fid > * in addition to reporting the parent fid and maybe child name. > @@ -951,10 +954,11 @@ static int fanotify_handle_event(struct fsnotify_group *group, u32 mask, > BUILD_BUG_ON(FAN_ONDIR != FS_ISDIR); > BUILD_BUG_ON(FAN_OPEN_EXEC != FS_OPEN_EXEC); > BUILD_BUG_ON(FAN_OPEN_EXEC_PERM != FS_OPEN_EXEC_PERM); > + BUILD_BUG_ON(FAN_UNMOUNT != FS_UNMOUNT); > BUILD_BUG_ON(FAN_FS_ERROR != FS_ERROR); > BUILD_BUG_ON(FAN_RENAME != FS_RENAME); > > - BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 21); > + BUILD_BUG_ON(HWEIGHT32(ALL_FANOTIFY_EVENT_BITS) != 22); > > mask = fanotify_group_event_mask(group, iter_info, &match_mask, > mask, data, data_type, dir); > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h > index 7f0bf00a90f0..f98dcf5b7a19 100644 > --- a/fs/notify/fanotify/fanotify.h > +++ b/fs/notify/fanotify/fanotify.h > @@ -382,10 +382,12 @@ static inline int fanotify_event_dir2_fh_len(struct fanotify_event *event) > return info ? fanotify_info_dir2_fh_len(info) : 0; > } > > +/* For error and unmount events, fsid with empty fh are reported. */ > +#define FANOTIFY_EMPTY_FH_EVENTS (FAN_FS_ERROR | FAN_UNMOUNT) > + > static inline bool fanotify_event_has_object_fh(struct fanotify_event *event) > { > - /* For error events, even zeroed fh are reported. */ > - if (event->type == FANOTIFY_EVENT_TYPE_FS_ERROR) > + if (event->mask & FANOTIFY_EMPTY_FH_EVENTS) > return true; > return fanotify_event_object_fh_len(event) > 0; > } > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c > index 554b335b1733..0b3de6218c56 100644 > --- a/fs/notify/fanotify/fanotify_user.c > +++ b/fs/notify/fanotify/fanotify_user.c > @@ -1766,6 +1766,16 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, > (!fid_mode || mark_type == FAN_MARK_MOUNT)) > goto fput_and_out; > > + /* > + * inotify sends unsoliciled IN_UNMOUNT per marked inode on sb shutdown. > + * FAN_UNMOUNT event is about unmount of a mount, not about sb shutdown, > + * so allow setting it only in mount mark mask. > + * FAN_UNMOUNT requires FAN_REPORT_FID to report fsid with empty fh. > + */ > + if (mask & FAN_UNMOUNT && > + (!(fid_mode & FAN_REPORT_FID) || mark_type != FAN_MARK_MOUNT)) > + goto fput_and_out; > + > /* > * FAN_RENAME uses special info type records to report the old and > * new parent+name. Reporting only old and new parent id is less > diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h > index 4c6f40a701c2..a64c26d9626f 100644 > --- a/include/linux/fanotify.h > +++ b/include/linux/fanotify.h > @@ -80,7 +80,8 @@ > * FSNOTIFY_EVENT_INODE. > */ > #define FANOTIFY_PATH_EVENTS (FAN_ACCESS | FAN_MODIFY | \ > - FAN_CLOSE | FAN_OPEN | FAN_OPEN_EXEC) > + FAN_CLOSE | FAN_OPEN | FAN_OPEN_EXEC | \ > + FAN_UNMOUNT) > > /* > * Directory entry modification events - reported only to directory > diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h > index bb8467cd11ae..3898bf858407 100644 > --- a/include/linux/fsnotify.h > +++ b/include/linux/fsnotify.h > @@ -176,11 +176,27 @@ static inline void fsnotify_inode_delete(struct inode *inode) > __fsnotify_inode_delete(inode); > } > > +/* > + * fsnotify_unmount - mount was unmounted. > + */ > +static inline int fsnotify_unmount(struct vfsmount *mnt) > +{ > + struct path path = { .mnt = mnt, .dentry = mnt->mnt_root }; > + > + if (atomic_long_read(&mnt->mnt_sb->s_fsnotify_connectors) == 0) > + return 0; > + > + return fsnotify(FS_UNMOUNT, &path, FSNOTIFY_EVENT_PATH, NULL, NULL, > + d_inode(path.dentry), 0); > +} > + > /* > * fsnotify_vfsmount_delete - a vfsmount is being destroyed, clean up is needed > */ > static inline void fsnotify_vfsmount_delete(struct vfsmount *mnt) > { > + /* Send FS_UNMOUNT to groups and then clear mount marks */ > + fsnotify_unmount(mnt); > __fsnotify_vfsmount_delete(mnt); > } > > diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h > index 014e9682bd76..70f2d43e8ba4 100644 > --- a/include/uapi/linux/fanotify.h > +++ b/include/uapi/linux/fanotify.h > @@ -19,6 +19,7 @@ > #define FAN_MOVE_SELF 0x00000800 /* Self was moved */ > #define FAN_OPEN_EXEC 0x00001000 /* File was opened for exec */ > > +#define FAN_UNMOUNT 0x00002000 /* Filesystem unmounted */ > #define FAN_Q_OVERFLOW 0x00004000 /* Event queued overflowed */ > #define FAN_FS_ERROR 0x00008000 /* Filesystem error */ > > -- > 2.34.1 > -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event 2023-04-19 13:14 ` Jan Kara @ 2023-04-19 18:32 ` Amir Goldstein 0 siblings, 0 replies; 16+ messages in thread From: Amir Goldstein @ 2023-04-19 18:32 UTC (permalink / raw) To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api On Wed, Apr 19, 2023 at 4:14 PM Jan Kara <jack@suse.cz> wrote: > > On Fri 14-04-23 21:29:02, Amir Goldstein wrote: > > inotify generates unsolicited IN_UNMOUNT events for every inode > > mark before the filesystem containing the inode is shutdown. > > > > Unlike IN_UNMOUNT, FAN_UNMOUNT is an opt-in event that can only be > > set on a mount mark and is generated when the mount is unmounted. > > > > FAN_UNMOUNT requires FAN_REPORT_FID and reports an fid info record > > with fsid of the filesystem and an empty file handle. > > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > > Seeing the discussion further in this thread regarding FAN_IGNORED won't > it be more consistent (extensible) to implement the above functionality as > FAN_IGNORED delivered to mount mark when it is getting destroyed? > > I.e., define FAN_IGNORED as an event that gets delivered when a mark is > getting destroyed (with the records identifying the mark). For now start > supporting it on mount marks, later we can add support to other mark types > if there's demand. Thoughts? Sounds like a good idea. I will look into it. Thanks, Amir. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [RFC][PATCH 2/2] fanotify: report mntid info record with FAN_UNMOUNT events 2023-04-14 18:29 [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify Amir Goldstein 2023-04-14 18:29 ` [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event Amir Goldstein @ 2023-04-14 18:29 ` Amir Goldstein 2023-04-17 8:38 ` Amir Goldstein 2023-04-19 13:30 ` Jan Kara 2023-04-18 13:33 ` [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify Christian Brauner 2 siblings, 2 replies; 16+ messages in thread From: Amir Goldstein @ 2023-04-14 18:29 UTC (permalink / raw) To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api Report mntid in an info record of type FAN_EVENT_INFO_TYPE_MNTID with FAN_UNMOUNT event in addition to the fid info record. Signed-off-by: Amir Goldstein <amir73il@gmail.com> --- fs/notify/fanotify/fanotify.c | 37 +++++++++++++++++++------ fs/notify/fanotify/fanotify.h | 20 +++++++++++++- fs/notify/fanotify/fanotify_user.c | 44 +++++++++++++++++++++++++++--- include/uapi/linux/fanotify.h | 10 +++++++ 4 files changed, 97 insertions(+), 14 deletions(-) diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c index 384d2b2e55e7..c204259be6cc 100644 --- a/fs/notify/fanotify/fanotify.c +++ b/fs/notify/fanotify/fanotify.c @@ -17,6 +17,7 @@ #include <linux/stringhash.h> #include "fanotify.h" +#include "../../mount.h" /* for mnt_id */ static bool fanotify_path_equal(const struct path *p1, const struct path *p2) { @@ -41,6 +42,11 @@ static unsigned int fanotify_hash_fsid(__kernel_fsid_t *fsid) hash_32(fsid->val[1], FANOTIFY_EVENT_HASH_BITS); } +static unsigned int fanotify_hash_mntid(int mntid) +{ + return hash_32(mntid, FANOTIFY_EVENT_HASH_BITS); +} + static bool fanotify_fh_equal(struct fanotify_fh *fh1, struct fanotify_fh *fh2) { @@ -133,6 +139,12 @@ static bool fanotify_error_event_equal(struct fanotify_error_event *fee1, return true; } +/* + * FAN_RENAME and FAN_UNMOUNT are reported with special info record types, + * so we cannot merge them with other events. + */ +#define FANOTIFY_NO_MERGE_EVENTS (FAN_RENAME | FAN_UNMOUNT) + static bool fanotify_should_merge(struct fanotify_event *old, struct fanotify_event *new) { @@ -153,11 +165,8 @@ static bool fanotify_should_merge(struct fanotify_event *old, if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR)) return false; - /* - * FAN_RENAME event is reported with special info record types, - * so we cannot merge it with other events. - */ - if ((old->mask & FAN_RENAME) != (new->mask & FAN_RENAME)) + if ((old->mask & FANOTIFY_NO_MERGE_EVENTS) || + (new->mask & FANOTIFY_NO_MERGE_EVENTS)) return false; switch (old->type) { @@ -593,9 +602,11 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path, static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, __kernel_fsid_t *fsid, + struct mount *mnt, unsigned int *hash, gfp_t gfp) { + unsigned int fh_len = fanotify_encode_fh_len(id); struct fanotify_fid_event *ffe; ffe = kmem_cache_alloc(fanotify_fid_event_cachep, gfp); @@ -605,8 +616,14 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, ffe->fae.type = FANOTIFY_EVENT_TYPE_FID; ffe->fsid = *fsid; *hash ^= fanotify_hash_fsid(fsid); - fanotify_encode_fh(&ffe->object_fh, id, fanotify_encode_fh_len(id), - hash, gfp); + fanotify_encode_fh(&ffe->object_fh, id, fh_len, hash, gfp); + /* Record fid event with fsid, mntid and empty fh */ + if (mnt && !WARN_ON_ONCE(fh_len)) { + ffe->mnt_id = mnt->mnt_id; + ffe->object_fh.flags = FANOTIFY_FH_FLAG_MNT_ID; + if (hash) + *hash ^= fanotify_hash_mntid(mnt->mnt_id); + } return &ffe->fae; } @@ -737,6 +754,7 @@ static struct fanotify_event *fanotify_alloc_event( fid_mode); struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); const struct path *path = fsnotify_data_path(data, data_type); + struct mount *mnt = NULL; struct mem_cgroup *old_memcg; struct dentry *moved = NULL; struct inode *child = NULL; @@ -746,7 +764,8 @@ static struct fanotify_event *fanotify_alloc_event( struct pid *pid; if (mask & FAN_UNMOUNT && !WARN_ON_ONCE(!path || !fid_mode)) { - /* Record fid event with fsid and empty fh */ + /* Record fid event with fsid, mntid and empty fh */ + mnt = real_mount(path->mnt); id = NULL; } else if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { /* @@ -834,7 +853,7 @@ static struct fanotify_event *fanotify_alloc_event( event = fanotify_alloc_name_event(dirid, fsid, file_name, child, moved, &hash, gfp); } else if (fid_mode) { - event = fanotify_alloc_fid_event(id, fsid, &hash, gfp); + event = fanotify_alloc_fid_event(id, fsid, mnt, &hash, gfp); } else { event = fanotify_alloc_path_event(path, &hash, gfp); } diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h index f98dcf5b7a19..3d8391a77031 100644 --- a/fs/notify/fanotify/fanotify.h +++ b/fs/notify/fanotify/fanotify.h @@ -33,6 +33,7 @@ struct fanotify_fh { u8 type; u8 len; #define FANOTIFY_FH_FLAG_EXT_BUF 1 +#define FANOTIFY_FH_FLAG_MNT_ID 2 u8 flags; u8 pad; unsigned char buf[]; @@ -279,7 +280,10 @@ static inline void fanotify_init_event(struct fanotify_event *event, struct { \ struct fanotify_fh (name); \ /* Space for object_fh.buf[] - access with fanotify_fh_buf() */ \ - unsigned char _inline_fh_buf[(size)]; \ + union { \ + unsigned char _inline_fh_buf[(size)]; \ + int mnt_id; /* For FAN_UNMOUNT */ \ + }; \ } struct fanotify_fid_event { @@ -335,6 +339,20 @@ static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event) return NULL; } +static inline int fanotify_event_mntid(struct fanotify_event *event) +{ + struct fanotify_fh *fh = NULL; + + if (event->mask & FAN_UNMOUNT && + event->type == FANOTIFY_EVENT_TYPE_FID) + fh = &FANOTIFY_FE(event)->object_fh; + + if (fh && !fh->len && fh->flags == FANOTIFY_FH_FLAG_MNT_ID) + return FANOTIFY_FE(event)->mnt_id; + + return 0; +} + static inline struct fanotify_fh *fanotify_event_object_fh( struct fanotify_event *event) { diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c index 0b3de6218c56..db3b79b8e901 100644 --- a/fs/notify/fanotify/fanotify_user.c +++ b/fs/notify/fanotify/fanotify_user.c @@ -120,7 +120,9 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly; #define FANOTIFY_EVENT_ALIGN 4 #define FANOTIFY_FID_INFO_HDR_LEN \ (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) -#define FANOTIFY_PIDFD_INFO_HDR_LEN \ +#define FANOTIFY_MNTID_INFO_LEN \ + sizeof(struct fanotify_event_info_mntid) +#define FANOTIFY_PIDFD_INFO_LEN \ sizeof(struct fanotify_event_info_pidfd) #define FANOTIFY_ERROR_INFO_LEN \ (sizeof(struct fanotify_event_info_error)) @@ -178,8 +180,11 @@ static size_t fanotify_event_len(unsigned int info_mode, dot_len = 1; } + if (fanotify_event_mntid(event)) + event_len += FANOTIFY_MNTID_INFO_LEN; + if (info_mode & FAN_REPORT_PIDFD) - event_len += FANOTIFY_PIDFD_INFO_HDR_LEN; + event_len += FANOTIFY_PIDFD_INFO_LEN; if (fanotify_event_has_object_fh(event)) { fh_len = fanotify_event_object_fh_len(event); @@ -515,7 +520,7 @@ static int copy_pidfd_info_to_user(int pidfd, size_t count) { struct fanotify_event_info_pidfd info = { }; - size_t info_len = FANOTIFY_PIDFD_INFO_HDR_LEN; + size_t info_len = FANOTIFY_PIDFD_INFO_LEN; if (WARN_ON_ONCE(info_len > count)) return -EFAULT; @@ -530,6 +535,26 @@ static int copy_pidfd_info_to_user(int pidfd, return info_len; } +static int copy_mntid_info_to_user(int mntid, + char __user *buf, + size_t count) +{ + struct fanotify_event_info_mntid info = { }; + size_t info_len = FANOTIFY_MNTID_INFO_LEN; + + if (WARN_ON_ONCE(info_len > count)) + return -EFAULT; + + info.hdr.info_type = FAN_EVENT_INFO_TYPE_MNTID; + info.hdr.len = info_len; + info.mnt_id = mntid; + + if (copy_to_user(buf, &info, info_len)) + return -EFAULT; + + return info_len; +} + static int copy_info_records_to_user(struct fanotify_event *event, struct fanotify_info *info, unsigned int info_mode, int pidfd, @@ -538,6 +563,7 @@ static int copy_info_records_to_user(struct fanotify_event *event, int ret, total_bytes = 0, info_type = 0; unsigned int fid_mode = info_mode & FANOTIFY_FID_BITS; unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; + int mntid = fanotify_event_mntid(event); /* * Event info records order is as follows: @@ -632,6 +658,16 @@ static int copy_info_records_to_user(struct fanotify_event *event, total_bytes += ret; } + if (mntid) { + ret = copy_mntid_info_to_user(mntid, buf, count); + if (ret < 0) + return ret; + + buf += ret; + count -= ret; + total_bytes += ret; + } + if (pidfd_mode) { ret = copy_pidfd_info_to_user(pidfd, buf, count); if (ret < 0) @@ -1770,7 +1806,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, * inotify sends unsoliciled IN_UNMOUNT per marked inode on sb shutdown. * FAN_UNMOUNT event is about unmount of a mount, not about sb shutdown, * so allow setting it only in mount mark mask. - * FAN_UNMOUNT requires FAN_REPORT_FID to report fsid with empty fh. + * FAN_UNMOUNT requires FAN_REPORT_FID to report fsid and mntid. */ if (mask & FAN_UNMOUNT && (!(fid_mode & FAN_REPORT_FID) || mark_type != FAN_MARK_MOUNT)) diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h index 70f2d43e8ba4..886efbd877ba 100644 --- a/include/uapi/linux/fanotify.h +++ b/include/uapi/linux/fanotify.h @@ -144,6 +144,7 @@ struct fanotify_event_metadata { #define FAN_EVENT_INFO_TYPE_DFID 3 #define FAN_EVENT_INFO_TYPE_PIDFD 4 #define FAN_EVENT_INFO_TYPE_ERROR 5 +#define FAN_EVENT_INFO_TYPE_MNTID 6 /* Special info types for FAN_RENAME */ #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10 @@ -184,6 +185,15 @@ struct fanotify_fhandle { __u64 fh_ino; }; +/* + * This structure is used for info records of type FAN_EVENT_INFO_TYPE_MNTID. + */ +struct fanotify_event_info_mntid { + struct fanotify_event_info_header hdr; + /* matches mount_id from name_to_handle_at(2) */ + __s32 mnt_id; +}; + /* * This structure is used for info records of type FAN_EVENT_INFO_TYPE_PIDFD. * It holds a pidfd for the pid that was responsible for generating an event. -- 2.34.1 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 2/2] fanotify: report mntid info record with FAN_UNMOUNT events 2023-04-14 18:29 ` [RFC][PATCH 2/2] fanotify: report mntid info record with FAN_UNMOUNT events Amir Goldstein @ 2023-04-17 8:38 ` Amir Goldstein 2023-04-19 13:30 ` Jan Kara 1 sibling, 0 replies; 16+ messages in thread From: Amir Goldstein @ 2023-04-17 8:38 UTC (permalink / raw) To: Jan Kara; +Cc: Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api On Fri, Apr 14, 2023 at 9:29 PM Amir Goldstein <amir73il@gmail.com> wrote: > > Report mntid in an info record of type FAN_EVENT_INFO_TYPE_MNTID > with FAN_UNMOUNT event in addition to the fid info record. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> > --- > fs/notify/fanotify/fanotify.c | 37 +++++++++++++++++++------ > fs/notify/fanotify/fanotify.h | 20 +++++++++++++- > fs/notify/fanotify/fanotify_user.c | 44 +++++++++++++++++++++++++++--- > include/uapi/linux/fanotify.h | 10 +++++++ > 4 files changed, 97 insertions(+), 14 deletions(-) > > diff --git a/fs/notify/fanotify/fanotify.c b/fs/notify/fanotify/fanotify.c > index 384d2b2e55e7..c204259be6cc 100644 > --- a/fs/notify/fanotify/fanotify.c > +++ b/fs/notify/fanotify/fanotify.c > @@ -17,6 +17,7 @@ > #include <linux/stringhash.h> > > #include "fanotify.h" > +#include "../../mount.h" /* for mnt_id */ > > static bool fanotify_path_equal(const struct path *p1, const struct path *p2) > { > @@ -41,6 +42,11 @@ static unsigned int fanotify_hash_fsid(__kernel_fsid_t *fsid) > hash_32(fsid->val[1], FANOTIFY_EVENT_HASH_BITS); > } > > +static unsigned int fanotify_hash_mntid(int mntid) > +{ > + return hash_32(mntid, FANOTIFY_EVENT_HASH_BITS); > +} This hash is not needed of course as I later decided to never merge FAN_UNMOUNT events. > + > static bool fanotify_fh_equal(struct fanotify_fh *fh1, > struct fanotify_fh *fh2) > { > @@ -133,6 +139,12 @@ static bool fanotify_error_event_equal(struct fanotify_error_event *fee1, > return true; > } > > +/* > + * FAN_RENAME and FAN_UNMOUNT are reported with special info record types, > + * so we cannot merge them with other events. > + */ > +#define FANOTIFY_NO_MERGE_EVENTS (FAN_RENAME | FAN_UNMOUNT) > + > static bool fanotify_should_merge(struct fanotify_event *old, > struct fanotify_event *new) > { > @@ -153,11 +165,8 @@ static bool fanotify_should_merge(struct fanotify_event *old, > if ((old->mask & FS_ISDIR) != (new->mask & FS_ISDIR)) > return false; > > - /* > - * FAN_RENAME event is reported with special info record types, > - * so we cannot merge it with other events. > - */ > - if ((old->mask & FAN_RENAME) != (new->mask & FAN_RENAME)) > + if ((old->mask & FANOTIFY_NO_MERGE_EVENTS) || > + (new->mask & FANOTIFY_NO_MERGE_EVENTS)) > return false; BTW there is a minor change of logic here, from not merging FAN_RENAME events with anything but other FAN_RENAME events, to not merging FAN_RENAME events at all. But considering that FAN_RENAME events always have name info, the only possible merge of FAN_RENAME events is in a situation of loop { mv A B; mv B A;} and I don't think that this corner case justifies the merge of rename events. > > switch (old->type) { > @@ -593,9 +602,11 @@ static struct fanotify_event *fanotify_alloc_perm_event(const struct path *path, > > static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, > __kernel_fsid_t *fsid, > + struct mount *mnt, > unsigned int *hash, > gfp_t gfp) > { > + unsigned int fh_len = fanotify_encode_fh_len(id); > struct fanotify_fid_event *ffe; > > ffe = kmem_cache_alloc(fanotify_fid_event_cachep, gfp); > @@ -605,8 +616,14 @@ static struct fanotify_event *fanotify_alloc_fid_event(struct inode *id, > ffe->fae.type = FANOTIFY_EVENT_TYPE_FID; > ffe->fsid = *fsid; > *hash ^= fanotify_hash_fsid(fsid); > - fanotify_encode_fh(&ffe->object_fh, id, fanotify_encode_fh_len(id), > - hash, gfp); > + fanotify_encode_fh(&ffe->object_fh, id, fh_len, hash, gfp); > + /* Record fid event with fsid, mntid and empty fh */ > + if (mnt && !WARN_ON_ONCE(fh_len)) { > + ffe->mnt_id = mnt->mnt_id; > + ffe->object_fh.flags = FANOTIFY_FH_FLAG_MNT_ID; > + if (hash) > + *hash ^= fanotify_hash_mntid(mnt->mnt_id); > + } > > return &ffe->fae; > } > @@ -737,6 +754,7 @@ static struct fanotify_event *fanotify_alloc_event( > fid_mode); > struct inode *dirid = fanotify_dfid_inode(mask, data, data_type, dir); > const struct path *path = fsnotify_data_path(data, data_type); > + struct mount *mnt = NULL; > struct mem_cgroup *old_memcg; > struct dentry *moved = NULL; > struct inode *child = NULL; > @@ -746,7 +764,8 @@ static struct fanotify_event *fanotify_alloc_event( > struct pid *pid; > > if (mask & FAN_UNMOUNT && !WARN_ON_ONCE(!path || !fid_mode)) { > - /* Record fid event with fsid and empty fh */ > + /* Record fid event with fsid, mntid and empty fh */ > + mnt = real_mount(path->mnt); > id = NULL; > } else if ((fid_mode & FAN_REPORT_DIR_FID) && dirid) { > /* > @@ -834,7 +853,7 @@ static struct fanotify_event *fanotify_alloc_event( > event = fanotify_alloc_name_event(dirid, fsid, file_name, child, > moved, &hash, gfp); > } else if (fid_mode) { > - event = fanotify_alloc_fid_event(id, fsid, &hash, gfp); > + event = fanotify_alloc_fid_event(id, fsid, mnt, &hash, gfp); > } else { > event = fanotify_alloc_path_event(path, &hash, gfp); > } > diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h > index f98dcf5b7a19..3d8391a77031 100644 > --- a/fs/notify/fanotify/fanotify.h > +++ b/fs/notify/fanotify/fanotify.h > @@ -33,6 +33,7 @@ struct fanotify_fh { > u8 type; > u8 len; > #define FANOTIFY_FH_FLAG_EXT_BUF 1 > +#define FANOTIFY_FH_FLAG_MNT_ID 2 > u8 flags; > u8 pad; > unsigned char buf[]; > @@ -279,7 +280,10 @@ static inline void fanotify_init_event(struct fanotify_event *event, > struct { \ > struct fanotify_fh (name); \ > /* Space for object_fh.buf[] - access with fanotify_fh_buf() */ \ > - unsigned char _inline_fh_buf[(size)]; \ > + union { \ > + unsigned char _inline_fh_buf[(size)]; \ > + int mnt_id; /* For FAN_UNMOUNT */ \ > + }; \ > } > > struct fanotify_fid_event { > @@ -335,6 +339,20 @@ static inline __kernel_fsid_t *fanotify_event_fsid(struct fanotify_event *event) > return NULL; > } > > +static inline int fanotify_event_mntid(struct fanotify_event *event) > +{ > + struct fanotify_fh *fh = NULL; > + > + if (event->mask & FAN_UNMOUNT && > + event->type == FANOTIFY_EVENT_TYPE_FID) > + fh = &FANOTIFY_FE(event)->object_fh; > + > + if (fh && !fh->len && fh->flags == FANOTIFY_FH_FLAG_MNT_ID) > + return FANOTIFY_FE(event)->mnt_id; > + > + return 0; > +} > + > static inline struct fanotify_fh *fanotify_event_object_fh( > struct fanotify_event *event) > { > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c > index 0b3de6218c56..db3b79b8e901 100644 > --- a/fs/notify/fanotify/fanotify_user.c > +++ b/fs/notify/fanotify/fanotify_user.c > @@ -120,7 +120,9 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly; > #define FANOTIFY_EVENT_ALIGN 4 > #define FANOTIFY_FID_INFO_HDR_LEN \ > (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) > -#define FANOTIFY_PIDFD_INFO_HDR_LEN \ > +#define FANOTIFY_MNTID_INFO_LEN \ > + sizeof(struct fanotify_event_info_mntid) > +#define FANOTIFY_PIDFD_INFO_LEN \ > sizeof(struct fanotify_event_info_pidfd) > #define FANOTIFY_ERROR_INFO_LEN \ > (sizeof(struct fanotify_event_info_error)) > @@ -178,8 +180,11 @@ static size_t fanotify_event_len(unsigned int info_mode, > dot_len = 1; > } > > + if (fanotify_event_mntid(event)) > + event_len += FANOTIFY_MNTID_INFO_LEN; > + > if (info_mode & FAN_REPORT_PIDFD) > - event_len += FANOTIFY_PIDFD_INFO_HDR_LEN; > + event_len += FANOTIFY_PIDFD_INFO_LEN; > > if (fanotify_event_has_object_fh(event)) { > fh_len = fanotify_event_object_fh_len(event); > @@ -515,7 +520,7 @@ static int copy_pidfd_info_to_user(int pidfd, > size_t count) > { > struct fanotify_event_info_pidfd info = { }; > - size_t info_len = FANOTIFY_PIDFD_INFO_HDR_LEN; > + size_t info_len = FANOTIFY_PIDFD_INFO_LEN; > > if (WARN_ON_ONCE(info_len > count)) > return -EFAULT; > @@ -530,6 +535,26 @@ static int copy_pidfd_info_to_user(int pidfd, > return info_len; > } > > +static int copy_mntid_info_to_user(int mntid, > + char __user *buf, > + size_t count) > +{ > + struct fanotify_event_info_mntid info = { }; > + size_t info_len = FANOTIFY_MNTID_INFO_LEN; > + > + if (WARN_ON_ONCE(info_len > count)) > + return -EFAULT; > + > + info.hdr.info_type = FAN_EVENT_INFO_TYPE_MNTID; > + info.hdr.len = info_len; > + info.mnt_id = mntid; > + > + if (copy_to_user(buf, &info, info_len)) > + return -EFAULT; > + > + return info_len; > +} > + > static int copy_info_records_to_user(struct fanotify_event *event, > struct fanotify_info *info, > unsigned int info_mode, int pidfd, > @@ -538,6 +563,7 @@ static int copy_info_records_to_user(struct fanotify_event *event, > int ret, total_bytes = 0, info_type = 0; > unsigned int fid_mode = info_mode & FANOTIFY_FID_BITS; > unsigned int pidfd_mode = info_mode & FAN_REPORT_PIDFD; > + int mntid = fanotify_event_mntid(event); > > /* > * Event info records order is as follows: > @@ -632,6 +658,16 @@ static int copy_info_records_to_user(struct fanotify_event *event, > total_bytes += ret; > } > > + if (mntid) { > + ret = copy_mntid_info_to_user(mntid, buf, count); > + if (ret < 0) > + return ret; > + > + buf += ret; > + count -= ret; > + total_bytes += ret; > + } > + > if (pidfd_mode) { > ret = copy_pidfd_info_to_user(pidfd, buf, count); > if (ret < 0) > @@ -1770,7 +1806,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask, > * inotify sends unsoliciled IN_UNMOUNT per marked inode on sb shutdown. > * FAN_UNMOUNT event is about unmount of a mount, not about sb shutdown, > * so allow setting it only in mount mark mask. > - * FAN_UNMOUNT requires FAN_REPORT_FID to report fsid with empty fh. > + * FAN_UNMOUNT requires FAN_REPORT_FID to report fsid and mntid. > */ > if (mask & FAN_UNMOUNT && > (!(fid_mode & FAN_REPORT_FID) || mark_type != FAN_MARK_MOUNT)) > diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h > index 70f2d43e8ba4..886efbd877ba 100644 > --- a/include/uapi/linux/fanotify.h > +++ b/include/uapi/linux/fanotify.h > @@ -144,6 +144,7 @@ struct fanotify_event_metadata { > #define FAN_EVENT_INFO_TYPE_DFID 3 > #define FAN_EVENT_INFO_TYPE_PIDFD 4 > #define FAN_EVENT_INFO_TYPE_ERROR 5 > +#define FAN_EVENT_INFO_TYPE_MNTID 6 > > /* Special info types for FAN_RENAME */ > #define FAN_EVENT_INFO_TYPE_OLD_DFID_NAME 10 > @@ -184,6 +185,15 @@ struct fanotify_fhandle { > __u64 fh_ino; > }; > > +/* > + * This structure is used for info records of type FAN_EVENT_INFO_TYPE_MNTID. > + */ > +struct fanotify_event_info_mntid { > + struct fanotify_event_info_header hdr; > + /* matches mount_id from name_to_handle_at(2) */ > + __s32 mnt_id; > +}; > + > /* > * This structure is used for info records of type FAN_EVENT_INFO_TYPE_PIDFD. > * It holds a pidfd for the pid that was responsible for generating an event. > -- > 2.34.1 > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 2/2] fanotify: report mntid info record with FAN_UNMOUNT events 2023-04-14 18:29 ` [RFC][PATCH 2/2] fanotify: report mntid info record with FAN_UNMOUNT events Amir Goldstein 2023-04-17 8:38 ` Amir Goldstein @ 2023-04-19 13:30 ` Jan Kara 1 sibling, 0 replies; 16+ messages in thread From: Jan Kara @ 2023-04-19 13:30 UTC (permalink / raw) To: Amir Goldstein Cc: Jan Kara, Matthew Bobrowski, Christian Brauner, linux-fsdevel, linux-api On Fri 14-04-23 21:29:03, Amir Goldstein wrote: > Report mntid in an info record of type FAN_EVENT_INFO_TYPE_MNTID > with FAN_UNMOUNT event in addition to the fid info record. > > Signed-off-by: Amir Goldstein <amir73il@gmail.com> This patch looks good to me (besides the things you've already noticed). Just one nit below. > diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c > index 0b3de6218c56..db3b79b8e901 100644 > --- a/fs/notify/fanotify/fanotify_user.c > +++ b/fs/notify/fanotify/fanotify_user.c > @@ -120,7 +120,9 @@ struct kmem_cache *fanotify_perm_event_cachep __read_mostly; > #define FANOTIFY_EVENT_ALIGN 4 > #define FANOTIFY_FID_INFO_HDR_LEN \ > (sizeof(struct fanotify_event_info_fid) + sizeof(struct file_handle)) > -#define FANOTIFY_PIDFD_INFO_HDR_LEN \ > +#define FANOTIFY_MNTID_INFO_LEN \ > + sizeof(struct fanotify_event_info_mntid) > +#define FANOTIFY_PIDFD_INFO_LEN \ I agree with changing FANOTIFY_PIDFD_INFO_HDR_LEN to FANOTIFY_PIDFD_INFO_LEN but as a separate patch please. Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify 2023-04-14 18:29 [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify Amir Goldstein 2023-04-14 18:29 ` [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event Amir Goldstein 2023-04-14 18:29 ` [RFC][PATCH 2/2] fanotify: report mntid info record with FAN_UNMOUNT events Amir Goldstein @ 2023-04-18 13:33 ` Christian Brauner 2023-04-18 13:56 ` Amir Goldstein 2 siblings, 1 reply; 16+ messages in thread From: Christian Brauner @ 2023-04-18 13:33 UTC (permalink / raw) To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api On Fri, Apr 14, 2023 at 09:29:01PM +0300, Amir Goldstein wrote: > Jan, > > Followup on my quest to close the gap with inotify functionality, > here is a proposal for FAN_UNMOUNT event. > > I have had many design questions about this: I'm going to humbly express what I feel makes sense to me when looking at this from a user perspective: > 1) Should we also report FAN_UNMOUNT for marked inodes and sb > on sb shutdown (same as IN_UNMOUNT)? My preference would be if this would be a separate event type. FAN_SB_SHUTDOWN or something. > 2) Should we also report FAN_UNMOUNT on sb mark for any unmounts > of that sb? I don't think so. It feels to me that if you watch an sb you don't necessarily want to watch bind mounts of that sb. > 3) Should we report also the fid of the mount root? and if we do... > 4) Should we report/consider FAN_ONDIR filter? > > All of the questions above I answered "not unless somebody requests" > in this first RFC. Fwiw, I agree. > > Specifically, I did get a request for an unmount event for containers > use case. > > I have also had doubts regarding the info records. > I decided that reporting fsid and mntid is minimum, but couldn't > decide if they were better of in a single MNTID record or seprate > records. > > I went with separate records, because: > a) FAN_FS_ERROR has set a precendent of separate fid record with > fsid and empty fid, so I followed this precendent > b) MNTID record we may want to add later with FAN_REPORT_MNTID > to all the path events, so better that it is independent I agree. > > There is test for the proposed API extention [1]. > > Thoughts? I think this is a rather useful extension! ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify 2023-04-18 13:33 ` [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify Christian Brauner @ 2023-04-18 13:56 ` Amir Goldstein 2023-04-18 14:12 ` Christian Brauner 0 siblings, 1 reply; 16+ messages in thread From: Amir Goldstein @ 2023-04-18 13:56 UTC (permalink / raw) To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@kernel.org> wrote: > > On Fri, Apr 14, 2023 at 09:29:01PM +0300, Amir Goldstein wrote: > > Jan, > > > > Followup on my quest to close the gap with inotify functionality, > > here is a proposal for FAN_UNMOUNT event. > > > > I have had many design questions about this: > > I'm going to humbly express what I feel makes sense to me when looking > at this from a user perspective: > > > 1) Should we also report FAN_UNMOUNT for marked inodes and sb > > on sb shutdown (same as IN_UNMOUNT)? > > My preference would be if this would be a separate event type. > FAN_SB_SHUTDOWN or something. If we implement an event for this at all, I would suggest FAN_IGNORED or FAN_EVICTED, which has the same meaning as IN_IGNORED. When you get an event that the watch went away, it could be because of: 1. watch removed by user 2. watch removed because inode was evicted (with FAN_MARK_EVICTABLE) 3. inode deleted 4. sb shutdown IN_IGNORED is generated in all of the above except for inode evict that is not possible with inotify. User can figure out on his own if the inode was deleted or if fs was unmounted, so there is not really a need for FAN_SB_SHUTDOWN IMO. Actually, I think that FAN_IGNORED would be quite useful for the FAN_MARK_EVICTABLE case, but it is a bit less trivial to implement than FAN_UNMOUNT was. > > > 2) Should we also report FAN_UNMOUNT on sb mark for any unmounts > > of that sb? > > I don't think so. It feels to me that if you watch an sb you don't > necessarily want to watch bind mounts of that sb. > > > 3) Should we report also the fid of the mount root? and if we do... > > 4) Should we report/consider FAN_ONDIR filter? > > > > All of the questions above I answered "not unless somebody requests" > > in this first RFC. > > Fwiw, I agree. > > > > > Specifically, I did get a request for an unmount event for containers > > use case. > > > > I have also had doubts regarding the info records. > > I decided that reporting fsid and mntid is minimum, but couldn't > > decide if they were better of in a single MNTID record or seprate > > records. > > > > I went with separate records, because: > > a) FAN_FS_ERROR has set a precendent of separate fid record with > > fsid and empty fid, so I followed this precendent > > b) MNTID record we may want to add later with FAN_REPORT_MNTID > > to all the path events, so better that it is independent > Just thought of another reason: c) FAN_UNMOUNT does not need to require FAN_REPORT_FID so it does not depend on filesystem having a valid f_fsid nor exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report only MNTID record (I will amend the patch with this minor change). > I agree. > > > > > There is test for the proposed API extention [1]. > > > > Thoughts? > > I think this is a rather useful extension! Thanks for the API review! Amir. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify 2023-04-18 13:56 ` Amir Goldstein @ 2023-04-18 14:12 ` Christian Brauner 2023-04-18 15:20 ` Amir Goldstein 0 siblings, 1 reply; 16+ messages in thread From: Christian Brauner @ 2023-04-18 14:12 UTC (permalink / raw) To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api On Tue, Apr 18, 2023 at 04:56:40PM +0300, Amir Goldstein wrote: > On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@kernel.org> wrote: > > > > On Fri, Apr 14, 2023 at 09:29:01PM +0300, Amir Goldstein wrote: > > > Jan, > > > > > > Followup on my quest to close the gap with inotify functionality, > > > here is a proposal for FAN_UNMOUNT event. > > > > > > I have had many design questions about this: > > > > I'm going to humbly express what I feel makes sense to me when looking > > at this from a user perspective: > > > > > 1) Should we also report FAN_UNMOUNT for marked inodes and sb > > > on sb shutdown (same as IN_UNMOUNT)? > > > > My preference would be if this would be a separate event type. > > FAN_SB_SHUTDOWN or something. > > If we implement an event for this at all, I would suggest FAN_IGNORED > or FAN_EVICTED, which has the same meaning as IN_IGNORED. > When you get an event that the watch went away, it could be because of: > 1. watch removed by user > 2. watch removed because inode was evicted (with FAN_MARK_EVICTABLE) > 3. inode deleted > 4. sb shutdown > > IN_IGNORED is generated in all of the above except for inode evict > that is not possible with inotify. > > User can figure out on his own if the inode was deleted or if fs was unmounted, > so there is not really a need for FAN_SB_SHUTDOWN IMO. Ok, sounds good. > > Actually, I think that FAN_IGNORED would be quite useful for the > FAN_MARK_EVICTABLE case, but it is a bit less trivial to implement > than FAN_UNMOUNT was. > > > > > > 2) Should we also report FAN_UNMOUNT on sb mark for any unmounts > > > of that sb? > > > > I don't think so. It feels to me that if you watch an sb you don't > > necessarily want to watch bind mounts of that sb. > > > > > 3) Should we report also the fid of the mount root? and if we do... > > > 4) Should we report/consider FAN_ONDIR filter? > > > > > > All of the questions above I answered "not unless somebody requests" > > > in this first RFC. > > > > Fwiw, I agree. > > > > > > > > Specifically, I did get a request for an unmount event for containers > > > use case. > > > > > > I have also had doubts regarding the info records. > > > I decided that reporting fsid and mntid is minimum, but couldn't > > > decide if they were better of in a single MNTID record or seprate > > > records. > > > > > > I went with separate records, because: > > > a) FAN_FS_ERROR has set a precendent of separate fid record with > > > fsid and empty fid, so I followed this precendent > > > b) MNTID record we may want to add later with FAN_REPORT_MNTID > > > to all the path events, so better that it is independent > > > > Just thought of another reason: > c) FAN_UNMOUNT does not need to require FAN_REPORT_FID > so it does not depend on filesystem having a valid f_fsid nor > exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report > only MNTID record (I will amend the patch with this minor change). I see some pseudo fses generate f_fsid, e.g., tmpfs in mm/shmem.c At the risk of putting my foot in my mouth, what's stopping us from making them all support f_fsid? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify 2023-04-18 14:12 ` Christian Brauner @ 2023-04-18 15:20 ` Amir Goldstein 2023-04-19 17:19 ` Christian Brauner 0 siblings, 1 reply; 16+ messages in thread From: Amir Goldstein @ 2023-04-18 15:20 UTC (permalink / raw) To: Christian Brauner; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api On Tue, Apr 18, 2023 at 5:12 PM Christian Brauner <brauner@kernel.org> wrote: > > On Tue, Apr 18, 2023 at 04:56:40PM +0300, Amir Goldstein wrote: > > On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@kernel.org> wrote: > > > > > > On Fri, Apr 14, 2023 at 09:29:01PM +0300, Amir Goldstein wrote: > > > > Jan, > > > > > > > > Followup on my quest to close the gap with inotify functionality, > > > > here is a proposal for FAN_UNMOUNT event. > > > > > > > > I have had many design questions about this: > > > > > > I'm going to humbly express what I feel makes sense to me when looking > > > at this from a user perspective: > > > > > > > 1) Should we also report FAN_UNMOUNT for marked inodes and sb > > > > on sb shutdown (same as IN_UNMOUNT)? > > > > > > My preference would be if this would be a separate event type. > > > FAN_SB_SHUTDOWN or something. > > > > If we implement an event for this at all, I would suggest FAN_IGNORED > > or FAN_EVICTED, which has the same meaning as IN_IGNORED. > > When you get an event that the watch went away, it could be because of: > > 1. watch removed by user > > 2. watch removed because inode was evicted (with FAN_MARK_EVICTABLE) > > 3. inode deleted > > 4. sb shutdown > > > > IN_IGNORED is generated in all of the above except for inode evict > > that is not possible with inotify. > > > > User can figure out on his own if the inode was deleted or if fs was unmounted, > > so there is not really a need for FAN_SB_SHUTDOWN IMO. > > Ok, sounds good. > > > > > Actually, I think that FAN_IGNORED would be quite useful for the > > FAN_MARK_EVICTABLE case, but it is a bit less trivial to implement > > than FAN_UNMOUNT was. > > > > > > > > > 2) Should we also report FAN_UNMOUNT on sb mark for any unmounts > > > > of that sb? > > > > > > I don't think so. It feels to me that if you watch an sb you don't > > > necessarily want to watch bind mounts of that sb. > > > > > > > 3) Should we report also the fid of the mount root? and if we do... > > > > 4) Should we report/consider FAN_ONDIR filter? > > > > > > > > All of the questions above I answered "not unless somebody requests" > > > > in this first RFC. > > > > > > Fwiw, I agree. > > > > > > > > > > > Specifically, I did get a request for an unmount event for containers > > > > use case. > > > > > > > > I have also had doubts regarding the info records. > > > > I decided that reporting fsid and mntid is minimum, but couldn't > > > > decide if they were better of in a single MNTID record or seprate > > > > records. > > > > > > > > I went with separate records, because: > > > > a) FAN_FS_ERROR has set a precendent of separate fid record with > > > > fsid and empty fid, so I followed this precendent > > > > b) MNTID record we may want to add later with FAN_REPORT_MNTID > > > > to all the path events, so better that it is independent > > > > > > > Just thought of another reason: > > c) FAN_UNMOUNT does not need to require FAN_REPORT_FID > > so it does not depend on filesystem having a valid f_fsid nor > > exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report > > only MNTID record (I will amend the patch with this minor change). > > I see some pseudo fses generate f_fsid, e.g., tmpfs in mm/shmem.c tmpfs is not "pseudo" in my eyes, because it implements a great deal of the vfs interfaces, including export_ops. and also I fixed its f_fsid recently: 59cda49ecf6c shmem: allow reporting fanotify events with file handles on tmpfs > At the risk of putting my foot in my mouth, what's stopping us from > making them all support f_fsid? Nothing much. Jan had the same opinion [1]. We could do either: 1. use uuid_to_fsid() in vfs_statfs() if fs has set s_uuid and not set f_fsid 2. use s_dev as f_fsid in vfs_statfs() if fs did not set f_fsid nor s_uuid 3. randomize s_uuid for simple fs (like tmpfs) 4. any combination of the above and more Note that we will also need to decide what to do with name_to_handle_at() for those pseudo fs. Quoting Jan from [1]: "But otherwise the proposal to make name_to_handle_at() work even for filesystems not exportable through NFS makes sense to me. But I guess we need some buy-in from VFS maintainers for this." (hint hint). Thanks, Amir. [1] https://lore.kernel.org/linux-fsdevel/20230417162721.ouzs33oh6mb7vtft@quack3/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify 2023-04-18 15:20 ` Amir Goldstein @ 2023-04-19 17:19 ` Christian Brauner 2023-04-20 6:12 ` Amir Goldstein 0 siblings, 1 reply; 16+ messages in thread From: Christian Brauner @ 2023-04-19 17:19 UTC (permalink / raw) To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api On Tue, Apr 18, 2023 at 06:20:22PM +0300, Amir Goldstein wrote: > On Tue, Apr 18, 2023 at 5:12 PM Christian Brauner <brauner@kernel.org> wrote: > > > > On Tue, Apr 18, 2023 at 04:56:40PM +0300, Amir Goldstein wrote: > > > On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@kernel.org> wrote: > > > > > > > > On Fri, Apr 14, 2023 at 09:29:01PM +0300, Amir Goldstein wrote: > > > > > Jan, > > > > > > > > > > Followup on my quest to close the gap with inotify functionality, > > > > > here is a proposal for FAN_UNMOUNT event. > > > > > > > > > > I have had many design questions about this: > > > > > > > > I'm going to humbly express what I feel makes sense to me when looking > > > > at this from a user perspective: > > > > > > > > > 1) Should we also report FAN_UNMOUNT for marked inodes and sb > > > > > on sb shutdown (same as IN_UNMOUNT)? > > > > > > > > My preference would be if this would be a separate event type. > > > > FAN_SB_SHUTDOWN or something. > > > > > > If we implement an event for this at all, I would suggest FAN_IGNORED > > > or FAN_EVICTED, which has the same meaning as IN_IGNORED. > > > When you get an event that the watch went away, it could be because of: > > > 1. watch removed by user > > > 2. watch removed because inode was evicted (with FAN_MARK_EVICTABLE) > > > 3. inode deleted > > > 4. sb shutdown > > > > > > IN_IGNORED is generated in all of the above except for inode evict > > > that is not possible with inotify. > > > > > > User can figure out on his own if the inode was deleted or if fs was unmounted, > > > so there is not really a need for FAN_SB_SHUTDOWN IMO. > > > > Ok, sounds good. > > > > > > > > Actually, I think that FAN_IGNORED would be quite useful for the > > > FAN_MARK_EVICTABLE case, but it is a bit less trivial to implement > > > than FAN_UNMOUNT was. > > > > > > > > > > > > 2) Should we also report FAN_UNMOUNT on sb mark for any unmounts > > > > > of that sb? > > > > > > > > I don't think so. It feels to me that if you watch an sb you don't > > > > necessarily want to watch bind mounts of that sb. > > > > > > > > > 3) Should we report also the fid of the mount root? and if we do... > > > > > 4) Should we report/consider FAN_ONDIR filter? > > > > > > > > > > All of the questions above I answered "not unless somebody requests" > > > > > in this first RFC. > > > > > > > > Fwiw, I agree. > > > > > > > > > > > > > > Specifically, I did get a request for an unmount event for containers > > > > > use case. > > > > > > > > > > I have also had doubts regarding the info records. > > > > > I decided that reporting fsid and mntid is minimum, but couldn't > > > > > decide if they were better of in a single MNTID record or seprate > > > > > records. > > > > > > > > > > I went with separate records, because: > > > > > a) FAN_FS_ERROR has set a precendent of separate fid record with > > > > > fsid and empty fid, so I followed this precendent > > > > > b) MNTID record we may want to add later with FAN_REPORT_MNTID > > > > > to all the path events, so better that it is independent > > > > > > > > > > Just thought of another reason: > > > c) FAN_UNMOUNT does not need to require FAN_REPORT_FID > > > so it does not depend on filesystem having a valid f_fsid nor > > > exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report > > > only MNTID record (I will amend the patch with this minor change). > > > > I see some pseudo fses generate f_fsid, e.g., tmpfs in mm/shmem.c > > tmpfs is not "pseudo" in my eyes, because it implements a great deal of the > vfs interfaces, including export_ops. The term "pseudo" is somewhat well-defined though, no? It really just means that there's no backing device associated with it. So for example, anything that uses get_tree_nodev() including tmpfs. If erofs is compiled with fscache support it's even a pseudo fs (TIL). > > and also I fixed its f_fsid recently: > 59cda49ecf6c shmem: allow reporting fanotify events with file handles on tmpfs Well thank you for that this has been very useful in userspace already I've been told. > > > At the risk of putting my foot in my mouth, what's stopping us from > > making them all support f_fsid? > > Nothing much. Jan had the same opinion [1]. I think that's what we should try to do without having thought too much about potential edge-cases. > > We could do either: > 1. use uuid_to_fsid() in vfs_statfs() if fs has set s_uuid and not set f_fsid > 2. use s_dev as f_fsid in vfs_statfs() if fs did not set f_fsid nor s_uuid > 3. randomize s_uuid for simple fs (like tmpfs) > 4. any combination of the above and more > > Note that we will also need to decide what to do with > name_to_handle_at() for those pseudo fs. Doing it on the fly during vfs_statfs() feels a bit messy and could cause bugs. One should never underestimate the possibility that there's some fs that somehow would get into trouble because of odd behavior. So switching each fs over to generate a s_uuid seems the prudent thing to do. Doing it the hard way also forces us to make sure that each filesystem can deal with this. It seems that for pseudo fses we can just allocate a new s_uuid for each instance. So each tmpfs instance - like your patch did - would just get a new s_uuid. For kernel internal filesystems - mostly those that use init_pseudo - the s_uuid would remain stable until the next reboot when it is regenerated. Looking around just a little there's some block-backed fses like fat that have an f_fsid but no s_uuid. So if we give those s_uuid then it'll mean that the f_fsid isn't generated based on the s_uuid. That should be ok though and shouldn't matter to userspace. Afterwards we could probably lift the ext4 and xfs specific ioctls to retrieve the s_uuid into a generic ioctl to allow userspace to get the s_uuid. That's my thinking without having crawled to all possible corner cases... Also needs documenting that s_uuid is not optional anymore and explain the difference between pseudo and device-backed fses. I hope that's not completely naive... > > Quoting Jan from [1]: > "But otherwise the proposal to make name_to_handle_at() work even for > filesystems not exportable through NFS makes sense to me. But I guess we > need some buy-in from VFS maintainers for this." (hint hint). > > Thanks, > Amir. > > [1] https://lore.kernel.org/linux-fsdevel/20230417162721.ouzs33oh6mb7vtft@quack3/ ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify 2023-04-19 17:19 ` Christian Brauner @ 2023-04-20 6:12 ` Amir Goldstein 2023-04-20 7:46 ` Christian Brauner 0 siblings, 1 reply; 16+ messages in thread From: Amir Goldstein @ 2023-04-20 6:12 UTC (permalink / raw) To: Christian Brauner Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api, Miklos Szeredi On Wed, Apr 19, 2023 at 8:19 PM Christian Brauner <brauner@kernel.org> wrote: > > On Tue, Apr 18, 2023 at 06:20:22PM +0300, Amir Goldstein wrote: > > On Tue, Apr 18, 2023 at 5:12 PM Christian Brauner <brauner@kernel.org> wrote: > > > > > > On Tue, Apr 18, 2023 at 04:56:40PM +0300, Amir Goldstein wrote: > > > > On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@kernel.org> wrote: [...] > > > > Just thought of another reason: > > > > c) FAN_UNMOUNT does not need to require FAN_REPORT_FID > > > > so it does not depend on filesystem having a valid f_fsid nor > > > > exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report > > > > only MNTID record (I will amend the patch with this minor change). > > > > > > I see some pseudo fses generate f_fsid, e.g., tmpfs in mm/shmem.c > > > > tmpfs is not "pseudo" in my eyes, because it implements a great deal of the > > vfs interfaces, including export_ops. > > The term "pseudo" is somewhat well-defined though, no? It really just > means that there's no backing device associated with it. So for example, > anything that uses get_tree_nodev() including tmpfs. If erofs is > compiled with fscache support it's even a pseudo fs (TIL). > Ok, "pseudo fs" is an ambiguous term. For the sake of this discussion, let's refer to fs that use get_tree_nodev() "non-disk fs". But as far as fsnotify is concerned, tmpfs is equivalent to xfs, because all of the changes are made by users via vfs. Let's call fs where changes can occur not via vfs "remote fs", those include the network fs and some "internal fs" like the kernfs class of fs and the "simple fs" class of fs (i.e. simple_fill_super). With all the remote fs, the behavior of fsnotify is (and has always been) undefined, that is, you can use inotify to subscribe for events and you never know what you will get when changes are not made via vfs. Some people (hypothetical) may expect to watch nsfs for dying ns and may be disappointed to find out that they do not get the desired IN_DELETE event. We have had lengthy discussions about remote fs change notifications with no clear decisions of the best API for them: https://lore.kernel.org/linux-fsdevel/20211025204634.2517-1-iangelak@redhat.com/ > > > > and also I fixed its f_fsid recently: > > 59cda49ecf6c shmem: allow reporting fanotify events with file handles on tmpfs > > Well thank you for that this has been very useful in userspace already > I've been told. > > > > > > At the risk of putting my foot in my mouth, what's stopping us from > > > making them all support f_fsid? > > > > Nothing much. Jan had the same opinion [1]. > > I think that's what we should try to do without having thought too much > about potential edge-cases. > > > > > We could do either: > > 1. use uuid_to_fsid() in vfs_statfs() if fs has set s_uuid and not set f_fsid > > 2. use s_dev as f_fsid in vfs_statfs() if fs did not set f_fsid nor s_uuid > > 3. randomize s_uuid for simple fs (like tmpfs) > > 4. any combination of the above and more > > > > Note that we will also need to decide what to do with > > name_to_handle_at() for those pseudo fs. > > Doing it on the fly during vfs_statfs() feels a bit messy and could > cause bugs. One should never underestimate the possibility that there's > some fs that somehow would get into trouble because of odd behavior. > > So switching each fs over to generate a s_uuid seems the prudent thing > to do. Doing it the hard way also forces us to make sure that each > filesystem can deal with this. > > It seems that for pseudo fses we can just allocate a new s_uuid for each > instance. So each tmpfs instance - like your patch did - would just get > a new s_uuid. > > For kernel internal filesystems - mostly those that use init_pseudo - > the s_uuid would remain stable until the next reboot when it is > regenerated. > I am fine with opt-in for every fs as long as we do not duplicate boilerplate code. An FS_ flag could be a simple way to opt-in for this generic behavior. > Looking around just a little there's some block-backed fses like fat > that have an f_fsid but no s_uuid. So if we give those s_uuid then it'll > mean that the f_fsid isn't generated based on the s_uuid. That should be > ok though and shouldn't matter to userspace. > > Afterwards we could probably lift the ext4 and xfs specific ioctls to > retrieve the s_uuid into a generic ioctl to allow userspace to get the > s_uuid. > > That's my thinking without having crawled to all possible corner > cases... Also needs documenting that s_uuid is not optional anymore and > explain the difference between pseudo and device-backed fses. I hope > that's not completely naive... > I don't think that the dichotomy of device-backed vs. pseudo is enough to describe the situation. I think what needs to be better documented and annotated is what type of fsnotify services can be expected to work on a given fs. Jan has already introduced FS_DISALLOW_NOTIFY_PERM to opt-out of permission events (for procfs). Perhaps this could be generalized to s_type->fs_notify_supported_events or s_type->fs_notify_supported_features. For example, if an fs opts-in to FAN_REPORT_FID, then it gets an auto allocated s_uuid and f_fsid if it did not fill them in fill_super and in statfs and it gets a default implementation for encoding file handles from ino/gen. Thanks, Amir. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify 2023-04-20 6:12 ` Amir Goldstein @ 2023-04-20 7:46 ` Christian Brauner 2023-04-20 8:15 ` Amir Goldstein 0 siblings, 1 reply; 16+ messages in thread From: Christian Brauner @ 2023-04-20 7:46 UTC (permalink / raw) To: Amir Goldstein Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api, Miklos Szeredi On Thu, Apr 20, 2023 at 09:12:52AM +0300, Amir Goldstein wrote: > On Wed, Apr 19, 2023 at 8:19 PM Christian Brauner <brauner@kernel.org> wrote: > > > > On Tue, Apr 18, 2023 at 06:20:22PM +0300, Amir Goldstein wrote: > > > On Tue, Apr 18, 2023 at 5:12 PM Christian Brauner <brauner@kernel.org> wrote: > > > > > > > > On Tue, Apr 18, 2023 at 04:56:40PM +0300, Amir Goldstein wrote: > > > > > On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@kernel.org> wrote: > > [...] > > > > > > Just thought of another reason: > > > > > c) FAN_UNMOUNT does not need to require FAN_REPORT_FID > > > > > so it does not depend on filesystem having a valid f_fsid nor > > > > > exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report > > > > > only MNTID record (I will amend the patch with this minor change). > > > > > > > > I see some pseudo fses generate f_fsid, e.g., tmpfs in mm/shmem.c > > > > > > tmpfs is not "pseudo" in my eyes, because it implements a great deal of the > > > vfs interfaces, including export_ops. > > > > The term "pseudo" is somewhat well-defined though, no? It really just > > means that there's no backing device associated with it. So for example, > > anything that uses get_tree_nodev() including tmpfs. If erofs is > > compiled with fscache support it's even a pseudo fs (TIL). > > > > Ok, "pseudo fs" is an ambiguous term. > > For the sake of this discussion, let's refer to fs that use get_tree_nodev() > "non-disk fs". > > But as far as fsnotify is concerned, tmpfs is equivalent to xfs, because > all of the changes are made by users via vfs. > > Let's call fs where changes can occur not via vfs "remote fs", those > include the network fs and some "internal fs" like the kernfs class of fs > and the "simple fs" class of fs (i.e. simple_fill_super). > > With all the remote fs, the behavior of fsnotify is (and has always been) > undefined, that is, you can use inotify to subscribe for events and you > never know what you will get when changes are not made via vfs. > > Some people (hypothetical) may expect to watch nsfs for dying ns > and may be disappointed to find out that they do not get the desired > IN_DELETE event. > > We have had lengthy discussions about remote fs change notifications > with no clear decisions of the best API for them: > https://lore.kernel.org/linux-fsdevel/20211025204634.2517-1-iangelak@redhat.com/ > > > > > > > and also I fixed its f_fsid recently: > > > 59cda49ecf6c shmem: allow reporting fanotify events with file handles on tmpfs > > > > Well thank you for that this has been very useful in userspace already > > I've been told. > > > > > > > > > At the risk of putting my foot in my mouth, what's stopping us from > > > > making them all support f_fsid? > > > > > > Nothing much. Jan had the same opinion [1]. > > > > I think that's what we should try to do without having thought too much > > about potential edge-cases. > > > > > > > > We could do either: > > > 1. use uuid_to_fsid() in vfs_statfs() if fs has set s_uuid and not set f_fsid > > > 2. use s_dev as f_fsid in vfs_statfs() if fs did not set f_fsid nor s_uuid > > > 3. randomize s_uuid for simple fs (like tmpfs) > > > 4. any combination of the above and more > > > > > > Note that we will also need to decide what to do with > > > name_to_handle_at() for those pseudo fs. > > > > Doing it on the fly during vfs_statfs() feels a bit messy and could > > cause bugs. One should never underestimate the possibility that there's > > some fs that somehow would get into trouble because of odd behavior. > > > > So switching each fs over to generate a s_uuid seems the prudent thing > > to do. Doing it the hard way also forces us to make sure that each > > filesystem can deal with this. > > > > It seems that for pseudo fses we can just allocate a new s_uuid for each > > instance. So each tmpfs instance - like your patch did - would just get > > a new s_uuid. > > > > For kernel internal filesystems - mostly those that use init_pseudo - > > the s_uuid would remain stable until the next reboot when it is > > regenerated. > > > > I am fine with opt-in for every fs as long as we do not duplicate > boilerplate code. > An FS_ flag could be a simple way to opt-in for this generic behavior. > > > Looking around just a little there's some block-backed fses like fat > > that have an f_fsid but no s_uuid. So if we give those s_uuid then it'll > > mean that the f_fsid isn't generated based on the s_uuid. That should be > > ok though and shouldn't matter to userspace. > > > > Afterwards we could probably lift the ext4 and xfs specific ioctls to > > retrieve the s_uuid into a generic ioctl to allow userspace to get the > > s_uuid. > > > > That's my thinking without having crawled to all possible corner > > cases... Also needs documenting that s_uuid is not optional anymore and > > explain the difference between pseudo and device-backed fses. I hope > > that's not completely naive... > > > > I don't think that the dichotomy of device-backed vs. pseudo is enough > to describe the situation. > > I think what needs to be better documented and annotated is what type > of fsnotify services can be expected to work on a given fs. You're looking at this solely from the angle of fanotify. In my earier message I was looking at this as something that is generally useful. Fanotify uses the s_uuid and f_fsid but they have value independent of this. > > Jan has already introduced FS_DISALLOW_NOTIFY_PERM to opt-out > of permission events (for procfs). That sounds like a decent solution. > > Perhaps this could be generalized to s_type->fs_notify_supported_events > or s_type->fs_notify_supported_features. > > For example, if an fs opts-in to FAN_REPORT_FID, then it gets an auto > allocated s_uuid and f_fsid if it did not fill them in fill_super and in statfs This appears a layering violation to me. The s_uuid should be allocated when the superblock is created just like tmpfs does it and not retroactively/lazily when fanotify on the filesystem is reported. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify 2023-04-20 7:46 ` Christian Brauner @ 2023-04-20 8:15 ` Amir Goldstein 0 siblings, 0 replies; 16+ messages in thread From: Amir Goldstein @ 2023-04-20 8:15 UTC (permalink / raw) To: Christian Brauner Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel, linux-api, Miklos Szeredi On Thu, Apr 20, 2023 at 10:46 AM Christian Brauner <brauner@kernel.org> wrote: > > On Thu, Apr 20, 2023 at 09:12:52AM +0300, Amir Goldstein wrote: > > On Wed, Apr 19, 2023 at 8:19 PM Christian Brauner <brauner@kernel.org> wrote: > > > > > > On Tue, Apr 18, 2023 at 06:20:22PM +0300, Amir Goldstein wrote: > > > > On Tue, Apr 18, 2023 at 5:12 PM Christian Brauner <brauner@kernel.org> wrote: > > > > > > > > > > On Tue, Apr 18, 2023 at 04:56:40PM +0300, Amir Goldstein wrote: > > > > > > On Tue, Apr 18, 2023 at 4:33 PM Christian Brauner <brauner@kernel.org> wrote: > > > > [...] > > > > > > > > Just thought of another reason: > > > > > > c) FAN_UNMOUNT does not need to require FAN_REPORT_FID > > > > > > so it does not depend on filesystem having a valid f_fsid nor > > > > > > exports_ops. In case of "pseudo" fs, FAN_UNMOUNT can report > > > > > > only MNTID record (I will amend the patch with this minor change). > > > > > > > > > > I see some pseudo fses generate f_fsid, e.g., tmpfs in mm/shmem.c > > > > > > > > tmpfs is not "pseudo" in my eyes, because it implements a great deal of the > > > > vfs interfaces, including export_ops. > > > > > > The term "pseudo" is somewhat well-defined though, no? It really just > > > means that there's no backing device associated with it. So for example, > > > anything that uses get_tree_nodev() including tmpfs. If erofs is > > > compiled with fscache support it's even a pseudo fs (TIL). > > > > > > > Ok, "pseudo fs" is an ambiguous term. > > > > For the sake of this discussion, let's refer to fs that use get_tree_nodev() > > "non-disk fs". > > > > But as far as fsnotify is concerned, tmpfs is equivalent to xfs, because > > all of the changes are made by users via vfs. > > > > Let's call fs where changes can occur not via vfs "remote fs", those > > include the network fs and some "internal fs" like the kernfs class of fs > > and the "simple fs" class of fs (i.e. simple_fill_super). > > > > With all the remote fs, the behavior of fsnotify is (and has always been) > > undefined, that is, you can use inotify to subscribe for events and you > > never know what you will get when changes are not made via vfs. > > > > Some people (hypothetical) may expect to watch nsfs for dying ns > > and may be disappointed to find out that they do not get the desired > > IN_DELETE event. > > > > We have had lengthy discussions about remote fs change notifications > > with no clear decisions of the best API for them: > > https://lore.kernel.org/linux-fsdevel/20211025204634.2517-1-iangelak@redhat.com/ > > > > > > > > > > and also I fixed its f_fsid recently: > > > > 59cda49ecf6c shmem: allow reporting fanotify events with file handles on tmpfs > > > > > > Well thank you for that this has been very useful in userspace already > > > I've been told. > > > > > > > > > > > > At the risk of putting my foot in my mouth, what's stopping us from > > > > > making them all support f_fsid? > > > > > > > > Nothing much. Jan had the same opinion [1]. > > > > > > I think that's what we should try to do without having thought too much > > > about potential edge-cases. > > > > > > > > > > > We could do either: > > > > 1. use uuid_to_fsid() in vfs_statfs() if fs has set s_uuid and not set f_fsid > > > > 2. use s_dev as f_fsid in vfs_statfs() if fs did not set f_fsid nor s_uuid > > > > 3. randomize s_uuid for simple fs (like tmpfs) > > > > 4. any combination of the above and more > > > > > > > > Note that we will also need to decide what to do with > > > > name_to_handle_at() for those pseudo fs. > > > > > > Doing it on the fly during vfs_statfs() feels a bit messy and could > > > cause bugs. One should never underestimate the possibility that there's > > > some fs that somehow would get into trouble because of odd behavior. > > > > > > So switching each fs over to generate a s_uuid seems the prudent thing > > > to do. Doing it the hard way also forces us to make sure that each > > > filesystem can deal with this. > > > > > > It seems that for pseudo fses we can just allocate a new s_uuid for each > > > instance. So each tmpfs instance - like your patch did - would just get > > > a new s_uuid. > > > > > > For kernel internal filesystems - mostly those that use init_pseudo - > > > the s_uuid would remain stable until the next reboot when it is > > > regenerated. > > > > > > > I am fine with opt-in for every fs as long as we do not duplicate > > boilerplate code. > > An FS_ flag could be a simple way to opt-in for this generic behavior. > > > > > Looking around just a little there's some block-backed fses like fat > > > that have an f_fsid but no s_uuid. So if we give those s_uuid then it'll > > > mean that the f_fsid isn't generated based on the s_uuid. That should be > > > ok though and shouldn't matter to userspace. > > > > > > Afterwards we could probably lift the ext4 and xfs specific ioctls to > > > retrieve the s_uuid into a generic ioctl to allow userspace to get the > > > s_uuid. > > > > > > That's my thinking without having crawled to all possible corner > > > cases... Also needs documenting that s_uuid is not optional anymore and > > > explain the difference between pseudo and device-backed fses. I hope > > > that's not completely naive... > > > > > > > I don't think that the dichotomy of device-backed vs. pseudo is enough > > to describe the situation. > > > > I think what needs to be better documented and annotated is what type > > of fsnotify services can be expected to work on a given fs. > > You're looking at this solely from the angle of fanotify. In my earier > message I was looking at this as something that is generally useful. > Fanotify uses the s_uuid and f_fsid but they have value independent of > this. > Right. Overlayfs to name another internal user of s_uuid. and it would be useful for exportfs. Currently, userspace needs to workaround this by assigning fsid= manually in /etc/exports or by querying the uuid of the blockdev within libblkid. > > > > Jan has already introduced FS_DISALLOW_NOTIFY_PERM to opt-out > > of permission events (for procfs). > > That sounds like a decent solution. > > > > > Perhaps this could be generalized to s_type->fs_notify_supported_events > > or s_type->fs_notify_supported_features. > > > > For example, if an fs opts-in to FAN_REPORT_FID, then it gets an auto > > allocated s_uuid and f_fsid if it did not fill them in fill_super and in statfs > > This appears a layering violation to me. The s_uuid should be allocated > when the superblock is created just like tmpfs does it and not > retroactively/lazily when fanotify on the filesystem is reported. That's not what I meant. Extending the NFS file handles to fs object identification in a generic concept - it does not serve only fanotify. For fanotify, I deliberately chose to report object information that is available to userspace via other UAPIs. What I meant was: * filesystem can opt-in for exporting file id's to userspace, either with .fs_flags = FS_EXPORT_FID or with a new export_op method as Jan suggested, e.g.: export_op.encode_fid = generic_encode_fid64; With this opt-in, filesystem objects are more uniquely identified, so: * vfs enforces non empty s_uuid and non empty f_fsid * name_to_handle_at(....,AT_HANDLE_FID) returns a file handle for ID purpose that may (or may not) be useful in open_by_handle_at() even for fs that do not support NFS export * fanotify allows FAN_ERPORT_FID even with the absence of NFS export support I think we are all thinking more or less about the same solution. This is the time for me to stop writing emails and start writing patches so we have a better ground for discussion on the details... Thanks, Amir. ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-04-20 8:15 UTC | newest] Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-04-14 18:29 [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify Amir Goldstein 2023-04-14 18:29 ` [RFC][PATCH 1/2] fanotify: add support for FAN_UNMOUNT event Amir Goldstein 2023-04-18 13:43 ` Christian Brauner 2023-04-19 13:14 ` Jan Kara 2023-04-19 18:32 ` Amir Goldstein 2023-04-14 18:29 ` [RFC][PATCH 2/2] fanotify: report mntid info record with FAN_UNMOUNT events Amir Goldstein 2023-04-17 8:38 ` Amir Goldstein 2023-04-19 13:30 ` Jan Kara 2023-04-18 13:33 ` [RFC][PATCH 0/2] Monitoring unmounted fs with fanotify Christian Brauner 2023-04-18 13:56 ` Amir Goldstein 2023-04-18 14:12 ` Christian Brauner 2023-04-18 15:20 ` Amir Goldstein 2023-04-19 17:19 ` Christian Brauner 2023-04-20 6:12 ` Amir Goldstein 2023-04-20 7:46 ` Christian Brauner 2023-04-20 8:15 ` Amir Goldstein
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).