linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] Volatile fanotify marks
@ 2022-03-07 15:57 Amir Goldstein
  2022-03-07 15:57 ` [PATCH 1/5] fsnotify: move inotify control flags to mark flags Amir Goldstein
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Amir Goldstein @ 2022-03-07 15:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Jan,

Following RFC discussion [1], following are the volatile mark patches.

Tested both manually and with this LTP test [2].
I was struggling with this test for a while because drop caches
did not get rid of the un-pinned inode when test was run with
ext2 or ext4 on my test VM. With xfs, the test works fine for me,
but it may not work for everyone.

Perhaps you have a suggestion for a better way to test inode eviction.

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
[2] https://github.com/amir73il/ltp/commits/fan_volatile

Amir Goldstein (5):
  fsnotify: move inotify control flags to mark flags
  fsnotify: pass flags argument to fsnotify_add_mark()
  fsnotify: allow adding an inode mark without pinning inode
  fanotify: add support for exclusive create of mark
  fanotify: add support for "volatile" inode marks

 fs/notify/fanotify/fanotify_user.c   | 32 +++++++++--
 fs/notify/fsnotify.c                 |  4 +-
 fs/notify/inotify/inotify_fsnotify.c |  2 +-
 fs/notify/inotify/inotify_user.c     | 40 +++++++++-----
 fs/notify/mark.c                     | 83 +++++++++++++++++++++++-----
 include/linux/fanotify.h             |  9 ++-
 include/linux/fsnotify_backend.h     | 32 ++++++-----
 include/uapi/linux/fanotify.h        |  2 +
 kernel/audit_fsnotify.c              |  3 +-
 9 files changed, 151 insertions(+), 56 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/5] fsnotify: move inotify control flags to mark flags
  2022-03-07 15:57 [PATCH 0/5] Volatile fanotify marks Amir Goldstein
@ 2022-03-07 15:57 ` Amir Goldstein
  2022-03-17 14:25   ` Jan Kara
  2022-03-07 15:57 ` [PATCH 2/5] fsnotify: pass flags argument to fsnotify_add_mark() Amir Goldstein
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-07 15:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not
relevant to object interest mask, so move them to the mark flags.

This frees up some bits in the object interest mask.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c                 |  4 +--
 fs/notify/inotify/inotify_fsnotify.c |  2 +-
 fs/notify/inotify/inotify_user.c     | 40 +++++++++++++++++-----------
 include/linux/fsnotify_backend.h     | 11 ++++----
 4 files changed, 34 insertions(+), 23 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 494f653efbc6..f5d2547e2411 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -255,7 +255,7 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group,
 	if (WARN_ON_ONCE(!inode && !dir))
 		return 0;
 
-	if ((inode_mark->mask & FS_EXCL_UNLINK) &&
+	if ((inode_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK) &&
 	    path && d_unlinked(path->dentry))
 		return 0;
 
@@ -583,7 +583,7 @@ static __init int fsnotify_init(void)
 {
 	int ret;
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23);
 
 	ret = init_srcu_struct(&fsnotify_mark_srcu);
 	if (ret)
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index d92d7b0adc9a..49cfe2ae6d23 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -122,7 +122,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
 		fsnotify_destroy_event(group, fsn_event);
 	}
 
-	if (inode_mark->mask & IN_ONESHOT)
+	if (inode_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT)
 		fsnotify_destroy_mark(inode_mark, group);
 
 	return 0;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 54583f62dc44..324c49d26b53 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -110,11 +110,26 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
 		mask |= FS_EVENT_ON_CHILD;
 
 	/* mask off the flags used to open the fd */
-	mask |= (arg & (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK));
+	mask |= (arg & IN_ALL_EVENTS);
 
 	return mask;
 }
 
+#define INOTIFY_MARK_FLAGS \
+	(FSNOTIFY_MARK_FLAG_EXCL_UNLINK | FSNOTIFY_MARK_FLAG_IN_ONESHOT)
+
+static inline unsigned int inotify_arg_to_flags(u32 arg)
+{
+	unsigned int flags = 0;
+
+	if (arg & IN_EXCL_UNLINK)
+		flags |= FSNOTIFY_MARK_FLAG_EXCL_UNLINK;
+	if (arg & IN_ONESHOT)
+		flags |= FSNOTIFY_MARK_FLAG_IN_ONESHOT;
+
+	return flags;
+}
+
 static inline u32 inotify_mask_to_arg(__u32 mask)
 {
 	return mask & (IN_ALL_EVENTS | IN_ISDIR | IN_UNMOUNT | IN_IGNORED |
@@ -526,13 +541,10 @@ static int inotify_update_existing_watch(struct fsnotify_group *group,
 	struct fsnotify_mark *fsn_mark;
 	struct inotify_inode_mark *i_mark;
 	__u32 old_mask, new_mask;
-	__u32 mask;
-	int add = (arg & IN_MASK_ADD);
+	int replace = !(arg & IN_MASK_ADD);
 	int create = (arg & IN_MASK_CREATE);
 	int ret;
 
-	mask = inotify_arg_to_mask(inode, arg);
-
 	fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group);
 	if (!fsn_mark)
 		return -ENOENT;
@@ -545,10 +557,12 @@ static int inotify_update_existing_watch(struct fsnotify_group *group,
 
 	spin_lock(&fsn_mark->lock);
 	old_mask = fsn_mark->mask;
-	if (add)
-		fsn_mark->mask |= mask;
-	else
-		fsn_mark->mask = mask;
+	if (replace) {
+		fsn_mark->mask = 0;
+		fsn_mark->flags &= ~INOTIFY_MARK_FLAGS;
+	}
+	fsn_mark->mask |= inotify_arg_to_mask(inode, arg);
+	fsn_mark->flags |= inotify_arg_to_flags(arg);
 	new_mask = fsn_mark->mask;
 	spin_unlock(&fsn_mark->lock);
 
@@ -579,19 +593,17 @@ static int inotify_new_watch(struct fsnotify_group *group,
 			     u32 arg)
 {
 	struct inotify_inode_mark *tmp_i_mark;
-	__u32 mask;
 	int ret;
 	struct idr *idr = &group->inotify_data.idr;
 	spinlock_t *idr_lock = &group->inotify_data.idr_lock;
 
-	mask = inotify_arg_to_mask(inode, arg);
-
 	tmp_i_mark = kmem_cache_alloc(inotify_inode_mark_cachep, GFP_KERNEL);
 	if (unlikely(!tmp_i_mark))
 		return -ENOMEM;
 
 	fsnotify_init_mark(&tmp_i_mark->fsn_mark, group);
-	tmp_i_mark->fsn_mark.mask = mask;
+	tmp_i_mark->fsn_mark.mask = inotify_arg_to_mask(inode, arg);
+	tmp_i_mark->fsn_mark.flags = inotify_arg_to_flags(arg);
 	tmp_i_mark->wd = -1;
 
 	ret = inotify_add_to_idr(idr, idr_lock, tmp_i_mark);
@@ -845,9 +857,7 @@ static int __init inotify_user_setup(void)
 	BUILD_BUG_ON(IN_UNMOUNT != FS_UNMOUNT);
 	BUILD_BUG_ON(IN_Q_OVERFLOW != FS_Q_OVERFLOW);
 	BUILD_BUG_ON(IN_IGNORED != FS_IN_IGNORED);
-	BUILD_BUG_ON(IN_EXCL_UNLINK != FS_EXCL_UNLINK);
 	BUILD_BUG_ON(IN_ISDIR != FS_ISDIR);
-	BUILD_BUG_ON(IN_ONESHOT != FS_IN_ONESHOT);
 
 	BUILD_BUG_ON(HWEIGHT32(ALL_INOTIFY_BITS) != 22);
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 0805b74cae44..0bd63608e935 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -55,7 +55,6 @@
 #define FS_ACCESS_PERM		0x00020000	/* access event in a permissions hook */
 #define FS_OPEN_EXEC_PERM	0x00040000	/* open/exec event in a permission hook */
 
-#define FS_EXCL_UNLINK		0x04000000	/* do not send events if object is unlinked */
 /*
  * Set on inode mark that cares about things that happen to its children.
  * Always set for dnotify and inotify.
@@ -66,7 +65,6 @@
 #define FS_RENAME		0x10000000	/* File was renamed */
 #define FS_DN_MULTISHOT		0x20000000	/* dnotify multishot */
 #define FS_ISDIR		0x40000000	/* event occurred against dir */
-#define FS_IN_ONESHOT		0x80000000	/* only send event once */
 
 #define FS_MOVE			(FS_MOVED_FROM | FS_MOVED_TO)
 
@@ -106,8 +104,7 @@
 			     FS_ERROR)
 
 /* Extra flags that may be reported with event or control handling of events */
-#define ALL_FSNOTIFY_FLAGS  (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \
-			     FS_DN_MULTISHOT | FS_EVENT_ON_CHILD)
+#define ALL_FSNOTIFY_FLAGS  (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT)
 
 #define ALL_FSNOTIFY_BITS   (ALL_FSNOTIFY_EVENTS | ALL_FSNOTIFY_FLAGS)
 
@@ -473,9 +470,13 @@ struct fsnotify_mark {
 	struct fsnotify_mark_connector *connector;
 	/* Events types to ignore [mark->lock, group->mark_mutex] */
 	__u32 ignored_mask;
-#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY	0x01
+	/* Internal fsnotify flags */
 #define FSNOTIFY_MARK_FLAG_ALIVE		0x02
 #define FSNOTIFY_MARK_FLAG_ATTACHED		0x04
+	/* Backend controlled flags */
+#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY	0x10
+#define FSNOTIFY_MARK_FLAG_EXCL_UNLINK		0x20
+#define FSNOTIFY_MARK_FLAG_IN_ONESHOT		0x40
 	unsigned int flags;		/* flags [mark->lock] */
 };
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/5] fsnotify: pass flags argument to fsnotify_add_mark()
  2022-03-07 15:57 [PATCH 0/5] Volatile fanotify marks Amir Goldstein
  2022-03-07 15:57 ` [PATCH 1/5] fsnotify: move inotify control flags to mark flags Amir Goldstein
@ 2022-03-07 15:57 ` Amir Goldstein
  2022-03-07 15:57 ` [PATCH 3/5] fsnotify: allow adding an inode mark without pinning inode Amir Goldstein
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2022-03-07 15:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Instead of passing the allow_dups argument to fsnotify_add_mark()
as a 'boolean' int, pass a generic flags argument and define the flag
FSNOTIFY_ADD_MARK_ALLOW_DUPS to express the old allow_dups meaning.

We are going to use the flags argument to pass more options to
to fsnotify_add_mark().

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/mark.c                 | 13 ++++++-------
 include/linux/fsnotify_backend.h | 18 +++++++++---------
 kernel/audit_fsnotify.c          |  3 ++-
 3 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 4853184f7dde..190df435919f 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -574,7 +574,7 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector(
 static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
 				  fsnotify_connp_t *connp,
 				  unsigned int obj_type,
-				  int allow_dups, __kernel_fsid_t *fsid)
+				  int flags, __kernel_fsid_t *fsid)
 {
 	struct fsnotify_mark *lmark, *last = NULL;
 	struct fsnotify_mark_connector *conn;
@@ -633,7 +633,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
 
 		if ((lmark->group == mark->group) &&
 		    (lmark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) &&
-		    !allow_dups) {
+		    !(flags & FSNOTIFY_ADD_MARK_ALLOW_DUPS)) {
 			err = -EEXIST;
 			goto out_err;
 		}
@@ -668,7 +668,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
  */
 int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 			     fsnotify_connp_t *connp, unsigned int obj_type,
-			     int allow_dups, __kernel_fsid_t *fsid)
+			     int flags, __kernel_fsid_t *fsid)
 {
 	struct fsnotify_group *group = mark->group;
 	int ret = 0;
@@ -688,7 +688,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	fsnotify_get_mark(mark); /* for g_list */
 	spin_unlock(&mark->lock);
 
-	ret = fsnotify_add_mark_list(mark, connp, obj_type, allow_dups, fsid);
+	ret = fsnotify_add_mark_list(mark, connp, obj_type, flags, fsid);
 	if (ret)
 		goto err;
 
@@ -708,14 +708,13 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 }
 
 int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp,
-		      unsigned int obj_type, int allow_dups,
-		      __kernel_fsid_t *fsid)
+		      unsigned int obj_type, int flags, __kernel_fsid_t *fsid)
 {
 	int ret;
 	struct fsnotify_group *group = mark->group;
 
 	mutex_lock(&group->mark_mutex);
-	ret = fsnotify_add_mark_locked(mark, connp, obj_type, allow_dups, fsid);
+	ret = fsnotify_add_mark_locked(mark, connp, obj_type, flags, fsid);
 	mutex_unlock(&group->mark_mutex);
 	return ret;
 }
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 0bd63608e935..0b548518b166 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -633,30 +633,30 @@ extern struct fsnotify_mark *fsnotify_find_mark(fsnotify_connp_t *connp,
 /* Get cached fsid of filesystem containing object */
 extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn,
 				  __kernel_fsid_t *fsid);
+
 /* attach the mark to the object */
+#define FSNOTIFY_ADD_MARK_ALLOW_DUPS	0x1
+
 extern int fsnotify_add_mark(struct fsnotify_mark *mark,
 			     fsnotify_connp_t *connp, unsigned int obj_type,
-			     int allow_dups, __kernel_fsid_t *fsid);
+			     int flags, __kernel_fsid_t *fsid);
 extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 				    fsnotify_connp_t *connp,
-				    unsigned int obj_type, int allow_dups,
+				    unsigned int obj_type, int flags,
 				    __kernel_fsid_t *fsid);
 
 /* attach the mark to the inode */
 static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
-					  struct inode *inode,
-					  int allow_dups)
+					  struct inode *inode, int flags)
 {
 	return fsnotify_add_mark(mark, &inode->i_fsnotify_marks,
-				 FSNOTIFY_OBJ_TYPE_INODE, allow_dups, NULL);
+				 FSNOTIFY_OBJ_TYPE_INODE, flags, NULL);
 }
 static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark,
-						 struct inode *inode,
-						 int allow_dups)
+						 struct inode *inode, int flags)
 {
 	return fsnotify_add_mark_locked(mark, &inode->i_fsnotify_marks,
-					FSNOTIFY_OBJ_TYPE_INODE, allow_dups,
-					NULL);
+					FSNOTIFY_OBJ_TYPE_INODE, flags, NULL);
 }
 
 /* given a group and a mark, flag mark to be freed when all references are dropped */
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 02348b48447c..82233f651c62 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -100,7 +100,8 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
 	audit_update_mark(audit_mark, dentry->d_inode);
 	audit_mark->rule = krule;
 
-	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true);
+	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode,
+				      FSNOTIFY_ADD_MARK_ALLOW_DUPS);
 	if (ret < 0) {
 		fsnotify_put_mark(&audit_mark->mark);
 		audit_mark = ERR_PTR(ret);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 3/5] fsnotify: allow adding an inode mark without pinning inode
  2022-03-07 15:57 [PATCH 0/5] Volatile fanotify marks Amir Goldstein
  2022-03-07 15:57 ` [PATCH 1/5] fsnotify: move inotify control flags to mark flags Amir Goldstein
  2022-03-07 15:57 ` [PATCH 2/5] fsnotify: pass flags argument to fsnotify_add_mark() Amir Goldstein
@ 2022-03-07 15:57 ` Amir Goldstein
  2022-03-17 15:27   ` Jan Kara
  2022-03-07 15:57 ` [PATCH 4/5] fanotify: add support for exclusive create of mark Amir Goldstein
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-07 15:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

fsnotify_add_mark() and variants implicitly take a reference on inode
when attaching a mark to an inode.

Make that behavior opt-out with the flag FSNOTIFY_ADD_MARK_NO_IREF.

Instead of taking the inode reference when attaching connector to inode
and dropping the inode reference when detaching connector from inode,
take the inode reference on attach of the first mark that wants to hold
an inode reference and drop the inode reference on detach of the last
mark that wants to hold an inode reference.

This leaves the choice to the backend whether or not to pin the inode
when adding an inode mark.

This is intended to be used when adding a mark with ignored mask that is
used for optimization in cases where group can afford getting unneeded
events and reinstate the mark with ignored mask when inode is accessed
again after being evicted.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/mark.c                 | 70 ++++++++++++++++++++++++++++----
 include/linux/fsnotify_backend.h |  3 ++
 2 files changed, 65 insertions(+), 8 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 190df435919f..f71b6814bfa7 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -213,6 +213,17 @@ static void *fsnotify_detach_connector_from_object(
 	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
 		inode = fsnotify_conn_inode(conn);
 		inode->i_fsnotify_mask = 0;
+
+		pr_debug("%s: inode=%p iref=%u sb_connectors=%lu icount=%u\n",
+			 __func__, inode, atomic_read(&conn->proxy_iref),
+			 atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
+			 atomic_read(&inode->i_count));
+
+		/* Unpin inode when detaching from connector */
+		if (atomic_read(&conn->proxy_iref))
+			atomic_set(&conn->proxy_iref, 0);
+		else
+			inode = NULL;
 	} else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) {
 		fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0;
 	} else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) {
@@ -240,12 +251,43 @@ static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark)
 /* Drop object reference originally held by a connector */
 static void fsnotify_drop_object(unsigned int type, void *objp)
 {
+	struct inode *inode = objp;
+
 	if (!objp)
 		return;
 	/* Currently only inode references are passed to be dropped */
 	if (WARN_ON_ONCE(type != FSNOTIFY_OBJ_TYPE_INODE))
 		return;
-	fsnotify_put_inode_ref(objp);
+
+	pr_debug("%s: inode=%p sb_connectors=%lu, icount=%u\n", __func__,
+		 inode, atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
+		 atomic_read(&inode->i_count));
+
+	fsnotify_put_inode_ref(inode);
+}
+
+/* Drop the proxy refcount on inode maintainted by connector */
+static struct inode *fsnotify_drop_iref(struct fsnotify_mark_connector *conn,
+					unsigned int *type)
+{
+	struct inode *inode = fsnotify_conn_inode(conn);
+
+	if (WARN_ON_ONCE(!inode || conn->type != FSNOTIFY_OBJ_TYPE_INODE))
+		return NULL;
+
+	pr_debug("%s: inode=%p iref=%u sb_connectors=%lu icount=%u\n",
+		 __func__, inode, atomic_read(&conn->proxy_iref),
+		 atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
+		 atomic_read(&inode->i_count));
+
+	if (WARN_ON_ONCE(!atomic_read(&conn->proxy_iref)) ||
+	    !atomic_dec_and_test(&conn->proxy_iref))
+		return NULL;
+
+	fsnotify_put_inode_ref(inode);
+	*type = FSNOTIFY_OBJ_TYPE_INODE;
+
+	return inode;
 }
 
 void fsnotify_put_mark(struct fsnotify_mark *mark)
@@ -275,6 +317,9 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
 		free_conn = true;
 	} else {
 		__fsnotify_recalc_mask(conn);
+		/* Unpin inode on last mark that wants inode refcount held */
+		if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IREF)
+			objp = fsnotify_drop_iref(conn, &type);
 	}
 	WRITE_ONCE(mark->connector, NULL);
 	spin_unlock(&conn->lock);
@@ -499,7 +544,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
 					       unsigned int obj_type,
 					       __kernel_fsid_t *fsid)
 {
-	struct inode *inode = NULL;
 	struct fsnotify_mark_connector *conn;
 
 	conn = kmem_cache_alloc(fsnotify_mark_connector_cachep, GFP_KERNEL);
@@ -507,6 +551,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
 		return -ENOMEM;
 	spin_lock_init(&conn->lock);
 	INIT_HLIST_HEAD(&conn->list);
+	atomic_set(&conn->proxy_iref, 0);
 	conn->type = obj_type;
 	conn->obj = connp;
 	/* Cache fsid of filesystem containing the object */
@@ -517,10 +562,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
 		conn->fsid.val[0] = conn->fsid.val[1] = 0;
 		conn->flags = 0;
 	}
-	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
-		inode = fsnotify_conn_inode(conn);
-		fsnotify_get_inode_ref(inode);
-	}
 	fsnotify_get_sb_connectors(conn);
 
 	/*
@@ -529,8 +570,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
 	 */
 	if (cmpxchg(connp, NULL, conn)) {
 		/* Someone else created list structure for us */
-		if (inode)
-			fsnotify_put_inode_ref(inode);
 		fsnotify_put_sb_connectors(conn);
 		kmem_cache_free(fsnotify_mark_connector_cachep, conn);
 	}
@@ -649,6 +688,21 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
 	/* mark should be the last entry.  last is the current last entry */
 	hlist_add_behind_rcu(&mark->obj_list, &last->obj_list);
 added:
+	/* Pin inode on first mark that wants inode refcount held */
+	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE &&
+	    !(flags & FSNOTIFY_ADD_MARK_NO_IREF)) {
+		struct inode *inode = fsnotify_conn_inode(conn);
+
+		mark->flags |= FSNOTIFY_MARK_FLAG_HAS_IREF;
+		if (atomic_inc_return(&conn->proxy_iref) == 1)
+			fsnotify_get_inode_ref(inode);
+
+		pr_debug("%s: inode=%p iref=%u sb_connectors=%lu icount=%u\n",
+			 __func__, inode, atomic_read(&conn->proxy_iref),
+			 atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
+			 atomic_read(&inode->i_count));
+	}
+
 	/*
 	 * Since connector is attached to object using cmpxchg() we are
 	 * guaranteed that connector initialization is fully visible by anyone
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 0b548518b166..de718864c5f5 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -425,6 +425,7 @@ struct fsnotify_mark_connector {
 	unsigned short type;	/* Type of object [lock] */
 #define FSNOTIFY_CONN_FLAG_HAS_FSID	0x01
 	unsigned short flags;	/* flags [lock] */
+	atomic_t proxy_iref;	/* marks that want inode reference held */
 	__kernel_fsid_t fsid;	/* fsid of filesystem containing object */
 	union {
 		/* Object pointer [lock] */
@@ -471,6 +472,7 @@ struct fsnotify_mark {
 	/* Events types to ignore [mark->lock, group->mark_mutex] */
 	__u32 ignored_mask;
 	/* Internal fsnotify flags */
+#define FSNOTIFY_MARK_FLAG_HAS_IREF		0x01
 #define FSNOTIFY_MARK_FLAG_ALIVE		0x02
 #define FSNOTIFY_MARK_FLAG_ATTACHED		0x04
 	/* Backend controlled flags */
@@ -636,6 +638,7 @@ extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn,
 
 /* attach the mark to the object */
 #define FSNOTIFY_ADD_MARK_ALLOW_DUPS	0x1
+#define FSNOTIFY_ADD_MARK_NO_IREF	0x2
 
 extern int fsnotify_add_mark(struct fsnotify_mark *mark,
 			     fsnotify_connp_t *connp, unsigned int obj_type,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-07 15:57 [PATCH 0/5] Volatile fanotify marks Amir Goldstein
                   ` (2 preceding siblings ...)
  2022-03-07 15:57 ` [PATCH 3/5] fsnotify: allow adding an inode mark without pinning inode Amir Goldstein
@ 2022-03-07 15:57 ` Amir Goldstein
  2022-03-17 15:34   ` Jan Kara
  2022-03-07 15:57 ` [PATCH 5/5] fanotify: add support for "volatile" inode marks Amir Goldstein
  2022-03-17 14:12 ` [PATCH 0/5] Volatile fanotify marks Jan Kara
  5 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-07 15:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Similar to inotify's IN_MARK_CREATE, adding an fanotify mark with flag
FAN_MARK_CREATE will fail with error EEXIST if an fanotify mark already
exists on the object.

Unlike inotify's IN_MARK_CREATE, FAN_MARK_CREATE has to supplied in
combination with FAN_MARK_ADD (FAN_MARK_ADD is like inotify_add_watch()
and the behavior of IN_MARK_ADD is the default for fanotify_mark()).

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 13 ++++++++++---
 include/linux/fanotify.h           |  8 +++++---
 include/uapi/linux/fanotify.h      |  1 +
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9b32b76a9c30..99c5ced6abd8 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1185,6 +1185,9 @@ static int fanotify_add_mark(struct fsnotify_group *group,
 			mutex_unlock(&group->mark_mutex);
 			return PTR_ERR(fsn_mark);
 		}
+	} else if (flags & FAN_MARK_CREATE) {
+		ret = -EEXIST;
+		goto out;
 	}
 
 	/*
@@ -1510,6 +1513,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	__kernel_fsid_t __fsid, *fsid = NULL;
 	u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
 	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
+	unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS;
 	bool ignored = flags & FAN_MARK_IGNORED_MASK;
 	unsigned int obj_type, fid_mode;
 	u32 umask = 0;
@@ -1539,7 +1543,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 		return -EINVAL;
 	}
 
-	switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE | FAN_MARK_FLUSH)) {
+	if (flags & FAN_MARK_CREATE && mark_cmd != FAN_MARK_ADD)
+		return -EINVAL;
+
+	switch (mark_cmd) {
 	case FAN_MARK_ADD:
 	case FAN_MARK_REMOVE:
 		if (!mask)
@@ -1671,7 +1678,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	}
 
 	/* create/update an inode mark */
-	switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) {
+	switch (mark_cmd) {
 	case FAN_MARK_ADD:
 		if (mark_type == FAN_MARK_MOUNT)
 			ret = fanotify_add_vfsmount_mark(group, mnt, mask,
@@ -1749,7 +1756,7 @@ static int __init fanotify_user_setup(void)
 
 	BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12);
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 10);
 
 	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
 					 SLAB_PANIC|SLAB_ACCOUNT);
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 419cadcd7ff5..780f4b17d4c9 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -59,14 +59,16 @@
 #define FANOTIFY_MARK_TYPE_BITS	(FAN_MARK_INODE | FAN_MARK_MOUNT | \
 				 FAN_MARK_FILESYSTEM)
 
+#define FANOTIFY_MARK_CMD_BITS	(FAN_MARK_ADD | FAN_MARK_REMOVE | \
+				 FAN_MARK_FLUSH)
+
 #define FANOTIFY_MARK_FLAGS	(FANOTIFY_MARK_TYPE_BITS | \
-				 FAN_MARK_ADD | \
-				 FAN_MARK_REMOVE | \
+				 FANOTIFY_MARK_CMD_BITS | \
 				 FAN_MARK_DONT_FOLLOW | \
 				 FAN_MARK_ONLYDIR | \
 				 FAN_MARK_IGNORED_MASK | \
 				 FAN_MARK_IGNORED_SURV_MODIFY | \
-				 FAN_MARK_FLUSH)
+				 FAN_MARK_CREATE)
 
 /*
  * Events that can be reported with data type FSNOTIFY_EVENT_PATH.
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index e8ac38cc2fd6..c41feac21fe9 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -82,6 +82,7 @@
 #define FAN_MARK_IGNORED_SURV_MODIFY	0x00000040
 #define FAN_MARK_FLUSH		0x00000080
 /* FAN_MARK_FILESYSTEM is	0x00000100 */
+#define FAN_MARK_CREATE		0x00000200
 
 /* These are NOT bitwise flags.  Both bits can be used togther.  */
 #define FAN_MARK_INODE		0x00000000
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 5/5] fanotify: add support for "volatile" inode marks
  2022-03-07 15:57 [PATCH 0/5] Volatile fanotify marks Amir Goldstein
                   ` (3 preceding siblings ...)
  2022-03-07 15:57 ` [PATCH 4/5] fanotify: add support for exclusive create of mark Amir Goldstein
@ 2022-03-07 15:57 ` Amir Goldstein
  2022-03-17 14:12 ` [PATCH 0/5] Volatile fanotify marks Jan Kara
  5 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2022-03-07 15:57 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

When an inode mark is created with flag FAN_MARK_VOLATILE, it will not
pin the marked inode to inode cache, so when inode is evicted from cache
due to memory pressure, the mark will be lost.

Volatile inode marks can be used to setup inode marks with ignored mask
to suppress events from uninteresting files or directories in a lazy
manner, upon receiving the first event, without having to iterate all
the uninteresting files or directories before hand.

The volatile inode mark feature allows performing this lazy marks setup
without exhausting the system memory with pinned inodes.

Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 21 ++++++++++++++++++---
 include/linux/fanotify.h           |  1 +
 include/uapi/linux/fanotify.h      |  1 +
 3 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 99c5ced6abd8..6e9e4020ef40 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1122,11 +1122,14 @@ static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
 static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 						   fsnotify_connp_t *connp,
 						   unsigned int obj_type,
+						   unsigned int fan_flags,
 						   __kernel_fsid_t *fsid)
 {
 	struct ucounts *ucounts = group->fanotify_data.ucounts;
 	struct fsnotify_mark *mark;
 	int ret;
+	unsigned int fsn_flags = (fan_flags & FAN_MARK_VOLATILE) ?
+				 FSNOTIFY_ADD_MARK_NO_IREF : 0;
 
 	/*
 	 * Enforce per user marks limits per user in all containing user ns.
@@ -1144,7 +1147,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 	}
 
 	fsnotify_init_mark(mark, group);
-	ret = fsnotify_add_mark_locked(mark, connp, obj_type, 0, fsid);
+	ret = fsnotify_add_mark_locked(mark, connp, obj_type, fsn_flags, fsid);
 	if (ret) {
 		fsnotify_put_mark(mark);
 		goto out_dec_ucounts;
@@ -1180,7 +1183,8 @@ static int fanotify_add_mark(struct fsnotify_group *group,
 	mutex_lock(&group->mark_mutex);
 	fsn_mark = fsnotify_find_mark(connp, group);
 	if (!fsn_mark) {
-		fsn_mark = fanotify_add_new_mark(group, connp, obj_type, fsid);
+		fsn_mark = fanotify_add_new_mark(group, connp, obj_type, flags,
+						 fsid);
 		if (IS_ERR(fsn_mark)) {
 			mutex_unlock(&group->mark_mutex);
 			return PTR_ERR(fsn_mark);
@@ -1604,6 +1608,17 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	    mark_type != FAN_MARK_FILESYSTEM)
 		goto fput_and_out;
 
+	/*
+	 * Volatile is only relevant for inode, because only inode object can be
+	 * evicted on memory pressure.  Inode is pinned when attaching the mark
+	 * to the inode, so require the FAN_MARK_CREATE flag to make sure that
+	 * we are not updating an existing mark on a pinned inode.
+	 */
+	if (flags & FAN_MARK_VOLATILE &&
+	    (!(flags & FAN_MARK_CREATE) ||
+	     mark_type != FAN_MARK_INODE))
+		goto fput_and_out;
+
 	/*
 	 * Events that do not carry enough information to report
 	 * event->fd require a group that supports reporting fid.  Those
@@ -1756,7 +1771,7 @@ static int __init fanotify_user_setup(void)
 
 	BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12);
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 10);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 11);
 
 	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
 					 SLAB_PANIC|SLAB_ACCOUNT);
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 780f4b17d4c9..bf88c547d93f 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -68,6 +68,7 @@
 				 FAN_MARK_ONLYDIR | \
 				 FAN_MARK_IGNORED_MASK | \
 				 FAN_MARK_IGNORED_SURV_MODIFY | \
+				 FAN_MARK_VOLATILE | \
 				 FAN_MARK_CREATE)
 
 /*
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index c41feac21fe9..1a67e6be994e 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -83,6 +83,7 @@
 #define FAN_MARK_FLUSH		0x00000080
 /* FAN_MARK_FILESYSTEM is	0x00000100 */
 #define FAN_MARK_CREATE		0x00000200
+#define FAN_MARK_VOLATILE	0x00000400
 
 /* These are NOT bitwise flags.  Both bits can be used togther.  */
 #define FAN_MARK_INODE		0x00000000
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5] Volatile fanotify marks
  2022-03-07 15:57 [PATCH 0/5] Volatile fanotify marks Amir Goldstein
                   ` (4 preceding siblings ...)
  2022-03-07 15:57 ` [PATCH 5/5] fanotify: add support for "volatile" inode marks Amir Goldstein
@ 2022-03-17 14:12 ` Jan Kara
  2022-03-17 15:14   ` Amir Goldstein
  5 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-03-17 14:12 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Mon 07-03-22 17:57:36, Amir Goldstein wrote:
> Jan,
> 
> Following RFC discussion [1], following are the volatile mark patches.
> 
> Tested both manually and with this LTP test [2].
> I was struggling with this test for a while because drop caches
> did not get rid of the un-pinned inode when test was run with
> ext2 or ext4 on my test VM. With xfs, the test works fine for me,
> but it may not work for everyone.
> 
> Perhaps you have a suggestion for a better way to test inode eviction.

Drop caches does not evict dirty inodes. The inode is likely dirty because
you have chmodded it just before drop caches. So I think calling sync or
syncfs before dropping caches should fix your problems with ext2 / ext4.  I
suspect this has worked for XFS only because it does its private inode
dirtiness tracking and keeps the inode behind VFS's back.

								Honza

> 
> Thanks,
> Amir.
> 
> [1] https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
> [2] https://github.com/amir73il/ltp/commits/fan_volatile
> 
> Amir Goldstein (5):
>   fsnotify: move inotify control flags to mark flags
>   fsnotify: pass flags argument to fsnotify_add_mark()
>   fsnotify: allow adding an inode mark without pinning inode
>   fanotify: add support for exclusive create of mark
>   fanotify: add support for "volatile" inode marks
> 
>  fs/notify/fanotify/fanotify_user.c   | 32 +++++++++--
>  fs/notify/fsnotify.c                 |  4 +-
>  fs/notify/inotify/inotify_fsnotify.c |  2 +-
>  fs/notify/inotify/inotify_user.c     | 40 +++++++++-----
>  fs/notify/mark.c                     | 83 +++++++++++++++++++++++-----
>  include/linux/fanotify.h             |  9 ++-
>  include/linux/fsnotify_backend.h     | 32 ++++++-----
>  include/uapi/linux/fanotify.h        |  2 +
>  kernel/audit_fsnotify.c              |  3 +-
>  9 files changed, 151 insertions(+), 56 deletions(-)
> 
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] fsnotify: move inotify control flags to mark flags
  2022-03-07 15:57 ` [PATCH 1/5] fsnotify: move inotify control flags to mark flags Amir Goldstein
@ 2022-03-17 14:25   ` Jan Kara
  2022-03-17 15:12     ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-03-17 14:25 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Mon 07-03-22 17:57:37, Amir Goldstein wrote:
> The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not
> relevant to object interest mask, so move them to the mark flags.
> 
> This frees up some bits in the object interest mask.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Looks good. I'm just wondering: Can we treat FS_DN_MULTISHOT in a similar way?

									Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/5] fsnotify: move inotify control flags to mark flags
  2022-03-17 14:25   ` Jan Kara
@ 2022-03-17 15:12     ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2022-03-17 15:12 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Thu, Mar 17, 2022 at 4:25 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 07-03-22 17:57:37, Amir Goldstein wrote:
> > The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not
> > relevant to object interest mask, so move them to the mark flags.
> >
> > This frees up some bits in the object interest mask.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> Looks good. I'm just wondering: Can we treat FS_DN_MULTISHOT in a similar way?

Probably, but it is less straightforward. Need to add dn_flags to
struct  dnotify_struct.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5] Volatile fanotify marks
  2022-03-17 14:12 ` [PATCH 0/5] Volatile fanotify marks Jan Kara
@ 2022-03-17 15:14   ` Amir Goldstein
  2022-03-20 12:54     ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-17 15:14 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Thu, Mar 17, 2022 at 4:12 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 07-03-22 17:57:36, Amir Goldstein wrote:
> > Jan,
> >
> > Following RFC discussion [1], following are the volatile mark patches.
> >
> > Tested both manually and with this LTP test [2].
> > I was struggling with this test for a while because drop caches
> > did not get rid of the un-pinned inode when test was run with
> > ext2 or ext4 on my test VM. With xfs, the test works fine for me,
> > but it may not work for everyone.
> >
> > Perhaps you have a suggestion for a better way to test inode eviction.
>
> Drop caches does not evict dirty inodes. The inode is likely dirty because
> you have chmodded it just before drop caches. So I think calling sync or
> syncfs before dropping caches should fix your problems with ext2 / ext4.  I
> suspect this has worked for XFS only because it does its private inode
> dirtiness tracking and keeps the inode behind VFS's back.

I did think of that and tried to fsync which did not help, but maybe
I messed it up somehow.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] fsnotify: allow adding an inode mark without pinning inode
  2022-03-07 15:57 ` [PATCH 3/5] fsnotify: allow adding an inode mark without pinning inode Amir Goldstein
@ 2022-03-17 15:27   ` Jan Kara
  2022-03-18  6:23     ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-03-17 15:27 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Mon 07-03-22 17:57:39, Amir Goldstein wrote:
> fsnotify_add_mark() and variants implicitly take a reference on inode
> when attaching a mark to an inode.
> 
> Make that behavior opt-out with the flag FSNOTIFY_ADD_MARK_NO_IREF.
> 
> Instead of taking the inode reference when attaching connector to inode
> and dropping the inode reference when detaching connector from inode,
> take the inode reference on attach of the first mark that wants to hold
> an inode reference and drop the inode reference on detach of the last
> mark that wants to hold an inode reference.
> 
> This leaves the choice to the backend whether or not to pin the inode
> when adding an inode mark.
> 
> This is intended to be used when adding a mark with ignored mask that is
> used for optimization in cases where group can afford getting unneeded
> events and reinstate the mark with ignored mask when inode is accessed
> again after being evicted.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Couple of notes below.

> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index 190df435919f..f71b6814bfa7 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -213,6 +213,17 @@ static void *fsnotify_detach_connector_from_object(
>  	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
>  		inode = fsnotify_conn_inode(conn);
>  		inode->i_fsnotify_mask = 0;
> +
> +		pr_debug("%s: inode=%p iref=%u sb_connectors=%lu icount=%u\n",
> +			 __func__, inode, atomic_read(&conn->proxy_iref),
> +			 atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
> +			 atomic_read(&inode->i_count));

Are these pr_debug() prints that useful? My experience is they get rarely used
after the code is debugged... If you think some places are useful longer
term, tracepoints are probably easier to use these days?

> +
> +		/* Unpin inode when detaching from connector */
> +		if (atomic_read(&conn->proxy_iref))
> +			atomic_set(&conn->proxy_iref, 0);
> +		else
> +			inode = NULL;

proxy_iref is always manipulated under conn->lock so there's no need for
atomic operations here.

>  	} else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) {
>  		fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0;
>  	} else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) {
> @@ -240,12 +251,43 @@ static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark)
>  /* Drop object reference originally held by a connector */
>  static void fsnotify_drop_object(unsigned int type, void *objp)
>  {
> +	struct inode *inode = objp;
> +
>  	if (!objp)
>  		return;
>  	/* Currently only inode references are passed to be dropped */
>  	if (WARN_ON_ONCE(type != FSNOTIFY_OBJ_TYPE_INODE))
>  		return;
> -	fsnotify_put_inode_ref(objp);
> +
> +	pr_debug("%s: inode=%p sb_connectors=%lu, icount=%u\n", __func__,
> +		 inode, atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
> +		 atomic_read(&inode->i_count));
> +
> +	fsnotify_put_inode_ref(inode);
> +}
> +
> +/* Drop the proxy refcount on inode maintainted by connector */
> +static struct inode *fsnotify_drop_iref(struct fsnotify_mark_connector *conn,
> +					unsigned int *type)
> +{
> +	struct inode *inode = fsnotify_conn_inode(conn);
> +
> +	if (WARN_ON_ONCE(!inode || conn->type != FSNOTIFY_OBJ_TYPE_INODE))
> +		return NULL;
> +
> +	pr_debug("%s: inode=%p iref=%u sb_connectors=%lu icount=%u\n",
> +		 __func__, inode, atomic_read(&conn->proxy_iref),
> +		 atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
> +		 atomic_read(&inode->i_count));
> +
> +	if (WARN_ON_ONCE(!atomic_read(&conn->proxy_iref)) ||
> +	    !atomic_dec_and_test(&conn->proxy_iref))
> +		return NULL;
> +
> +	fsnotify_put_inode_ref(inode);

You cannot call fsnotify_put_inode_ref() here because the function is
called under conn->lock and iput() can sleep... You need to play similar
game with passing inode pointer like
fsnotify_detach_connector_from_object() does.

> +	*type = FSNOTIFY_OBJ_TYPE_INODE;
> +
> +	return inode;
>  }
>  
>  void fsnotify_put_mark(struct fsnotify_mark *mark)
> @@ -275,6 +317,9 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
>  		free_conn = true;
>  	} else {
>  		__fsnotify_recalc_mask(conn);
> +		/* Unpin inode on last mark that wants inode refcount held */
> +		if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IREF)
> +			objp = fsnotify_drop_iref(conn, &type);
>  	}

This is going to be interesting. What if the connector got detached from
the inode before fsnotify_put_mark() was called? Then iref_proxy would be
already 0 and we would barf? I think
fsnotify_detach_connector_from_object() needs to drop inode reference but
leave iref_proxy alone for this to work. fsnotify_drop_iref() would then
drop inode reference only if iref_proxy reaches 0 and conn->objp != NULL...


>  	WRITE_ONCE(mark->connector, NULL);
>  	spin_unlock(&conn->lock);

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-07 15:57 ` [PATCH 4/5] fanotify: add support for exclusive create of mark Amir Goldstein
@ 2022-03-17 15:34   ` Jan Kara
  2022-03-17 15:45     ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-03-17 15:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Mon 07-03-22 17:57:40, Amir Goldstein wrote:
> Similar to inotify's IN_MARK_CREATE, adding an fanotify mark with flag
> FAN_MARK_CREATE will fail with error EEXIST if an fanotify mark already
> exists on the object.
> 
> Unlike inotify's IN_MARK_CREATE, FAN_MARK_CREATE has to supplied in
> combination with FAN_MARK_ADD (FAN_MARK_ADD is like inotify_add_watch()
> and the behavior of IN_MARK_ADD is the default for fanotify_mark()).
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

What I'm missing in this changelog is "why". Is it just about feature
parity with inotify? I don't find this feature particularly useful...

								Honza

> ---
>  fs/notify/fanotify/fanotify_user.c | 13 ++++++++++---
>  include/linux/fanotify.h           |  8 +++++---
>  include/uapi/linux/fanotify.h      |  1 +
>  3 files changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9b32b76a9c30..99c5ced6abd8 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1185,6 +1185,9 @@ static int fanotify_add_mark(struct fsnotify_group *group,
>  			mutex_unlock(&group->mark_mutex);
>  			return PTR_ERR(fsn_mark);
>  		}
> +	} else if (flags & FAN_MARK_CREATE) {
> +		ret = -EEXIST;
> +		goto out;
>  	}
>  
>  	/*
> @@ -1510,6 +1513,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>  	__kernel_fsid_t __fsid, *fsid = NULL;
>  	u32 valid_mask = FANOTIFY_EVENTS | FANOTIFY_EVENT_FLAGS;
>  	unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
> +	unsigned int mark_cmd = flags & FANOTIFY_MARK_CMD_BITS;
>  	bool ignored = flags & FAN_MARK_IGNORED_MASK;
>  	unsigned int obj_type, fid_mode;
>  	u32 umask = 0;
> @@ -1539,7 +1543,10 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>  		return -EINVAL;
>  	}
>  
> -	switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE | FAN_MARK_FLUSH)) {
> +	if (flags & FAN_MARK_CREATE && mark_cmd != FAN_MARK_ADD)
> +		return -EINVAL;
> +
> +	switch (mark_cmd) {
>  	case FAN_MARK_ADD:
>  	case FAN_MARK_REMOVE:
>  		if (!mask)
> @@ -1671,7 +1678,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
>  	}
>  
>  	/* create/update an inode mark */
> -	switch (flags & (FAN_MARK_ADD | FAN_MARK_REMOVE)) {
> +	switch (mark_cmd) {
>  	case FAN_MARK_ADD:
>  		if (mark_type == FAN_MARK_MOUNT)
>  			ret = fanotify_add_vfsmount_mark(group, mnt, mask,
> @@ -1749,7 +1756,7 @@ static int __init fanotify_user_setup(void)
>  
>  	BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS);
>  	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12);
> -	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
> +	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 10);
>  
>  	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
>  					 SLAB_PANIC|SLAB_ACCOUNT);
> diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
> index 419cadcd7ff5..780f4b17d4c9 100644
> --- a/include/linux/fanotify.h
> +++ b/include/linux/fanotify.h
> @@ -59,14 +59,16 @@
>  #define FANOTIFY_MARK_TYPE_BITS	(FAN_MARK_INODE | FAN_MARK_MOUNT | \
>  				 FAN_MARK_FILESYSTEM)
>  
> +#define FANOTIFY_MARK_CMD_BITS	(FAN_MARK_ADD | FAN_MARK_REMOVE | \
> +				 FAN_MARK_FLUSH)
> +
>  #define FANOTIFY_MARK_FLAGS	(FANOTIFY_MARK_TYPE_BITS | \
> -				 FAN_MARK_ADD | \
> -				 FAN_MARK_REMOVE | \
> +				 FANOTIFY_MARK_CMD_BITS | \
>  				 FAN_MARK_DONT_FOLLOW | \
>  				 FAN_MARK_ONLYDIR | \
>  				 FAN_MARK_IGNORED_MASK | \
>  				 FAN_MARK_IGNORED_SURV_MODIFY | \
> -				 FAN_MARK_FLUSH)
> +				 FAN_MARK_CREATE)
>  
>  /*
>   * Events that can be reported with data type FSNOTIFY_EVENT_PATH.
> diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
> index e8ac38cc2fd6..c41feac21fe9 100644
> --- a/include/uapi/linux/fanotify.h
> +++ b/include/uapi/linux/fanotify.h
> @@ -82,6 +82,7 @@
>  #define FAN_MARK_IGNORED_SURV_MODIFY	0x00000040
>  #define FAN_MARK_FLUSH		0x00000080
>  /* FAN_MARK_FILESYSTEM is	0x00000100 */
> +#define FAN_MARK_CREATE		0x00000200
>  
>  /* These are NOT bitwise flags.  Both bits can be used togther.  */
>  #define FAN_MARK_INODE		0x00000000
> -- 
> 2.25.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-17 15:34   ` Jan Kara
@ 2022-03-17 15:45     ` Jan Kara
  2022-03-18  3:13       ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-03-17 15:45 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Thu 17-03-22 16:34:43, Jan Kara wrote:
> On Mon 07-03-22 17:57:40, Amir Goldstein wrote:
> > Similar to inotify's IN_MARK_CREATE, adding an fanotify mark with flag
> > FAN_MARK_CREATE will fail with error EEXIST if an fanotify mark already
> > exists on the object.
> > 
> > Unlike inotify's IN_MARK_CREATE, FAN_MARK_CREATE has to supplied in
> > combination with FAN_MARK_ADD (FAN_MARK_ADD is like inotify_add_watch()
> > and the behavior of IN_MARK_ADD is the default for fanotify_mark()).
> > 
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> 
> What I'm missing in this changelog is "why". Is it just about feature
> parity with inotify? I don't find this feature particularly useful...

OK, now I understand after reading patch 5/5. Hum, but I'm not quite happy
about the limitation to non-existing mark as much as I understand why you
need it. Let me think...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-17 15:45     ` Jan Kara
@ 2022-03-18  3:13       ` Amir Goldstein
  2022-03-18 10:32         ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-18  3:13 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Thu, Mar 17, 2022 at 5:45 PM Jan Kara <jack@suse.cz> wrote:
>
> On Thu 17-03-22 16:34:43, Jan Kara wrote:
> > On Mon 07-03-22 17:57:40, Amir Goldstein wrote:
> > > Similar to inotify's IN_MARK_CREATE, adding an fanotify mark with flag
> > > FAN_MARK_CREATE will fail with error EEXIST if an fanotify mark already
> > > exists on the object.
> > >
> > > Unlike inotify's IN_MARK_CREATE, FAN_MARK_CREATE has to supplied in
> > > combination with FAN_MARK_ADD (FAN_MARK_ADD is like inotify_add_watch()
> > > and the behavior of IN_MARK_ADD is the default for fanotify_mark()).
> > >
> > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> >
> > What I'm missing in this changelog is "why". Is it just about feature
> > parity with inotify? I don't find this feature particularly useful...
>
> OK, now I understand after reading patch 5/5. Hum, but I'm not quite happy
> about the limitation to non-existing mark as much as I understand why you
> need it. Let me think...
>

Sorry for not articulating the problem better.
Let me write up the problem and maybe someone can come up with a better
solution than I did.

The problem that I was trying to avoid with FAN_MARK_VOLATILE is similar
to an existing UAPI problem with FAN_MARK_IGNORED_SURV_MODIFY -
This flag can only be set and not cleared and when set it affects all the events
set in the mask prior to that time, leading to unpredictable results.

Let's say a user sets FAN_CLOSE in ignored mask without _SURV_MODIFY
and later sets FAN_OPEN  in ignored mask with _SURV_MODIFY.
Does the ignored mask now include FAN_CLOSE? That depends
whether or not FAN_MODIFY event took place between the two calls.

That is one of the reasons I was trying to get rid of _SURV_MODIFY with
new FAN_MARK_IGNORE flag. The trickery in FAN_MARK_CREATE is
that the problem is averted - if a mark property can only be set and never
cleared and if it affects all past and future changes to mask, allow to set
this property during mark creation time and only during mark creation time.

I don't think there is a real use case for changing the _SURV_MODIFY
nor _VOLATILE property of a mark and indeed with new FAN_MARK_IGNORE
semantics, we may only allow to set _SURV_MODIFY along with
FAN_MARK_CREATE, so there are two problems solved using this method.

The fact that FAN_MARK_CREATE has feature parity with inotify is not
the reason to add it, but it does help to swallow this somewhat awkward
solution. And it is certainly easy to document.

As the commit message implies, I was contemplating whether
FAN_MARK_CREATE should be an alternative to FAN_MARK_ADD
or an ORed flag.
Semantics-wise we could decide either way.
I chose the option that seemed easier to implement and document
the behavior of FAN_MARK_VOLATILE.
Using FAN_MARK_CREATE as an alternative to FAN_MARK_ADD may
be a bit more elegant for UAPI though.
We could use a macro to get UAPI elegance without compromising simplicity:

#define FAN_MARK_NEW (FAN_MARK_ADD | FAN_MARK_CREATE)

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] fsnotify: allow adding an inode mark without pinning inode
  2022-03-17 15:27   ` Jan Kara
@ 2022-03-18  6:23     ` Amir Goldstein
  2022-03-20 13:06       ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-18  6:23 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

 >

On Thu, Mar 17, 2022 at 5:27 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 07-03-22 17:57:39, Amir Goldstein wrote:
> > fsnotify_add_mark() and variants implicitly take a reference on inode
> > when attaching a mark to an inode.
> >
> > Make that behavior opt-out with the flag FSNOTIFY_ADD_MARK_NO_IREF.
> >
> > Instead of taking the inode reference when attaching connector to inode
> > and dropping the inode reference when detaching connector from inode,
> > take the inode reference on attach of the first mark that wants to hold
> > an inode reference and drop the inode reference on detach of the last
> > mark that wants to hold an inode reference.
> >
> > This leaves the choice to the backend whether or not to pin the inode
> > when adding an inode mark.
> >
> > This is intended to be used when adding a mark with ignored mask that is
> > used for optimization in cases where group can afford getting unneeded
> > events and reinstate the mark with ignored mask when inode is accessed
> > again after being evicted.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> Couple of notes below.
>
> > diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> > index 190df435919f..f71b6814bfa7 100644
> > --- a/fs/notify/mark.c
> > +++ b/fs/notify/mark.c
> > @@ -213,6 +213,17 @@ static void *fsnotify_detach_connector_from_object(
> >       if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
> >               inode = fsnotify_conn_inode(conn);
> >               inode->i_fsnotify_mask = 0;
> > +
> > +             pr_debug("%s: inode=%p iref=%u sb_connectors=%lu icount=%u\n",
> > +                      __func__, inode, atomic_read(&conn->proxy_iref),
> > +                      atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
> > +                      atomic_read(&inode->i_count));
>
> Are these pr_debug() prints that useful? My experience is they get rarely used
> after the code is debugged... If you think some places are useful longer
> term, tracepoints are probably easier to use these days?
>

Well in this case, the debug prints didn't even help me find the refcount bug
I had in the code, so not useful.

> > +
> > +             /* Unpin inode when detaching from connector */
> > +             if (atomic_read(&conn->proxy_iref))
> > +                     atomic_set(&conn->proxy_iref, 0);
> > +             else
> > +                     inode = NULL;
>
> proxy_iref is always manipulated under conn->lock so there's no need for
> atomic operations here.

Of course. Much simpler!

>
> >       } else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) {
> >               fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0;
> >       } else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) {
> > @@ -240,12 +251,43 @@ static void fsnotify_final_mark_destroy(struct fsnotify_mark *mark)
> >  /* Drop object reference originally held by a connector */
> >  static void fsnotify_drop_object(unsigned int type, void *objp)
> >  {
> > +     struct inode *inode = objp;
> > +
> >       if (!objp)
> >               return;
> >       /* Currently only inode references are passed to be dropped */
> >       if (WARN_ON_ONCE(type != FSNOTIFY_OBJ_TYPE_INODE))
> >               return;
> > -     fsnotify_put_inode_ref(objp);
> > +
> > +     pr_debug("%s: inode=%p sb_connectors=%lu, icount=%u\n", __func__,
> > +              inode, atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
> > +              atomic_read(&inode->i_count));
> > +
> > +     fsnotify_put_inode_ref(inode);
> > +}
> > +
> > +/* Drop the proxy refcount on inode maintainted by connector */
> > +static struct inode *fsnotify_drop_iref(struct fsnotify_mark_connector *conn,
> > +                                     unsigned int *type)
> > +{
> > +     struct inode *inode = fsnotify_conn_inode(conn);
> > +
> > +     if (WARN_ON_ONCE(!inode || conn->type != FSNOTIFY_OBJ_TYPE_INODE))
> > +             return NULL;
> > +
> > +     pr_debug("%s: inode=%p iref=%u sb_connectors=%lu icount=%u\n",
> > +              __func__, inode, atomic_read(&conn->proxy_iref),
> > +              atomic_long_read(&inode->i_sb->s_fsnotify_connectors),
> > +              atomic_read(&inode->i_count));
> > +
> > +     if (WARN_ON_ONCE(!atomic_read(&conn->proxy_iref)) ||
> > +         !atomic_dec_and_test(&conn->proxy_iref))
> > +             return NULL;
> > +
> > +     fsnotify_put_inode_ref(inode);
>
> You cannot call fsnotify_put_inode_ref() here because the function is
> called under conn->lock and iput() can sleep... You need to play similar
> game with passing inode pointer like
> fsnotify_detach_connector_from_object() does.

That was a plain bug.
That game is already being played and fsnotify_drop_object()
is responsible of iput().


BTW, I realized that incrementing/decrementing s_fsnotify_connectors
along with ihold/iput is completely useless, so I will remove the
fsnotify_{put,get}_inode_ref() helpers.

> > +     *type = FSNOTIFY_OBJ_TYPE_INODE;
> > +
> > +     return inode;
> >  }
> >
> >  void fsnotify_put_mark(struct fsnotify_mark *mark)
> > @@ -275,6 +317,9 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
> >               free_conn = true;
> >       } else {
> >               __fsnotify_recalc_mask(conn);
> > +             /* Unpin inode on last mark that wants inode refcount held */
> > +             if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IREF)
> > +                     objp = fsnotify_drop_iref(conn, &type);
> >       }
>
> This is going to be interesting. What if the connector got detached from
> the inode before fsnotify_put_mark() was called? Then iref_proxy would be
> already 0 and we would barf? I think
> fsnotify_detach_connector_from_object() needs to drop inode reference but
> leave iref_proxy alone for this to work. fsnotify_drop_iref() would then
> drop inode reference only if iref_proxy reaches 0 and conn->objp != NULL...
>

Good catch! but solution I think the is way simpler:

+               /* Unpin inode on last mark that wants inode refcount held */
+               if (conn->type == FSNOTIFY_OBJ_TYPE_INODE &&
+                   mark->flags & FSNOTIFY_MARK_FLAG_HAS_IREF)
+                       objp = fsnotify_drop_iref(conn, &type);

(iref_proxy > 0) always infers a single i_count reference, so it makes
fsnotify_detach_connector_from_object() sets iref_proxy = 0 and
conn->type = FSNOTIFY_OBJ_TYPE_DETACHED, so we should be good here.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-18  3:13       ` Amir Goldstein
@ 2022-03-18 10:32         ` Jan Kara
  2022-03-18 11:04           ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-03-18 10:32 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Fri 18-03-22 05:13:01, Amir Goldstein wrote:
> On Thu, Mar 17, 2022 at 5:45 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Thu 17-03-22 16:34:43, Jan Kara wrote:
> > > On Mon 07-03-22 17:57:40, Amir Goldstein wrote:
> > > > Similar to inotify's IN_MARK_CREATE, adding an fanotify mark with flag
> > > > FAN_MARK_CREATE will fail with error EEXIST if an fanotify mark already
> > > > exists on the object.
> > > >
> > > > Unlike inotify's IN_MARK_CREATE, FAN_MARK_CREATE has to supplied in
> > > > combination with FAN_MARK_ADD (FAN_MARK_ADD is like inotify_add_watch()
> > > > and the behavior of IN_MARK_ADD is the default for fanotify_mark()).
> > > >
> > > > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > >
> > > What I'm missing in this changelog is "why". Is it just about feature
> > > parity with inotify? I don't find this feature particularly useful...
> >
> > OK, now I understand after reading patch 5/5. Hum, but I'm not quite happy
> > about the limitation to non-existing mark as much as I understand why you
> > need it. Let me think...
> >
> 
> Sorry for not articulating the problem better.
> Let me write up the problem and maybe someone can come up with a better
> solution than I did.
> 
> The problem that I was trying to avoid with FAN_MARK_VOLATILE is similar
> to an existing UAPI problem with FAN_MARK_IGNORED_SURV_MODIFY -
> This flag can only be set and not cleared and when set it affects all the events
> set in the mask prior to that time, leading to unpredictable results.
>
> Let's say a user sets FAN_CLOSE in ignored mask without _SURV_MODIFY
> and later sets FAN_OPEN  in ignored mask with _SURV_MODIFY.
> Does the ignored mask now include FAN_CLOSE? That depends
> whether or not FAN_MODIFY event took place between the two calls.

Yeah, but with FAN_MARK_VOLATILE the problem also goes the other way
around. If I set FAN_MARK_VOLATILE on some inode and later add something to
a normal mask, I might be rightfully surprised when the mark gets evicted
and thus I will not get events I'm expecting. Granted the application would
be stepping on its own toes because marks are "merged" only for the same
notification group but still it could be surprising and avoiding such
mishaps would probably involve extra tracking on the application side.

The problem essentially lies in mixing mark "flags" (ONDIR, ON_CHILD,
VOLATILE, SURV_MODIFY) with mark mask. Mark operations with identical set
of flags can be merged without troubles but once flags are different
results of the merge are always "interesting". So far the consequences were
mostly benign (getting more events than the application may have expected)
but with FAN_MARK_VOLATILE we can also start loosing events and that is
more serious.

So far my thinking is that we either follow the path of possibly generating
more events than necessary (i.e., any merge of two masks that do not both
have FAN_MARK_VOLATILE set will clear FAN_MARK_VOLATILE) or we rework the
whole mark API (and implementation!) to completely avoid these strange
effects of flag merging. I don't like FAN_MARK_CREATE much because IMO it
solves only half of the problem - when new mark with a flag wants to merge
with an existing mark, but does not solve the other half when some other
mark wants to merge to a mark with a flag. Thoughts?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-18 10:32         ` Jan Kara
@ 2022-03-18 11:04           ` Amir Goldstein
  2022-03-18 14:09             ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-18 11:04 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

> > The problem that I was trying to avoid with FAN_MARK_VOLATILE is similar
> > to an existing UAPI problem with FAN_MARK_IGNORED_SURV_MODIFY -
> > This flag can only be set and not cleared and when set it affects all the events
> > set in the mask prior to that time, leading to unpredictable results.
> >
> > Let's say a user sets FAN_CLOSE in ignored mask without _SURV_MODIFY
> > and later sets FAN_OPEN  in ignored mask with _SURV_MODIFY.
> > Does the ignored mask now include FAN_CLOSE? That depends
> > whether or not FAN_MODIFY event took place between the two calls.
>
> Yeah, but with FAN_MARK_VOLATILE the problem also goes the other way
> around. If I set FAN_MARK_VOLATILE on some inode and later add something to
> a normal mask, I might be rightfully surprised when the mark gets evicted
> and thus I will not get events I'm expecting. Granted the application would
> be stepping on its own toes because marks are "merged" only for the same
> notification group but still it could be surprising and avoiding such
> mishaps would probably involve extra tracking on the application side.
>
> The problem essentially lies in mixing mark "flags" (ONDIR, ON_CHILD,
> VOLATILE, SURV_MODIFY) with mark mask. Mark operations with identical set
> of flags can be merged without troubles but once flags are different
> results of the merge are always "interesting". So far the consequences were
> mostly benign (getting more events than the application may have expected)
> but with FAN_MARK_VOLATILE we can also start loosing events and that is
> more serious.
>
> So far my thinking is that we either follow the path of possibly generating
> more events than necessary (i.e., any merge of two masks that do not both
> have FAN_MARK_VOLATILE set will clear FAN_MARK_VOLATILE) or we rework the
> whole mark API (and implementation!) to completely avoid these strange
> effects of flag merging. I don't like FAN_MARK_CREATE much because IMO it
> solves only half of the problem - when new mark with a flag wants to merge
> with an existing mark, but does not solve the other half when some other
> mark wants to merge to a mark with a flag. Thoughts?
>

Yes. Just one thought.
My applications never needed to change the mark mask after it was
set and I don't really see a huge use case for changing the mask
once it was set (besides removing the entire mark).

So instead of FAN_MARK_CREATE, we may try to see if FAN_MARK_CONST
results in something that is useful and not too complicated to implement
and document.

IMO using a "const" initialization for the "volatile" mark is not such a big
limitation and should not cripple the feature.

Thoughts?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-18 11:04           ` Amir Goldstein
@ 2022-03-18 14:09             ` Jan Kara
  2022-03-18 16:06               ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-03-18 14:09 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Fri 18-03-22 13:04:51, Amir Goldstein wrote:
> > > The problem that I was trying to avoid with FAN_MARK_VOLATILE is similar
> > > to an existing UAPI problem with FAN_MARK_IGNORED_SURV_MODIFY -
> > > This flag can only be set and not cleared and when set it affects all the events
> > > set in the mask prior to that time, leading to unpredictable results.
> > >
> > > Let's say a user sets FAN_CLOSE in ignored mask without _SURV_MODIFY
> > > and later sets FAN_OPEN  in ignored mask with _SURV_MODIFY.
> > > Does the ignored mask now include FAN_CLOSE? That depends
> > > whether or not FAN_MODIFY event took place between the two calls.
> >
> > Yeah, but with FAN_MARK_VOLATILE the problem also goes the other way
> > around. If I set FAN_MARK_VOLATILE on some inode and later add something to
> > a normal mask, I might be rightfully surprised when the mark gets evicted
> > and thus I will not get events I'm expecting. Granted the application would
> > be stepping on its own toes because marks are "merged" only for the same
> > notification group but still it could be surprising and avoiding such
> > mishaps would probably involve extra tracking on the application side.
> >
> > The problem essentially lies in mixing mark "flags" (ONDIR, ON_CHILD,
> > VOLATILE, SURV_MODIFY) with mark mask. Mark operations with identical set
> > of flags can be merged without troubles but once flags are different
> > results of the merge are always "interesting". So far the consequences were
> > mostly benign (getting more events than the application may have expected)
> > but with FAN_MARK_VOLATILE we can also start loosing events and that is
> > more serious.
> >
> > So far my thinking is that we either follow the path of possibly generating
> > more events than necessary (i.e., any merge of two masks that do not both
> > have FAN_MARK_VOLATILE set will clear FAN_MARK_VOLATILE) or we rework the
> > whole mark API (and implementation!) to completely avoid these strange
> > effects of flag merging. I don't like FAN_MARK_CREATE much because IMO it
> > solves only half of the problem - when new mark with a flag wants to merge
> > with an existing mark, but does not solve the other half when some other
> > mark wants to merge to a mark with a flag. Thoughts?
> >
> 
> Yes. Just one thought.
> My applications never needed to change the mark mask after it was
> set and I don't really see a huge use case for changing the mask
> once it was set (besides removing the entire mark).
> 
> So instead of FAN_MARK_CREATE, we may try to see if FAN_MARK_CONST
> results in something that is useful and not too complicated to implement
> and document.
> 
> IMO using a "const" initialization for the "volatile" mark is not such a big
> limitation and should not cripple the feature.

OK, so basically if there's mark already placed at the inode and we try to
add FAN_MARK_CONST, the addition would fail, and similarly if we later tried
to add further mark to the inode with FAN_MARK_CONST mark, it would fail?

Thinking out loud: What does FAN_MARK_CONST bring compared to the
suggestion to go via the path of possibly generating more events by
clearing FAN_MARK_VOLATILE? I guess some additional safety if you would add
another mark to the same inode by an accident. Because if you never update
marks, there's no problem with additional mark flags. Is the new flag worth
it? Not sure...  :)

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-18 14:09             ` Jan Kara
@ 2022-03-18 16:06               ` Amir Goldstein
  2022-03-20 13:00                 ` Amir Goldstein
  2022-03-21  9:09                 ` [PATCH 4/5] fanotify: add support for exclusive create of mark Jan Kara
  0 siblings, 2 replies; 28+ messages in thread
From: Amir Goldstein @ 2022-03-18 16:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

> > > So far my thinking is that we either follow the path of possibly generating
> > > more events than necessary (i.e., any merge of two masks that do not both
> > > have FAN_MARK_VOLATILE set will clear FAN_MARK_VOLATILE)

I agree that would provide predictable behavior which is also similar to
that of _SURV_MODIFY.
But IMO, this is very weird to explain/document in the wider sense.
However, if we only document that
"FAN_MARK_VOLATILE cannot be set on an existing mark and any update
 of the mask without FAN_MARK_VOLATILE clears that flag"
(i.e. we make _VOLATILE imply the _CREATE behavior)
then the merge logic is the same as you suggested, but easier to explain.

> > > or we rework the
> > > whole mark API (and implementation!) to completely avoid these strange
> > > effects of flag merging. I don't like FAN_MARK_CREATE much because IMO it
> > > solves only half of the problem - when new mark with a flag wants to merge
> > > with an existing mark, but does not solve the other half when some other
> > > mark wants to merge to a mark with a flag. Thoughts?
> > >
> >
> > Yes. Just one thought.
> > My applications never needed to change the mark mask after it was
> > set and I don't really see a huge use case for changing the mask
> > once it was set (besides removing the entire mark).
> >
> > So instead of FAN_MARK_CREATE, we may try to see if FAN_MARK_CONST
> > results in something that is useful and not too complicated to implement
> > and document.
> >
> > IMO using a "const" initialization for the "volatile" mark is not such a big
> > limitation and should not cripple the feature.
>
> OK, so basically if there's mark already placed at the inode and we try to
> add FAN_MARK_CONST, the addition would fail, and similarly if we later tried
> to add further mark to the inode with FAN_MARK_CONST mark, it would fail?
>
> Thinking out loud: What does FAN_MARK_CONST bring compared to the
> suggestion to go via the path of possibly generating more events by
> clearing FAN_MARK_VOLATILE? I guess some additional safety if you would add
> another mark to the same inode by an accident. Because if you never update
> marks, there's no problem with additional mark flags.
> Is the new flag worth it? Not sure...  :)

I rather not add new flags if we can do without them.

To summarize my last proposal:

1. On fanotify_mark() with FAN_MARK_VOLATILE
1.a. If the mark is new, the HAS_IREF flag is not set and no ihold()
1.b. If mark already exists without HAS_IREF flag, mask is updated
1.c. If mark already exists with HAS_IREF flag, mark add fails with EEXISTS

2. On fanotify_mark() without FAN_MARK_VOLATILE
2.a. If the mark is new or exists without HAS_IREF, the HAS_IREF flag
is set and ihold()
2.b. If mark already exists with HAS_IREF flag, mask is updated

Do we have a winner?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 0/5] Volatile fanotify marks
  2022-03-17 15:14   ` Amir Goldstein
@ 2022-03-20 12:54     ` Amir Goldstein
  2022-06-13  5:40       ` LTP test for fanotify evictable marks Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-20 12:54 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Thu, Mar 17, 2022 at 5:14 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 4:12 PM Jan Kara <jack@suse.cz> wrote:
> >
> > On Mon 07-03-22 17:57:36, Amir Goldstein wrote:
> > > Jan,
> > >
> > > Following RFC discussion [1], following are the volatile mark patches.
> > >
> > > Tested both manually and with this LTP test [2].
> > > I was struggling with this test for a while because drop caches
> > > did not get rid of the un-pinned inode when test was run with
> > > ext2 or ext4 on my test VM. With xfs, the test works fine for me,
> > > but it may not work for everyone.
> > >
> > > Perhaps you have a suggestion for a better way to test inode eviction.
> >
> > Drop caches does not evict dirty inodes. The inode is likely dirty because
> > you have chmodded it just before drop caches. So I think calling sync or
> > syncfs before dropping caches should fix your problems with ext2 / ext4.  I
> > suspect this has worked for XFS only because it does its private inode
> > dirtiness tracking and keeps the inode behind VFS's back.
>
> I did think of that and tried to fsync which did not help, but maybe
> I messed it up somehow.
>

You were right. fsync did fix the test.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-18 16:06               ` Amir Goldstein
@ 2022-03-20 13:00                 ` Amir Goldstein
  2022-03-22 16:44                   ` direct reclaim of fanotify evictable marks Amir Goldstein
  2022-03-21  9:09                 ` [PATCH 4/5] fanotify: add support for exclusive create of mark Jan Kara
  1 sibling, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-20 13:00 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Fri, Mar 18, 2022 at 6:06 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > > > So far my thinking is that we either follow the path of possibly generating
> > > > more events than necessary (i.e., any merge of two masks that do not both
> > > > have FAN_MARK_VOLATILE set will clear FAN_MARK_VOLATILE)
>
> I agree that would provide predictable behavior which is also similar to
> that of _SURV_MODIFY.
> But IMO, this is very weird to explain/document in the wider sense.
> However, if we only document that
> "FAN_MARK_VOLATILE cannot be set on an existing mark and any update
>  of the mask without FAN_MARK_VOLATILE clears that flag"
> (i.e. we make _VOLATILE imply the _CREATE behavior)
> then the merge logic is the same as you suggested, but easier to explain.
>

[...]

>
> To summarize my last proposal:
>
> 1. On fanotify_mark() with FAN_MARK_VOLATILE
> 1.a. If the mark is new, the HAS_IREF flag is not set and no ihold()
> 1.b. If mark already exists without HAS_IREF flag, mask is updated
> 1.c. If mark already exists with HAS_IREF flag, mark add fails with EEXISTS
>
> 2. On fanotify_mark() without FAN_MARK_VOLATILE
> 2.a. If the mark is new or exists without HAS_IREF, the HAS_IREF flag
> is set and ihold()
> 2.b. If mark already exists with HAS_IREF flag, mask is updated
>
> Do we have a winner?
>

FYI, I've implemented the above and pushed to branch fan_evictable.
Yes, I also changed the name of the flag to be more coherent with the
documented behavior:

    fanotify: add support for "evictable" inode marks

    When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not
    pin the marked inode to inode cache, so when inode is evicted from cache
    due to memory pressure, the mark will be lost.

    When an inode mark with flag FAN_MARK_EVICATBLE is updated without using
    this flag, the marked inode is pinned to inode cache.

    When an inode mark is updated with flag FAN_MARK_EVICTABLE but an
    existing mark already has the inode pinned, the mark update fails with
    error EEXIST.

I also took care of avoiding direct reclaim deadlocks from fanotify_add_mark().
If you agree to the proposed UAPI I will post v2 patches.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 3/5] fsnotify: allow adding an inode mark without pinning inode
  2022-03-18  6:23     ` Amir Goldstein
@ 2022-03-20 13:06       ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2022-03-20 13:06 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

> > >  void fsnotify_put_mark(struct fsnotify_mark *mark)
> > > @@ -275,6 +317,9 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
> > >               free_conn = true;
> > >       } else {
> > >               __fsnotify_recalc_mask(conn);
> > > +             /* Unpin inode on last mark that wants inode refcount held */
> > > +             if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IREF)
> > > +                     objp = fsnotify_drop_iref(conn, &type);
> > >       }
> >
> > This is going to be interesting. What if the connector got detached from
> > the inode before fsnotify_put_mark() was called? Then iref_proxy would be
> > already 0 and we would barf? I think
> > fsnotify_detach_connector_from_object() needs to drop inode reference but
> > leave iref_proxy alone for this to work. fsnotify_drop_iref() would then
> > drop inode reference only if iref_proxy reaches 0 and conn->objp != NULL...
> >
>
> Good catch! but solution I think the is way simpler:
>
> +               /* Unpin inode on last mark that wants inode refcount held */
> +               if (conn->type == FSNOTIFY_OBJ_TYPE_INODE &&
> +                   mark->flags & FSNOTIFY_MARK_FLAG_HAS_IREF)
> +                       objp = fsnotify_drop_iref(conn, &type);
>
> (iref_proxy > 0) always infers a single i_count reference, so it makes
> fsnotify_detach_connector_from_object() sets iref_proxy = 0 and
> conn->type = FSNOTIFY_OBJ_TYPE_DETACHED, so we should be good here.
>

FWIW, I completely changed the proxy iref tracking in v2 (branch fan_evictable).
There is no iref_proxy, only FSNOTIFY_CONN_FLAG_HAS_IREF flag on the
connector which is aligned with elevated inode reference.

The "virtual iref_proxy" is recalculated in __fsnotify_recalc_mask()
which allows
for the "upgrade to pinned inode" logic, but iput() is only called on
detach of mark
or detach of object, if connector had FSNOTIFY_CONN_FLAG_HAS_IREF and
the  "virtual iref_proxy" dropped to 0.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 4/5] fanotify: add support for exclusive create of mark
  2022-03-18 16:06               ` Amir Goldstein
  2022-03-20 13:00                 ` Amir Goldstein
@ 2022-03-21  9:09                 ` Jan Kara
  1 sibling, 0 replies; 28+ messages in thread
From: Jan Kara @ 2022-03-21  9:09 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Fri 18-03-22 18:06:40, Amir Goldstein wrote:
> > > > So far my thinking is that we either follow the path of possibly generating
> > > > more events than necessary (i.e., any merge of two masks that do not both
> > > > have FAN_MARK_VOLATILE set will clear FAN_MARK_VOLATILE)
> 
> I agree that would provide predictable behavior which is also similar to
> that of _SURV_MODIFY.
> But IMO, this is very weird to explain/document in the wider sense.
> However, if we only document that
> "FAN_MARK_VOLATILE cannot be set on an existing mark and any update
>  of the mask without FAN_MARK_VOLATILE clears that flag"
> (i.e. we make _VOLATILE imply the _CREATE behavior)
> then the merge logic is the same as you suggested, but easier to explain.

Yes, makes sense.

> > > > or we rework the
> > > > whole mark API (and implementation!) to completely avoid these strange
> > > > effects of flag merging. I don't like FAN_MARK_CREATE much because IMO it
> > > > solves only half of the problem - when new mark with a flag wants to merge
> > > > with an existing mark, but does not solve the other half when some other
> > > > mark wants to merge to a mark with a flag. Thoughts?
> > > >
> > >
> > > Yes. Just one thought.
> > > My applications never needed to change the mark mask after it was
> > > set and I don't really see a huge use case for changing the mask
> > > once it was set (besides removing the entire mark).
> > >
> > > So instead of FAN_MARK_CREATE, we may try to see if FAN_MARK_CONST
> > > results in something that is useful and not too complicated to implement
> > > and document.
> > >
> > > IMO using a "const" initialization for the "volatile" mark is not such a big
> > > limitation and should not cripple the feature.
> >
> > OK, so basically if there's mark already placed at the inode and we try to
> > add FAN_MARK_CONST, the addition would fail, and similarly if we later tried
> > to add further mark to the inode with FAN_MARK_CONST mark, it would fail?
> >
> > Thinking out loud: What does FAN_MARK_CONST bring compared to the
> > suggestion to go via the path of possibly generating more events by
> > clearing FAN_MARK_VOLATILE? I guess some additional safety if you would add
> > another mark to the same inode by an accident. Because if you never update
> > marks, there's no problem with additional mark flags.
> > Is the new flag worth it? Not sure...  :)
> 
> I rather not add new flags if we can do without them.
> 
> To summarize my last proposal:
> 
> 1. On fanotify_mark() with FAN_MARK_VOLATILE
> 1.a. If the mark is new, the HAS_IREF flag is not set and no ihold()
> 1.b. If mark already exists without HAS_IREF flag, mask is updated
> 1.c. If mark already exists with HAS_IREF flag, mark add fails with EEXISTS
> 
> 2. On fanotify_mark() without FAN_MARK_VOLATILE
> 2.a. If the mark is new or exists without HAS_IREF, the HAS_IREF flag
> is set and ihold()
> 2.b. If mark already exists with HAS_IREF flag, mask is updated
> 
> Do we have a winner?

Sounds good to me.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: direct reclaim of fanotify evictable marks
  2022-03-20 13:00                 ` Amir Goldstein
@ 2022-03-22 16:44                   ` Amir Goldstein
  2022-03-23  9:16                     ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-03-22 16:44 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

> FYI, I've implemented the above and pushed to branch fan_evictable.
> Yes, I also changed the name of the flag to be more coherent with the
> documented behavior:
>
>     fanotify: add support for "evictable" inode marks
>
>     When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not
>     pin the marked inode to inode cache, so when inode is evicted from cache
>     due to memory pressure, the mark will be lost.
>
>     When an inode mark with flag FAN_MARK_EVICATBLE is updated without using
>     this flag, the marked inode is pinned to inode cache.
>
>     When an inode mark is updated with flag FAN_MARK_EVICTABLE but an
>     existing mark already has the inode pinned, the mark update fails with
>     error EEXIST.
>
> I also took care of avoiding direct reclaim deadlocks from fanotify_add_mark().
> If you agree to the proposed UAPI I will post v2 patches.

Jan,

Before implementing your suggested solution, I wrote a test patch
that reproduces the deadlock.
It took me a while to get to a reproducible scenario and I ended up using
a debug feature called FAN_MARK_LARGE to get there.

You can see the test patches at the tip of these kernel and ltp branches:
https://github.com/amir73il/linux/commits/fan_evictable
https://github.com/amir73il/ltp/commits/fan_evictable

The question is how can we test this in release kernels?
Should we include FAN_MARK_LARGE as a hidden (admin only)
feature? Use sysctl knob to enable it? use an ioctl? something else?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: direct reclaim of fanotify evictable marks
  2022-03-22 16:44                   ` direct reclaim of fanotify evictable marks Amir Goldstein
@ 2022-03-23  9:16                     ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2022-03-23  9:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Tue, Mar 22, 2022 at 6:44 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> > FYI, I've implemented the above and pushed to branch fan_evictable.
> > Yes, I also changed the name of the flag to be more coherent with the
> > documented behavior:
> >
> >     fanotify: add support for "evictable" inode marks
> >
> >     When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not
> >     pin the marked inode to inode cache, so when inode is evicted from cache
> >     due to memory pressure, the mark will be lost.
> >
> >     When an inode mark with flag FAN_MARK_EVICATBLE is updated without using
> >     this flag, the marked inode is pinned to inode cache.
> >
> >     When an inode mark is updated with flag FAN_MARK_EVICTABLE but an
> >     existing mark already has the inode pinned, the mark update fails with
> >     error EEXIST.
> >
> > I also took care of avoiding direct reclaim deadlocks from fanotify_add_mark().
> > If you agree to the proposed UAPI I will post v2 patches.
>
> Jan,
>
> Before implementing your suggested solution, I wrote a test patch
> that reproduces the deadlock.
> It took me a while to get to a reproducible scenario and I ended up using
> a debug feature called FAN_MARK_LARGE to get there.
>
> You can see the test patches at the tip of these kernel and ltp branches:
> https://github.com/amir73il/linux/commits/fan_evictable
> https://github.com/amir73il/ltp/commits/fan_evictable
>
> The question is how can we test this in release kernels?
> Should we include FAN_MARK_LARGE as a hidden (admin only)
> feature? Use sysctl knob to enable it? use an ioctl? something else?
>

I went for ioctl and I kind of like it like.
Pushed.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: LTP test for fanotify evictable marks
  2022-03-20 12:54     ` Amir Goldstein
@ 2022-06-13  5:40       ` Amir Goldstein
  2022-06-13 11:59         ` Jan Kara
  0 siblings, 1 reply; 28+ messages in thread
From: Amir Goldstein @ 2022-06-13  5:40 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Sun, Mar 20, 2022 at 2:54 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 5:14 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Thu, Mar 17, 2022 at 4:12 PM Jan Kara <jack@suse.cz> wrote:
> > >
> > > On Mon 07-03-22 17:57:36, Amir Goldstein wrote:
> > > > Jan,
> > > >
> > > > Following RFC discussion [1], following are the volatile mark patches.
> > > >
> > > > Tested both manually and with this LTP test [2].
> > > > I was struggling with this test for a while because drop caches
> > > > did not get rid of the un-pinned inode when test was run with
> > > > ext2 or ext4 on my test VM. With xfs, the test works fine for me,
> > > > but it may not work for everyone.
> > > >
> > > > Perhaps you have a suggestion for a better way to test inode eviction.
> > >
> > > Drop caches does not evict dirty inodes. The inode is likely dirty because
> > > you have chmodded it just before drop caches. So I think calling sync or
> > > syncfs before dropping caches should fix your problems with ext2 / ext4.  I
> > > suspect this has worked for XFS only because it does its private inode
> > > dirtiness tracking and keeps the inode behind VFS's back.
> >
> > I did think of that and tried to fsync which did not help, but maybe
> > I messed it up somehow.
> >
>
> You were right. fsync did fix the test.

Hi Jan,

I was preparing to post the LTP test for FAN_MARK_EVICTABLE [1]
and I realized the issue we discussed above was not really resolved.
fsync() + drop_caches is not enough to guarantee reliable inode eviction.

It "kind of" works for ext2 and xfs, but not for ext4, ext3, btrfs.
"kind of" because even for ext2 and xfs, dropping only inode cache (2)
doesn't evict the inode/mark and dropping inode+page cache (3) does work
most of the time, although I did occasionally see failures.
I suspect those failures were related to running the test on a system with very
low page cache usage.
The fact that I had to tweak vfs_cache_pressure to increase test reliability
also suggests that there are heuristics at play.

I guess I could fill the page cache with pages to rig the game.
Do you have other suggestions on how to increase the reliability of the test?
That is, how to reliably evict a non-dirty inode.

Thanks,
Amir.

[1] https://github.com/amir73il/ltp/blob/fan_evictable/testcases/kernel/syscalls/fanotify/fanotify23.c

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: LTP test for fanotify evictable marks
  2022-06-13  5:40       ` LTP test for fanotify evictable marks Amir Goldstein
@ 2022-06-13 11:59         ` Jan Kara
  2022-06-13 14:16           ` Amir Goldstein
  0 siblings, 1 reply; 28+ messages in thread
From: Jan Kara @ 2022-06-13 11:59 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Mon 13-06-22 08:40:37, Amir Goldstein wrote:
> On Sun, Mar 20, 2022 at 2:54 PM Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > On Thu, Mar 17, 2022 at 5:14 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Thu, Mar 17, 2022 at 4:12 PM Jan Kara <jack@suse.cz> wrote:
> > > >
> > > > On Mon 07-03-22 17:57:36, Amir Goldstein wrote:
> > > > > Jan,
> > > > >
> > > > > Following RFC discussion [1], following are the volatile mark patches.
> > > > >
> > > > > Tested both manually and with this LTP test [2].
> > > > > I was struggling with this test for a while because drop caches
> > > > > did not get rid of the un-pinned inode when test was run with
> > > > > ext2 or ext4 on my test VM. With xfs, the test works fine for me,
> > > > > but it may not work for everyone.
> > > > >
> > > > > Perhaps you have a suggestion for a better way to test inode eviction.
> > > >
> > > > Drop caches does not evict dirty inodes. The inode is likely dirty because
> > > > you have chmodded it just before drop caches. So I think calling sync or
> > > > syncfs before dropping caches should fix your problems with ext2 / ext4.  I
> > > > suspect this has worked for XFS only because it does its private inode
> > > > dirtiness tracking and keeps the inode behind VFS's back.
> > >
> > > I did think of that and tried to fsync which did not help, but maybe
> > > I messed it up somehow.
> > >
> >
> > You were right. fsync did fix the test.
> 
> Hi Jan,
> 
> I was preparing to post the LTP test for FAN_MARK_EVICTABLE [1]
> and I realized the issue we discussed above was not really resolved.
> fsync() + drop_caches is not enough to guarantee reliable inode eviction.
> 
> It "kind of" works for ext2 and xfs, but not for ext4, ext3, btrfs.
> "kind of" because even for ext2 and xfs, dropping only inode cache (2)
> doesn't evict the inode/mark and dropping inode+page cache (3) does work
> most of the time, although I did occasionally see failures.
> I suspect those failures were related to running the test on a system
> with very low page cache usage.
> The fact that I had to tweak vfs_cache_pressure to increase test reliability
> also suggests that there are heuristics at play.

Well, yes, there's no guaranteed way to force inode out of cache. It is all
best-effort stuff. When we needed to make sure inode goes out of cache on
nearest occasion, we have introduced d_mark_dontcache() but there's no
fs common way to set this flag on dentry and I don't think we want to
expose one.

I was thinking whether we have some more reliable way to test this
functionality and I didn't find one. One other obvious approach to the test
is to create memcgroup with low memory limit, tag large tree with evictable
mark, and see whether the memory gets exhausted. This is kind of where this
functionality is aimed. But there are also variables in this testing scheme
that may be difficult to tame and the test will likely take rather long
time to perform.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: LTP test for fanotify evictable marks
  2022-06-13 11:59         ` Jan Kara
@ 2022-06-13 14:16           ` Amir Goldstein
  0 siblings, 0 replies; 28+ messages in thread
From: Amir Goldstein @ 2022-06-13 14:16 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Mon, Jun 13, 2022 at 2:59 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 13-06-22 08:40:37, Amir Goldstein wrote:
> > On Sun, Mar 20, 2022 at 2:54 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > >
> > > On Thu, Mar 17, 2022 at 5:14 PM Amir Goldstein <amir73il@gmail.com> wrote:
> > > >
> > > > On Thu, Mar 17, 2022 at 4:12 PM Jan Kara <jack@suse.cz> wrote:
> > > > >
> > > > > On Mon 07-03-22 17:57:36, Amir Goldstein wrote:
> > > > > > Jan,
> > > > > >
> > > > > > Following RFC discussion [1], following are the volatile mark patches.
> > > > > >
> > > > > > Tested both manually and with this LTP test [2].
> > > > > > I was struggling with this test for a while because drop caches
> > > > > > did not get rid of the un-pinned inode when test was run with
> > > > > > ext2 or ext4 on my test VM. With xfs, the test works fine for me,
> > > > > > but it may not work for everyone.
> > > > > >
> > > > > > Perhaps you have a suggestion for a better way to test inode eviction.
> > > > >
> > > > > Drop caches does not evict dirty inodes. The inode is likely dirty because
> > > > > you have chmodded it just before drop caches. So I think calling sync or
> > > > > syncfs before dropping caches should fix your problems with ext2 / ext4.  I
> > > > > suspect this has worked for XFS only because it does its private inode
> > > > > dirtiness tracking and keeps the inode behind VFS's back.
> > > >
> > > > I did think of that and tried to fsync which did not help, but maybe
> > > > I messed it up somehow.
> > > >
> > >
> > > You were right. fsync did fix the test.
> >
> > Hi Jan,
> >
> > I was preparing to post the LTP test for FAN_MARK_EVICTABLE [1]
> > and I realized the issue we discussed above was not really resolved.
> > fsync() + drop_caches is not enough to guarantee reliable inode eviction.
> >
> > It "kind of" works for ext2 and xfs, but not for ext4, ext3, btrfs.
> > "kind of" because even for ext2 and xfs, dropping only inode cache (2)
> > doesn't evict the inode/mark and dropping inode+page cache (3) does work
> > most of the time, although I did occasionally see failures.
> > I suspect those failures were related to running the test on a system
> > with very low page cache usage.
> > The fact that I had to tweak vfs_cache_pressure to increase test reliability
> > also suggests that there are heuristics at play.
>
> Well, yes, there's no guaranteed way to force inode out of cache. It is all
> best-effort stuff. When we needed to make sure inode goes out of cache on
> nearest occasion, we have introduced d_mark_dontcache() but there's no
> fs common way to set this flag on dentry and I don't think we want to
> expose one.
>
> I was thinking whether we have some more reliable way to test this
> functionality and I didn't find one. One other obvious approach to the test
> is to create memcgroup with low memory limit, tag large tree with evictable
> mark, and see whether the memory gets exhausted. This is kind of where this
> functionality is aimed. But there are also variables in this testing scheme
> that may be difficult to tame and the test will likely take rather long
> time to perform.

That's why I had the debugging FAN_IOC_SET_MARK_PAGE_ORDER
ioctl, which I used for a reliable and fast direct reclaim test.
Anyway, I am going to submit the test with the current kludges to run
on ext2 and see if anyone complains.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-06-13 18:17 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-07 15:57 [PATCH 0/5] Volatile fanotify marks Amir Goldstein
2022-03-07 15:57 ` [PATCH 1/5] fsnotify: move inotify control flags to mark flags Amir Goldstein
2022-03-17 14:25   ` Jan Kara
2022-03-17 15:12     ` Amir Goldstein
2022-03-07 15:57 ` [PATCH 2/5] fsnotify: pass flags argument to fsnotify_add_mark() Amir Goldstein
2022-03-07 15:57 ` [PATCH 3/5] fsnotify: allow adding an inode mark without pinning inode Amir Goldstein
2022-03-17 15:27   ` Jan Kara
2022-03-18  6:23     ` Amir Goldstein
2022-03-20 13:06       ` Amir Goldstein
2022-03-07 15:57 ` [PATCH 4/5] fanotify: add support for exclusive create of mark Amir Goldstein
2022-03-17 15:34   ` Jan Kara
2022-03-17 15:45     ` Jan Kara
2022-03-18  3:13       ` Amir Goldstein
2022-03-18 10:32         ` Jan Kara
2022-03-18 11:04           ` Amir Goldstein
2022-03-18 14:09             ` Jan Kara
2022-03-18 16:06               ` Amir Goldstein
2022-03-20 13:00                 ` Amir Goldstein
2022-03-22 16:44                   ` direct reclaim of fanotify evictable marks Amir Goldstein
2022-03-23  9:16                     ` Amir Goldstein
2022-03-21  9:09                 ` [PATCH 4/5] fanotify: add support for exclusive create of mark Jan Kara
2022-03-07 15:57 ` [PATCH 5/5] fanotify: add support for "volatile" inode marks Amir Goldstein
2022-03-17 14:12 ` [PATCH 0/5] Volatile fanotify marks Jan Kara
2022-03-17 15:14   ` Amir Goldstein
2022-03-20 12:54     ` Amir Goldstein
2022-06-13  5:40       ` LTP test for fanotify evictable marks Amir Goldstein
2022-06-13 11:59         ` Jan Kara
2022-06-13 14:16           ` Amir Goldstein

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).