All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 00/16] Evictable fanotify marks
@ 2022-04-13  9:09 Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 01/16] inotify: show inotify mask flags in proc fdinfo Amir Goldstein
                   ` (16 more replies)
  0 siblings, 17 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Jan,

Following v3 patch set addresses your review comments on v2 [2].

Please see LTP test [3] and man page draft [4] for evictable marks.

Thanks,
Amir.

Changes since v2 [2]:
- Simplify group lock helpers (Jan)
- Move FSNOTIFY_GROUP_NOFS flag to group object (Jan)
- Split patch of fanotify_mark_user_flags() (Jan)
- Fix bug in case of EEXIST
- Drop ioctl for debugging
- Rebased and tested on v5.18-rc1

Changes since v1 [1]:
- Fixes for direct reclaim deadlock
- Add ioctl for direct reclaim test
- Rebrand as FAN_MARK_EVICTABLE
- Remove FAN_MARK_CREATE and allow clearing FAN_MARK_EVICTABLE
- Replace connector proxy_iref with HAS_IREF flag
- Take iref in fsnotify_reclac_mark() rather than on add mark to list
- Remove fsnotify_add_mark() allow_dups/flags argument
- Remove pr_debug() prints

[1] https://lore.kernel.org/r/20220307155741.1352405-1-amir73il@gmail.com/
[2] https://lore.kernel.org/r/20220329074904.2980320-1-amir73il@gmail.com/
[3] https://github.com/amir73il/ltp/commits/fan_evictable
[4] https://github.com/amir73il/man-pages/commits/fan_evictable

Amir Goldstein (16):
  inotify: show inotify mask flags in proc fdinfo
  inotify: move control flags from mask to mark flags
  fsnotify: fix wrong lockdep annotations
  fsnotify: pass flags argument to fsnotify_add_mark() via mark
  fsnotify: pass flags argument to fsnotify_alloc_group()
  fsnotify: create helpers for group mark_mutex lock
  inotify: use fsnotify group lock helpers
  audit: use fsnotify group lock helpers
  nfsd: use fsnotify group lock helpers
  dnotify: use fsnotify group lock helpers
  fsnotify: allow adding an inode mark without pinning inode
  fanotify: create helper fanotify_mark_user_flags()
  fanotify: factor out helper fanotify_mark_update_flags()
  fanotify: implement "evictable" inode marks
  fanotify: use fsnotify group lock helpers
  fanotify: enable "evictable" inode marks

 fs/nfsd/filecache.c                  |  14 ++--
 fs/notify/dnotify/dnotify.c          |  13 +--
 fs/notify/fanotify/fanotify.h        |  12 +++
 fs/notify/fanotify/fanotify_user.c   |  95 +++++++++++++++-------
 fs/notify/fdinfo.c                   |  21 ++---
 fs/notify/fsnotify.c                 |   4 +-
 fs/notify/group.c                    |  32 +++++---
 fs/notify/inotify/inotify.h          |  19 +++++
 fs/notify/inotify/inotify_fsnotify.c |   2 +-
 fs/notify/inotify/inotify_user.c     |  49 ++++++-----
 fs/notify/mark.c                     | 117 ++++++++++++++++++---------
 include/linux/fanotify.h             |   1 +
 include/linux/fsnotify_backend.h     |  75 ++++++++++++-----
 include/uapi/linux/fanotify.h        |   1 +
 kernel/audit_fsnotify.c              |   6 +-
 kernel/audit_tree.c                  |  34 ++++----
 kernel/audit_watch.c                 |   2 +-
 17 files changed, 330 insertions(+), 167 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3 01/16] inotify: show inotify mask flags in proc fdinfo
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 02/16] inotify: move control flags from mask to mark flags Amir Goldstein
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

The inotify mask flags IN_ONESHOT and IN_EXCL_UNLINK are not "internal
to kernel" and should be exposed in procfs fdinfo so CRIU can restore
them.

Fixes: 6933599697c9 ("inotify: hide internal kernel bits from fdinfo")
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fdinfo.c               | 11 ++---------
 fs/notify/inotify/inotify.h      | 12 ++++++++++++
 fs/notify/inotify/inotify_user.c |  2 +-
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 57f0d5d9f934..3451708fd035 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -83,16 +83,9 @@ static void inotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 	inode_mark = container_of(mark, struct inotify_inode_mark, fsn_mark);
 	inode = igrab(fsnotify_conn_inode(mark->connector));
 	if (inode) {
-		/*
-		 * IN_ALL_EVENTS represents all of the mask bits
-		 * that we expose to userspace.  There is at
-		 * least one bit (FS_EVENT_ON_CHILD) which is
-		 * used only internally to the kernel.
-		 */
-		u32 mask = mark->mask & IN_ALL_EVENTS;
-		seq_printf(m, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:%x ",
+		seq_printf(m, "inotify wd:%x ino:%lx sdev:%x mask:%x ignored_mask:0 ",
 			   inode_mark->wd, inode->i_ino, inode->i_sb->s_dev,
-			   mask, mark->ignored_mask);
+			   inotify_mark_user_mask(mark));
 		show_mark_fhandle(m, inode);
 		seq_putc(m, '\n');
 		iput(inode);
diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
index 2007e3711916..8f00151eb731 100644
--- a/fs/notify/inotify/inotify.h
+++ b/fs/notify/inotify/inotify.h
@@ -22,6 +22,18 @@ static inline struct inotify_event_info *INOTIFY_E(struct fsnotify_event *fse)
 	return container_of(fse, struct inotify_event_info, fse);
 }
 
+/*
+ * INOTIFY_USER_FLAGS represents all of the mask bits that we expose to
+ * userspace.  There is at least one bit (FS_EVENT_ON_CHILD) which is
+ * used only internally to the kernel.
+ */
+#define INOTIFY_USER_MASK (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK)
+
+static inline __u32 inotify_mark_user_mask(struct fsnotify_mark *fsn_mark)
+{
+	return fsn_mark->mask & INOTIFY_USER_MASK;
+}
+
 extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
 					   struct fsnotify_group *group);
 extern int inotify_handle_inode_event(struct fsnotify_mark *inode_mark,
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 54583f62dc44..3ef57db0ec9d 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -110,7 +110,7 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
 		mask |= FS_EVENT_ON_CHILD;
 
 	/* mask off the flags used to open the fd */
-	mask |= (arg & (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK));
+	mask |= (arg & INOTIFY_USER_MASK);
 
 	return mask;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 02/16] inotify: move control flags from mask to mark flags
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 01/16] inotify: show inotify mask flags in proc fdinfo Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 03/16] fsnotify: fix wrong lockdep annotations Amir Goldstein
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not
relevant to object interest mask, so move them to the mark flags.

This frees up some bits in the object interest mask.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c                 |  4 +--
 fs/notify/inotify/inotify.h          | 11 ++++++--
 fs/notify/inotify/inotify_fsnotify.c |  2 +-
 fs/notify/inotify/inotify_user.c     | 38 ++++++++++++++++++----------
 include/linux/fsnotify_backend.h     | 16 +++++++-----
 5 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 70a8516b78bc..6eee19d15e8c 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -253,7 +253,7 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group,
 	if (WARN_ON_ONCE(!inode && !dir))
 		return 0;
 
-	if ((inode_mark->mask & FS_EXCL_UNLINK) &&
+	if ((inode_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK) &&
 	    path && d_unlinked(path->dentry))
 		return 0;
 
@@ -581,7 +581,7 @@ static __init int fsnotify_init(void)
 {
 	int ret;
 
-	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 25);
+	BUILD_BUG_ON(HWEIGHT32(ALL_FSNOTIFY_BITS) != 23);
 
 	ret = init_srcu_struct(&fsnotify_mark_srcu);
 	if (ret)
diff --git a/fs/notify/inotify/inotify.h b/fs/notify/inotify/inotify.h
index 8f00151eb731..7d5df7a21539 100644
--- a/fs/notify/inotify/inotify.h
+++ b/fs/notify/inotify/inotify.h
@@ -27,11 +27,18 @@ static inline struct inotify_event_info *INOTIFY_E(struct fsnotify_event *fse)
  * userspace.  There is at least one bit (FS_EVENT_ON_CHILD) which is
  * used only internally to the kernel.
  */
-#define INOTIFY_USER_MASK (IN_ALL_EVENTS | IN_ONESHOT | IN_EXCL_UNLINK)
+#define INOTIFY_USER_MASK (IN_ALL_EVENTS)
 
 static inline __u32 inotify_mark_user_mask(struct fsnotify_mark *fsn_mark)
 {
-	return fsn_mark->mask & INOTIFY_USER_MASK;
+	__u32 mask = fsn_mark->mask & INOTIFY_USER_MASK;
+
+	if (fsn_mark->flags & FSNOTIFY_MARK_FLAG_EXCL_UNLINK)
+		mask |= IN_EXCL_UNLINK;
+	if (fsn_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT)
+		mask |= IN_ONESHOT;
+
+	return mask;
 }
 
 extern void inotify_ignored_and_remove_idr(struct fsnotify_mark *fsn_mark,
diff --git a/fs/notify/inotify/inotify_fsnotify.c b/fs/notify/inotify/inotify_fsnotify.c
index d92d7b0adc9a..49cfe2ae6d23 100644
--- a/fs/notify/inotify/inotify_fsnotify.c
+++ b/fs/notify/inotify/inotify_fsnotify.c
@@ -122,7 +122,7 @@ int inotify_handle_inode_event(struct fsnotify_mark *inode_mark, u32 mask,
 		fsnotify_destroy_event(group, fsn_event);
 	}
 
-	if (inode_mark->mask & IN_ONESHOT)
+	if (inode_mark->flags & FSNOTIFY_MARK_FLAG_IN_ONESHOT)
 		fsnotify_destroy_mark(inode_mark, group);
 
 	return 0;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 3ef57db0ec9d..d8907d32a05b 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -115,6 +115,21 @@ static inline __u32 inotify_arg_to_mask(struct inode *inode, u32 arg)
 	return mask;
 }
 
+#define INOTIFY_MARK_FLAGS \
+	(FSNOTIFY_MARK_FLAG_EXCL_UNLINK | FSNOTIFY_MARK_FLAG_IN_ONESHOT)
+
+static inline unsigned int inotify_arg_to_flags(u32 arg)
+{
+	unsigned int flags = 0;
+
+	if (arg & IN_EXCL_UNLINK)
+		flags |= FSNOTIFY_MARK_FLAG_EXCL_UNLINK;
+	if (arg & IN_ONESHOT)
+		flags |= FSNOTIFY_MARK_FLAG_IN_ONESHOT;
+
+	return flags;
+}
+
 static inline u32 inotify_mask_to_arg(__u32 mask)
 {
 	return mask & (IN_ALL_EVENTS | IN_ISDIR | IN_UNMOUNT | IN_IGNORED |
@@ -526,13 +541,10 @@ static int inotify_update_existing_watch(struct fsnotify_group *group,
 	struct fsnotify_mark *fsn_mark;
 	struct inotify_inode_mark *i_mark;
 	__u32 old_mask, new_mask;
-	__u32 mask;
-	int add = (arg & IN_MASK_ADD);
+	int replace = !(arg & IN_MASK_ADD);
 	int create = (arg & IN_MASK_CREATE);
 	int ret;
 
-	mask = inotify_arg_to_mask(inode, arg);
-
 	fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, group);
 	if (!fsn_mark)
 		return -ENOENT;
@@ -545,10 +557,12 @@ static int inotify_update_existing_watch(struct fsnotify_group *group,
 
 	spin_lock(&fsn_mark->lock);
 	old_mask = fsn_mark->mask;
-	if (add)
-		fsn_mark->mask |= mask;
-	else
-		fsn_mark->mask = mask;
+	if (replace) {
+		fsn_mark->mask = 0;
+		fsn_mark->flags &= ~INOTIFY_MARK_FLAGS;
+	}
+	fsn_mark->mask |= inotify_arg_to_mask(inode, arg);
+	fsn_mark->flags |= inotify_arg_to_flags(arg);
 	new_mask = fsn_mark->mask;
 	spin_unlock(&fsn_mark->lock);
 
@@ -579,19 +593,17 @@ static int inotify_new_watch(struct fsnotify_group *group,
 			     u32 arg)
 {
 	struct inotify_inode_mark *tmp_i_mark;
-	__u32 mask;
 	int ret;
 	struct idr *idr = &group->inotify_data.idr;
 	spinlock_t *idr_lock = &group->inotify_data.idr_lock;
 
-	mask = inotify_arg_to_mask(inode, arg);
-
 	tmp_i_mark = kmem_cache_alloc(inotify_inode_mark_cachep, GFP_KERNEL);
 	if (unlikely(!tmp_i_mark))
 		return -ENOMEM;
 
 	fsnotify_init_mark(&tmp_i_mark->fsn_mark, group);
-	tmp_i_mark->fsn_mark.mask = mask;
+	tmp_i_mark->fsn_mark.mask = inotify_arg_to_mask(inode, arg);
+	tmp_i_mark->fsn_mark.flags = inotify_arg_to_flags(arg);
 	tmp_i_mark->wd = -1;
 
 	ret = inotify_add_to_idr(idr, idr_lock, tmp_i_mark);
@@ -845,9 +857,7 @@ static int __init inotify_user_setup(void)
 	BUILD_BUG_ON(IN_UNMOUNT != FS_UNMOUNT);
 	BUILD_BUG_ON(IN_Q_OVERFLOW != FS_Q_OVERFLOW);
 	BUILD_BUG_ON(IN_IGNORED != FS_IN_IGNORED);
-	BUILD_BUG_ON(IN_EXCL_UNLINK != FS_EXCL_UNLINK);
 	BUILD_BUG_ON(IN_ISDIR != FS_ISDIR);
-	BUILD_BUG_ON(IN_ONESHOT != FS_IN_ONESHOT);
 
 	BUILD_BUG_ON(HWEIGHT32(ALL_INOTIFY_BITS) != 22);
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 0805b74cae44..b1c72edd9784 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -55,7 +55,6 @@
 #define FS_ACCESS_PERM		0x00020000	/* access event in a permissions hook */
 #define FS_OPEN_EXEC_PERM	0x00040000	/* open/exec event in a permission hook */
 
-#define FS_EXCL_UNLINK		0x04000000	/* do not send events if object is unlinked */
 /*
  * Set on inode mark that cares about things that happen to its children.
  * Always set for dnotify and inotify.
@@ -66,7 +65,6 @@
 #define FS_RENAME		0x10000000	/* File was renamed */
 #define FS_DN_MULTISHOT		0x20000000	/* dnotify multishot */
 #define FS_ISDIR		0x40000000	/* event occurred against dir */
-#define FS_IN_ONESHOT		0x80000000	/* only send event once */
 
 #define FS_MOVE			(FS_MOVED_FROM | FS_MOVED_TO)
 
@@ -106,8 +104,7 @@
 			     FS_ERROR)
 
 /* Extra flags that may be reported with event or control handling of events */
-#define ALL_FSNOTIFY_FLAGS  (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \
-			     FS_DN_MULTISHOT | FS_EVENT_ON_CHILD)
+#define ALL_FSNOTIFY_FLAGS  (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT)
 
 #define ALL_FSNOTIFY_BITS   (ALL_FSNOTIFY_EVENTS | ALL_FSNOTIFY_FLAGS)
 
@@ -473,9 +470,14 @@ struct fsnotify_mark {
 	struct fsnotify_mark_connector *connector;
 	/* Events types to ignore [mark->lock, group->mark_mutex] */
 	__u32 ignored_mask;
-#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY	0x01
-#define FSNOTIFY_MARK_FLAG_ALIVE		0x02
-#define FSNOTIFY_MARK_FLAG_ATTACHED		0x04
+	/* General fsnotify mark flags */
+#define FSNOTIFY_MARK_FLAG_ALIVE		0x0001
+#define FSNOTIFY_MARK_FLAG_ATTACHED		0x0002
+	/* inotify mark flags */
+#define FSNOTIFY_MARK_FLAG_EXCL_UNLINK		0x0010
+#define FSNOTIFY_MARK_FLAG_IN_ONESHOT		0x0020
+	/* fanotify mark flags */
+#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY	0x0100
 	unsigned int flags;		/* flags [mark->lock] */
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 03/16] fsnotify: fix wrong lockdep annotations
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 01/16] inotify: show inotify mask flags in proc fdinfo Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 02/16] inotify: move control flags from mask to mark flags Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 04/16] fsnotify: pass flags argument to fsnotify_add_mark() via mark Amir Goldstein
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Commit 6960b0d909cd ("fsnotify: change locking order") changed some
of the mark_mutex locks in direct reclaim path to use:
  mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);

This change is explained:
 "...It uses nested locking to avoid deadlock in case we do the final
  iput() on an inode which still holds marks and thus would take the
  mutex again when calling fsnotify_inode_delete() in destroy_inode()."

The problem is that the mutex_lock_nested() is not a nested lock at
all. In fact, it has the opposite effect of preventing lockdep from
warning about a very possible deadlock.

Due to these wrong annotations, a deadlock that was introduced with
nfsd filecache in kernel v5.4 went unnoticed in v5.4.y for over two
years until it was reported recently by Khazhismel Kumykov, only to
find out that the deadlock was already fixed in kernel v5.5.

Fix the wrong lockdep annotations.

Cc: Khazhismel Kumykov <khazhy@google.com>
Fixes: 6960b0d909cd ("fsnotify: change locking order")
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/mark.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 4853184f7dde..c86982be2d50 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -452,7 +452,7 @@ void fsnotify_free_mark(struct fsnotify_mark *mark)
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 			   struct fsnotify_group *group)
 {
-	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
+	mutex_lock(&group->mark_mutex);
 	fsnotify_detach_mark(mark);
 	mutex_unlock(&group->mark_mutex);
 	fsnotify_free_mark(mark);
@@ -770,7 +770,7 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group,
 	 * move marks to free to to_free list in one go and then free marks in
 	 * to_free list one by one.
 	 */
-	mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
+	mutex_lock(&group->mark_mutex);
 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
 		if (mark->connector->type == obj_type)
 			list_move(&mark->g_list, &to_free);
@@ -779,7 +779,7 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group,
 
 clear:
 	while (1) {
-		mutex_lock_nested(&group->mark_mutex, SINGLE_DEPTH_NESTING);
+		mutex_lock(&group->mark_mutex);
 		if (list_empty(head)) {
 			mutex_unlock(&group->mark_mutex);
 			break;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 04/16] fsnotify: pass flags argument to fsnotify_add_mark() via mark
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (2 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 03/16] fsnotify: fix wrong lockdep annotations Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-21 14:18   ` Jan Kara
  2022-04-13  9:09 ` [PATCH v3 05/16] fsnotify: pass flags argument to fsnotify_alloc_group() Amir Goldstein
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Instead of passing the allow_dups argument to fsnotify_add_mark()
as an argument, define the mark flag FSNOTIFY_MARK_FLAG_ALLOW_DUPS
to express the old allow_dups meaning and pass this information on the
mark itself.

We use mark->flags to pass inotify control flags and will pass more
control flags in the future so let's make this the standard.

Although the FSNOTIFY_MARK_FLAG_ALLOW_DUPS flag is not used after the
call to fsnotify_add_mark(), it does not hurt to leave this information
on the mark for future use.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c |  2 +-
 fs/notify/inotify/inotify_user.c   |  4 ++--
 fs/notify/mark.c                   | 13 ++++++-------
 include/linux/fsnotify_backend.h   | 19 ++++++++++---------
 kernel/audit_fsnotify.c            |  3 ++-
 5 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 9b32b76a9c30..0f0db1efa379 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1144,7 +1144,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 	}
 
 	fsnotify_init_mark(mark, group);
-	ret = fsnotify_add_mark_locked(mark, connp, obj_type, 0, fsid);
+	ret = fsnotify_add_mark_locked(mark, connp, obj_type, fsid);
 	if (ret) {
 		fsnotify_put_mark(mark);
 		goto out_dec_ucounts;
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index d8907d32a05b..6fc0f598a7aa 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -603,7 +603,6 @@ static int inotify_new_watch(struct fsnotify_group *group,
 
 	fsnotify_init_mark(&tmp_i_mark->fsn_mark, group);
 	tmp_i_mark->fsn_mark.mask = inotify_arg_to_mask(inode, arg);
-	tmp_i_mark->fsn_mark.flags = inotify_arg_to_flags(arg);
 	tmp_i_mark->wd = -1;
 
 	ret = inotify_add_to_idr(idr, idr_lock, tmp_i_mark);
@@ -618,7 +617,8 @@ static int inotify_new_watch(struct fsnotify_group *group,
 	}
 
 	/* we are on the idr, now get on the inode */
-	ret = fsnotify_add_inode_mark_locked(&tmp_i_mark->fsn_mark, inode, 0);
+	ret = fsnotify_add_inode_mark_locked(&tmp_i_mark->fsn_mark, inode,
+					     inotify_arg_to_flags(arg));
 	if (ret) {
 		/* we failed to get on the inode, get off the idr */
 		inotify_remove_from_idr(group, tmp_i_mark);
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index c86982be2d50..ea8f557881b1 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -574,7 +574,7 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector(
 static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
 				  fsnotify_connp_t *connp,
 				  unsigned int obj_type,
-				  int allow_dups, __kernel_fsid_t *fsid)
+				  __kernel_fsid_t *fsid)
 {
 	struct fsnotify_mark *lmark, *last = NULL;
 	struct fsnotify_mark_connector *conn;
@@ -633,7 +633,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
 
 		if ((lmark->group == mark->group) &&
 		    (lmark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) &&
-		    !allow_dups) {
+		    !(mark->flags & FSNOTIFY_MARK_FLAG_ALLOW_DUPS)) {
 			err = -EEXIST;
 			goto out_err;
 		}
@@ -668,7 +668,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
  */
 int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 			     fsnotify_connp_t *connp, unsigned int obj_type,
-			     int allow_dups, __kernel_fsid_t *fsid)
+			     __kernel_fsid_t *fsid)
 {
 	struct fsnotify_group *group = mark->group;
 	int ret = 0;
@@ -688,7 +688,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	fsnotify_get_mark(mark); /* for g_list */
 	spin_unlock(&mark->lock);
 
-	ret = fsnotify_add_mark_list(mark, connp, obj_type, allow_dups, fsid);
+	ret = fsnotify_add_mark_list(mark, connp, obj_type, fsid);
 	if (ret)
 		goto err;
 
@@ -708,14 +708,13 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 }
 
 int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp,
-		      unsigned int obj_type, int allow_dups,
-		      __kernel_fsid_t *fsid)
+		      unsigned int obj_type, __kernel_fsid_t *fsid)
 {
 	int ret;
 	struct fsnotify_group *group = mark->group;
 
 	mutex_lock(&group->mark_mutex);
-	ret = fsnotify_add_mark_locked(mark, connp, obj_type, allow_dups, fsid);
+	ret = fsnotify_add_mark_locked(mark, connp, obj_type, fsid);
 	mutex_unlock(&group->mark_mutex);
 	return ret;
 }
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index b1c72edd9784..2ff686882303 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -473,6 +473,7 @@ struct fsnotify_mark {
 	/* General fsnotify mark flags */
 #define FSNOTIFY_MARK_FLAG_ALIVE		0x0001
 #define FSNOTIFY_MARK_FLAG_ATTACHED		0x0002
+#define FSNOTIFY_MARK_FLAG_ALLOW_DUPS		0x0004
 	/* inotify mark flags */
 #define FSNOTIFY_MARK_FLAG_EXCL_UNLINK		0x0010
 #define FSNOTIFY_MARK_FLAG_IN_ONESHOT		0x0020
@@ -634,30 +635,30 @@ extern struct fsnotify_mark *fsnotify_find_mark(fsnotify_connp_t *connp,
 /* Get cached fsid of filesystem containing object */
 extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn,
 				  __kernel_fsid_t *fsid);
+
 /* attach the mark to the object */
 extern int fsnotify_add_mark(struct fsnotify_mark *mark,
 			     fsnotify_connp_t *connp, unsigned int obj_type,
-			     int allow_dups, __kernel_fsid_t *fsid);
+			     __kernel_fsid_t *fsid);
 extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 				    fsnotify_connp_t *connp,
-				    unsigned int obj_type, int allow_dups,
+				    unsigned int obj_type,
 				    __kernel_fsid_t *fsid);
 
 /* attach the mark to the inode */
 static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
-					  struct inode *inode,
-					  int allow_dups)
+					  struct inode *inode, int flags)
 {
+	mark->flags = flags;
 	return fsnotify_add_mark(mark, &inode->i_fsnotify_marks,
-				 FSNOTIFY_OBJ_TYPE_INODE, allow_dups, NULL);
+				 FSNOTIFY_OBJ_TYPE_INODE, NULL);
 }
 static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark,
-						 struct inode *inode,
-						 int allow_dups)
+						 struct inode *inode, int flags)
 {
+	mark->flags = flags;
 	return fsnotify_add_mark_locked(mark, &inode->i_fsnotify_marks,
-					FSNOTIFY_OBJ_TYPE_INODE, allow_dups,
-					NULL);
+					FSNOTIFY_OBJ_TYPE_INODE, NULL);
 }
 
 /* given a group and a mark, flag mark to be freed when all references are dropped */
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 02348b48447c..3c35649bc7f5 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -100,7 +100,8 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
 	audit_update_mark(audit_mark, dentry->d_inode);
 	audit_mark->rule = krule;
 
-	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true);
+	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode,
+				      FSNOTIFY_MARK_FLAG_ALLOW_DUPS);
 	if (ret < 0) {
 		fsnotify_put_mark(&audit_mark->mark);
 		audit_mark = ERR_PTR(ret);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 05/16] fsnotify: pass flags argument to fsnotify_alloc_group()
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (3 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 04/16] fsnotify: pass flags argument to fsnotify_add_mark() via mark Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-21 14:34   ` Jan Kara
  2022-04-13  9:09 ` [PATCH v3 06/16] fsnotify: create helpers for group mark_mutex lock Amir Goldstein
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Add flags argument to fsnotify_alloc_group(), define and use the flag
FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper
fsnotify_alloc_user_group() to indicate user allocation.

Although the flag FSNOTIFY_GROUP_USER is currently not used after group
allocation, we store the flags argument in the group struct for future
use of other group flags.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/nfsd/filecache.c                |  3 ++-
 fs/notify/dnotify/dnotify.c        |  2 +-
 fs/notify/fanotify/fanotify_user.c |  3 ++-
 fs/notify/group.c                  | 21 +++++++++------------
 fs/notify/inotify/inotify_user.c   |  3 ++-
 include/linux/fsnotify_backend.h   | 10 ++++++++--
 kernel/audit_fsnotify.c            |  3 ++-
 kernel/audit_tree.c                |  2 +-
 kernel/audit_watch.c               |  2 +-
 9 files changed, 28 insertions(+), 21 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index c08882f5867b..79a5b052fcdf 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -673,7 +673,8 @@ nfsd_file_cache_init(void)
 		goto out_shrinker;
 	}
 
-	nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops);
+	nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops,
+							0);
 	if (IS_ERR(nfsd_file_fsnotify_group)) {
 		pr_err("nfsd: unable to create fsnotify group: %ld\n",
 			PTR_ERR(nfsd_file_fsnotify_group));
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 829dd4a61b66..e4779926edf4 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -401,7 +401,7 @@ static int __init dnotify_init(void)
 					  SLAB_PANIC|SLAB_ACCOUNT);
 	dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
 
-	dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops);
+	dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0);
 	if (IS_ERR(dnotify_group))
 		panic("unable to allocate fsnotify group for dnotify\n");
 	dnotify_sysctl_init();
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 0f0db1efa379..bf72856da42e 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1355,7 +1355,8 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 		f_flags |= O_NONBLOCK;
 
 	/* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
-	group = fsnotify_alloc_user_group(&fanotify_fsnotify_ops);
+	group = fsnotify_alloc_group(&fanotify_fsnotify_ops,
+				     FSNOTIFY_GROUP_USER);
 	if (IS_ERR(group)) {
 		return PTR_ERR(group);
 	}
diff --git a/fs/notify/group.c b/fs/notify/group.c
index b7d4d64f87c2..18446b7b0d49 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -112,7 +112,8 @@ void fsnotify_put_group(struct fsnotify_group *group)
 EXPORT_SYMBOL_GPL(fsnotify_put_group);
 
 static struct fsnotify_group *__fsnotify_alloc_group(
-				const struct fsnotify_ops *ops, gfp_t gfp)
+				const struct fsnotify_ops *ops,
+				int flags, gfp_t gfp)
 {
 	struct fsnotify_group *group;
 
@@ -133,6 +134,7 @@ static struct fsnotify_group *__fsnotify_alloc_group(
 	INIT_LIST_HEAD(&group->marks_list);
 
 	group->ops = ops;
+	group->flags = flags;
 
 	return group;
 }
@@ -140,20 +142,15 @@ static struct fsnotify_group *__fsnotify_alloc_group(
 /*
  * Create a new fsnotify_group and hold a reference for the group returned.
  */
-struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops)
+struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops,
+					    int flags)
 {
-	return __fsnotify_alloc_group(ops, GFP_KERNEL);
-}
-EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
+	gfp_t gfp = (flags & FSNOTIFY_GROUP_USER) ? GFP_KERNEL_ACCOUNT :
+						    GFP_KERNEL;
 
-/*
- * Create a new fsnotify_group and hold a reference for the group returned.
- */
-struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops)
-{
-	return __fsnotify_alloc_group(ops, GFP_KERNEL_ACCOUNT);
+	return __fsnotify_alloc_group(ops, flags, gfp);
 }
-EXPORT_SYMBOL_GPL(fsnotify_alloc_user_group);
+EXPORT_SYMBOL_GPL(fsnotify_alloc_group);
 
 int fsnotify_fasync(int fd, struct file *file, int on)
 {
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 6fc0f598a7aa..65ff637cb4a3 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -656,7 +656,8 @@ static struct fsnotify_group *inotify_new_group(unsigned int max_events)
 	struct fsnotify_group *group;
 	struct inotify_event_info *oevent;
 
-	group = fsnotify_alloc_user_group(&inotify_fsnotify_ops);
+	group = fsnotify_alloc_group(&inotify_fsnotify_ops,
+				     FSNOTIFY_GROUP_USER);
 	if (IS_ERR(group))
 		return group;
 
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 2ff686882303..2057ae4bf8e9 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -210,6 +210,11 @@ struct fsnotify_group {
 	unsigned int priority;
 	bool shutdown;		/* group is being shut down, don't queue more events */
 
+#define FSNOTIFY_GROUP_USER	0x01 /* user allocated group */
+#define FSNOTIFY_GROUP_FLAG(group, flag) \
+	((group)->flags & FSNOTIFY_GROUP_ ## flag)
+	int flags;
+
 	/* stores all fastpath marks assoc with this group so they can be cleaned on unregister */
 	struct mutex mark_mutex;	/* protect marks_list */
 	atomic_t user_waits;		/* Number of tasks waiting for user
@@ -544,8 +549,9 @@ static inline void fsnotify_update_flags(struct dentry *dentry)
 /* called from fsnotify listeners, such as fanotify or dnotify */
 
 /* create a new group */
-extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops);
-extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops);
+extern struct fsnotify_group *fsnotify_alloc_group(
+				const struct fsnotify_ops *ops,
+				int flags);
 /* get reference to a group */
 extern void fsnotify_get_group(struct fsnotify_group *group);
 /* drop reference on a group from fsnotify_alloc_group */
diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
index 3c35649bc7f5..95e8b75e7634 100644
--- a/kernel/audit_fsnotify.c
+++ b/kernel/audit_fsnotify.c
@@ -182,7 +182,8 @@ static const struct fsnotify_ops audit_mark_fsnotify_ops = {
 
 static int __init audit_fsnotify_init(void)
 {
-	audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops);
+	audit_fsnotify_group = fsnotify_alloc_group(&audit_mark_fsnotify_ops,
+						    0);
 	if (IS_ERR(audit_fsnotify_group)) {
 		audit_fsnotify_group = NULL;
 		audit_panic("cannot create audit fsnotify group");
diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index e7315d487163..b5c02f8573fe 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -1074,7 +1074,7 @@ static int __init audit_tree_init(void)
 
 	audit_tree_mark_cachep = KMEM_CACHE(audit_tree_mark, SLAB_PANIC);
 
-	audit_tree_group = fsnotify_alloc_group(&audit_tree_ops);
+	audit_tree_group = fsnotify_alloc_group(&audit_tree_ops, 0);
 	if (IS_ERR(audit_tree_group))
 		audit_panic("cannot initialize fsnotify group for rectree watches");
 
diff --git a/kernel/audit_watch.c b/kernel/audit_watch.c
index 713b256be944..4b0957aa2cd4 100644
--- a/kernel/audit_watch.c
+++ b/kernel/audit_watch.c
@@ -493,7 +493,7 @@ static const struct fsnotify_ops audit_watch_fsnotify_ops = {
 
 static int __init audit_watch_init(void)
 {
-	audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops);
+	audit_watch_group = fsnotify_alloc_group(&audit_watch_fsnotify_ops, 0);
 	if (IS_ERR(audit_watch_group)) {
 		audit_watch_group = NULL;
 		audit_panic("cannot create audit fsnotify group");
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 06/16] fsnotify: create helpers for group mark_mutex lock
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (4 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 05/16] fsnotify: pass flags argument to fsnotify_alloc_group() Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 07/16] inotify: use fsnotify group lock helpers Amir Goldstein
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Create helpers to take and release the group mark_mutex lock.

Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines
if the mark_mutex lock is fs reclaim safe or not.  If not safe, the
lock helpers take the lock and disable direct fs reclaim.

In that case we annotate the mutex with a different lockdep class to
express to lockdep that an allocation of mark of an fs reclaim safe group
may take the group lock of another "NOFS" group to evict inodes.

For now, converted only the callers in common code and no backend
defines the NOFS flag.  It is intended to be set by fanotify for
evictable marks support.

Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fdinfo.c               |  4 ++--
 fs/notify/group.c                | 11 +++++++++++
 fs/notify/mark.c                 | 24 +++++++++++-------------
 include/linux/fsnotify_backend.h | 28 ++++++++++++++++++++++++++++
 4 files changed, 52 insertions(+), 15 deletions(-)

diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 3451708fd035..1f34c5c29fdb 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -28,13 +28,13 @@ static void show_fdinfo(struct seq_file *m, struct file *f,
 	struct fsnotify_group *group = f->private_data;
 	struct fsnotify_mark *mark;
 
-	mutex_lock(&group->mark_mutex);
+	fsnotify_group_lock(group);
 	list_for_each_entry(mark, &group->marks_list, g_list) {
 		show(m, mark);
 		if (seq_has_overflowed(m))
 			break;
 	}
-	mutex_unlock(&group->mark_mutex);
+	fsnotify_group_unlock(group);
 }
 
 #if defined(CONFIG_EXPORTFS)
diff --git a/fs/notify/group.c b/fs/notify/group.c
index 18446b7b0d49..3664029af5ae 100644
--- a/fs/notify/group.c
+++ b/fs/notify/group.c
@@ -115,6 +115,7 @@ static struct fsnotify_group *__fsnotify_alloc_group(
 				const struct fsnotify_ops *ops,
 				int flags, gfp_t gfp)
 {
+	static struct lock_class_key nofs_marks_lock;
 	struct fsnotify_group *group;
 
 	group = kzalloc(sizeof(struct fsnotify_group), gfp);
@@ -135,6 +136,16 @@ static struct fsnotify_group *__fsnotify_alloc_group(
 
 	group->ops = ops;
 	group->flags = flags;
+	/*
+	 * For most backends, eviction of inode with a mark is not expected,
+	 * because marks hold a refcount on the inode against eviction.
+	 *
+	 * Use a different lockdep class for groups that support evictable
+	 * inode marks, because with evictable marks, mark_mutex is NOT
+	 * fs-reclaim safe - the mutex is taken when evicting inodes.
+	 */
+	if (FSNOTIFY_GROUP_FLAG(group, NOFS))
+		lockdep_set_class(&group->mark_mutex, &nofs_marks_lock);
 
 	return group;
 }
diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index ea8f557881b1..7120918d8251 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -398,9 +398,7 @@ void fsnotify_finish_user_wait(struct fsnotify_iter_info *iter_info)
  */
 void fsnotify_detach_mark(struct fsnotify_mark *mark)
 {
-	struct fsnotify_group *group = mark->group;
-
-	WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex));
+	fsnotify_group_assert_locked(mark->group);
 	WARN_ON_ONCE(!srcu_read_lock_held(&fsnotify_mark_srcu) &&
 		     refcount_read(&mark->refcnt) < 1 +
 			!!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED));
@@ -452,9 +450,9 @@ void fsnotify_free_mark(struct fsnotify_mark *mark)
 void fsnotify_destroy_mark(struct fsnotify_mark *mark,
 			   struct fsnotify_group *group)
 {
-	mutex_lock(&group->mark_mutex);
+	fsnotify_group_lock(group);
 	fsnotify_detach_mark(mark);
-	mutex_unlock(&group->mark_mutex);
+	fsnotify_group_unlock(group);
 	fsnotify_free_mark(mark);
 }
 EXPORT_SYMBOL_GPL(fsnotify_destroy_mark);
@@ -673,7 +671,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	struct fsnotify_group *group = mark->group;
 	int ret = 0;
 
-	BUG_ON(!mutex_is_locked(&group->mark_mutex));
+	fsnotify_group_assert_locked(group);
 
 	/*
 	 * LOCKING ORDER!!!!
@@ -713,9 +711,9 @@ int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp,
 	int ret;
 	struct fsnotify_group *group = mark->group;
 
-	mutex_lock(&group->mark_mutex);
+	fsnotify_group_lock(group);
 	ret = fsnotify_add_mark_locked(mark, connp, obj_type, fsid);
-	mutex_unlock(&group->mark_mutex);
+	fsnotify_group_unlock(group);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(fsnotify_add_mark);
@@ -769,24 +767,24 @@ void fsnotify_clear_marks_by_group(struct fsnotify_group *group,
 	 * move marks to free to to_free list in one go and then free marks in
 	 * to_free list one by one.
 	 */
-	mutex_lock(&group->mark_mutex);
+	fsnotify_group_lock(group);
 	list_for_each_entry_safe(mark, lmark, &group->marks_list, g_list) {
 		if (mark->connector->type == obj_type)
 			list_move(&mark->g_list, &to_free);
 	}
-	mutex_unlock(&group->mark_mutex);
+	fsnotify_group_unlock(group);
 
 clear:
 	while (1) {
-		mutex_lock(&group->mark_mutex);
+		fsnotify_group_lock(group);
 		if (list_empty(head)) {
-			mutex_unlock(&group->mark_mutex);
+			fsnotify_group_unlock(group);
 			break;
 		}
 		mark = list_first_entry(head, struct fsnotify_mark, g_list);
 		fsnotify_get_mark(mark);
 		fsnotify_detach_mark(mark);
-		mutex_unlock(&group->mark_mutex);
+		fsnotify_group_unlock(group);
 		fsnotify_free_mark(mark);
 		fsnotify_put_mark(mark);
 	}
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index 2057ae4bf8e9..fc6fdbaac797 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -20,6 +20,7 @@
 #include <linux/user_namespace.h>
 #include <linux/refcount.h>
 #include <linux/mempool.h>
+#include <linux/sched/mm.h>
 
 /*
  * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily
@@ -211,9 +212,11 @@ struct fsnotify_group {
 	bool shutdown;		/* group is being shut down, don't queue more events */
 
 #define FSNOTIFY_GROUP_USER	0x01 /* user allocated group */
+#define FSNOTIFY_GROUP_NOFS	0x02 /* group lock is not direct reclaim safe */
 #define FSNOTIFY_GROUP_FLAG(group, flag) \
 	((group)->flags & FSNOTIFY_GROUP_ ## flag)
 	int flags;
+	unsigned int owner_flags;	/* stored flags of mark_mutex owner */
 
 	/* stores all fastpath marks assoc with this group so they can be cleaned on unregister */
 	struct mutex mark_mutex;	/* protect marks_list */
@@ -255,6 +258,31 @@ struct fsnotify_group {
 	};
 };
 
+/*
+ * These helpers are used to prevent deadlock when reclaiming inodes with
+ * evictable marks of the same group that is allocating a new mark.
+ */
+static inline void fsnotify_group_lock(struct fsnotify_group *group)
+{
+	mutex_lock(&group->mark_mutex);
+	if (FSNOTIFY_GROUP_FLAG(group, NOFS))
+		group->owner_flags = memalloc_nofs_save();
+}
+
+static inline void fsnotify_group_unlock(struct fsnotify_group *group)
+{
+	if (FSNOTIFY_GROUP_FLAG(group, NOFS))
+		memalloc_nofs_restore(group->owner_flags);
+	mutex_unlock(&group->mark_mutex);
+}
+
+static inline void fsnotify_group_assert_locked(struct fsnotify_group *group)
+{
+	WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex));
+	if (FSNOTIFY_GROUP_FLAG(group, NOFS))
+		WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS));
+}
+
 /* When calling fsnotify tell it if the data is a path or inode */
 enum fsnotify_data_type {
 	FSNOTIFY_EVENT_NONE,
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 07/16] inotify: use fsnotify group lock helpers
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (5 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 06/16] fsnotify: create helpers for group mark_mutex lock Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 08/16] audit: " Amir Goldstein
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

inotify inode marks pin the inode so there is no need to set the
FSNOTIFY_GROUP_NOFS flag.

Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/inotify/inotify_user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 65ff637cb4a3..5cdb2f74bca6 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -640,13 +640,13 @@ static int inotify_update_watch(struct fsnotify_group *group, struct inode *inod
 {
 	int ret = 0;
 
-	mutex_lock(&group->mark_mutex);
+	fsnotify_group_lock(group);
 	/* try to update and existing watch with the new arg */
 	ret = inotify_update_existing_watch(group, inode, arg);
 	/* no mark present, try to add a new one */
 	if (ret == -ENOENT)
 		ret = inotify_new_watch(group, inode, arg);
-	mutex_unlock(&group->mark_mutex);
+	fsnotify_group_unlock(group);
 
 	return ret;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 08/16] audit: use fsnotify group lock helpers
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (6 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 07/16] inotify: use fsnotify group lock helpers Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 09/16] nfsd: " Amir Goldstein
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

audit inode marks pin the inode so there is no need to set the
FSNOTIFY_GROUP_NOFS flag.

Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 kernel/audit_tree.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/kernel/audit_tree.c b/kernel/audit_tree.c
index b5c02f8573fe..e867c17d3f84 100644
--- a/kernel/audit_tree.c
+++ b/kernel/audit_tree.c
@@ -351,7 +351,7 @@ static void untag_chunk(struct audit_chunk *chunk, struct fsnotify_mark *mark)
 	struct audit_chunk *new;
 	int size;
 
-	mutex_lock(&audit_tree_group->mark_mutex);
+	fsnotify_group_lock(audit_tree_group);
 	/*
 	 * mark_mutex stabilizes chunk attached to the mark so we can check
 	 * whether it didn't change while we've dropped hash_lock.
@@ -368,7 +368,7 @@ static void untag_chunk(struct audit_chunk *chunk, struct fsnotify_mark *mark)
 		replace_mark_chunk(mark, NULL);
 		spin_unlock(&hash_lock);
 		fsnotify_detach_mark(mark);
-		mutex_unlock(&audit_tree_group->mark_mutex);
+		fsnotify_group_unlock(audit_tree_group);
 		audit_mark_put_chunk(chunk);
 		fsnotify_free_mark(mark);
 		return;
@@ -385,12 +385,12 @@ static void untag_chunk(struct audit_chunk *chunk, struct fsnotify_mark *mark)
 	 */
 	replace_chunk(new, chunk);
 	spin_unlock(&hash_lock);
-	mutex_unlock(&audit_tree_group->mark_mutex);
+	fsnotify_group_unlock(audit_tree_group);
 	audit_mark_put_chunk(chunk);
 	return;
 
 out_mutex:
-	mutex_unlock(&audit_tree_group->mark_mutex);
+	fsnotify_group_unlock(audit_tree_group);
 }
 
 /* Call with group->mark_mutex held, releases it */
@@ -400,19 +400,19 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 	struct audit_chunk *chunk = alloc_chunk(1);
 
 	if (!chunk) {
-		mutex_unlock(&audit_tree_group->mark_mutex);
+		fsnotify_group_unlock(audit_tree_group);
 		return -ENOMEM;
 	}
 
 	mark = alloc_mark();
 	if (!mark) {
-		mutex_unlock(&audit_tree_group->mark_mutex);
+		fsnotify_group_unlock(audit_tree_group);
 		kfree(chunk);
 		return -ENOMEM;
 	}
 
 	if (fsnotify_add_inode_mark_locked(mark, inode, 0)) {
-		mutex_unlock(&audit_tree_group->mark_mutex);
+		fsnotify_group_unlock(audit_tree_group);
 		fsnotify_put_mark(mark);
 		kfree(chunk);
 		return -ENOSPC;
@@ -422,7 +422,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
 		fsnotify_detach_mark(mark);
-		mutex_unlock(&audit_tree_group->mark_mutex);
+		fsnotify_group_unlock(audit_tree_group);
 		fsnotify_free_mark(mark);
 		fsnotify_put_mark(mark);
 		kfree(chunk);
@@ -444,7 +444,7 @@ static int create_chunk(struct inode *inode, struct audit_tree *tree)
 	 */
 	insert_hash(chunk);
 	spin_unlock(&hash_lock);
-	mutex_unlock(&audit_tree_group->mark_mutex);
+	fsnotify_group_unlock(audit_tree_group);
 	/*
 	 * Drop our initial reference. When mark we point to is getting freed,
 	 * we get notification through ->freeing_mark callback and cleanup
@@ -462,7 +462,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	struct audit_node *p;
 	int n;
 
-	mutex_lock(&audit_tree_group->mark_mutex);
+	fsnotify_group_lock(audit_tree_group);
 	mark = fsnotify_find_mark(&inode->i_fsnotify_marks, audit_tree_group);
 	if (!mark)
 		return create_chunk(inode, tree);
@@ -478,7 +478,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	for (n = 0; n < old->count; n++) {
 		if (old->owners[n].owner == tree) {
 			spin_unlock(&hash_lock);
-			mutex_unlock(&audit_tree_group->mark_mutex);
+			fsnotify_group_unlock(audit_tree_group);
 			fsnotify_put_mark(mark);
 			return 0;
 		}
@@ -487,7 +487,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 
 	chunk = alloc_chunk(old->count + 1);
 	if (!chunk) {
-		mutex_unlock(&audit_tree_group->mark_mutex);
+		fsnotify_group_unlock(audit_tree_group);
 		fsnotify_put_mark(mark);
 		return -ENOMEM;
 	}
@@ -495,7 +495,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	spin_lock(&hash_lock);
 	if (tree->goner) {
 		spin_unlock(&hash_lock);
-		mutex_unlock(&audit_tree_group->mark_mutex);
+		fsnotify_group_unlock(audit_tree_group);
 		fsnotify_put_mark(mark);
 		kfree(chunk);
 		return 0;
@@ -515,7 +515,7 @@ static int tag_chunk(struct inode *inode, struct audit_tree *tree)
 	 */
 	replace_chunk(chunk, old);
 	spin_unlock(&hash_lock);
-	mutex_unlock(&audit_tree_group->mark_mutex);
+	fsnotify_group_unlock(audit_tree_group);
 	fsnotify_put_mark(mark); /* pair to fsnotify_find_mark */
 	audit_mark_put_chunk(old);
 
@@ -1044,12 +1044,12 @@ static void audit_tree_freeing_mark(struct fsnotify_mark *mark,
 {
 	struct audit_chunk *chunk;
 
-	mutex_lock(&mark->group->mark_mutex);
+	fsnotify_group_lock(mark->group);
 	spin_lock(&hash_lock);
 	chunk = mark_chunk(mark);
 	replace_mark_chunk(mark, NULL);
 	spin_unlock(&hash_lock);
-	mutex_unlock(&mark->group->mark_mutex);
+	fsnotify_group_unlock(mark->group);
 	if (chunk) {
 		evict_chunk(chunk);
 		audit_mark_put_chunk(chunk);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 09/16] nfsd: use fsnotify group lock helpers
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (7 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 08/16] audit: " Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 10/16] dnotify: " Amir Goldstein
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette")
nfsd would close open files in direct reclaim context and that could
cause a deadlock when fsnotify mark allocation went into direct reclaim
and nfsd shrinker tried to free existing fsnotify marks.

To avoid issues like this in future code, set the FSNOTIFY_GROUP_NOFS
flag on nfsd fsnotify group to prevent going into direct reclaim from
fsnotify_add_inode_mark().

Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/nfsd/filecache.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 79a5b052fcdf..258599db119f 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -119,14 +119,14 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf)
 	struct inode *inode = nf->nf_inode;
 
 	do {
-		mutex_lock(&nfsd_file_fsnotify_group->mark_mutex);
+		fsnotify_group_lock(nfsd_file_fsnotify_group);
 		mark = fsnotify_find_mark(&inode->i_fsnotify_marks,
-				nfsd_file_fsnotify_group);
+					  nfsd_file_fsnotify_group);
 		if (mark) {
 			nfm = nfsd_file_mark_get(container_of(mark,
 						 struct nfsd_file_mark,
 						 nfm_mark));
-			mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex);
+			fsnotify_group_unlock(nfsd_file_fsnotify_group);
 			if (nfm) {
 				fsnotify_put_mark(mark);
 				break;
@@ -134,8 +134,9 @@ nfsd_file_mark_find_or_create(struct nfsd_file *nf)
 			/* Avoid soft lockup race with nfsd_file_mark_put() */
 			fsnotify_destroy_mark(mark, nfsd_file_fsnotify_group);
 			fsnotify_put_mark(mark);
-		} else
-			mutex_unlock(&nfsd_file_fsnotify_group->mark_mutex);
+		} else {
+			fsnotify_group_unlock(nfsd_file_fsnotify_group);
+		}
 
 		/* allocate a new nfm */
 		new = kmem_cache_alloc(nfsd_file_mark_slab, GFP_KERNEL);
@@ -674,7 +675,7 @@ nfsd_file_cache_init(void)
 	}
 
 	nfsd_file_fsnotify_group = fsnotify_alloc_group(&nfsd_file_fsnotify_ops,
-							0);
+							FSNOTIFY_GROUP_NOFS);
 	if (IS_ERR(nfsd_file_fsnotify_group)) {
 		pr_err("nfsd: unable to create fsnotify group: %ld\n",
 			PTR_ERR(nfsd_file_fsnotify_group));
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 10/16] dnotify: use fsnotify group lock helpers
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (8 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 09/16] nfsd: " Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 11/16] fsnotify: allow adding an inode mark without pinning inode Amir Goldstein
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Before commit 9542e6a643fc6 ("nfsd: Containerise filecache laundrette")
nfsd would close open files in direct reclaim context.  There is no
guarantee that others memory shrinkers don't do the same and no
guarantee that future shrinkers won't do that.

For example, if overlayfs implements inode cache of fscache would
keep open files to cached objects, inode shrinkers could end up closing
open files to underlying fs.

Direct reclaim from dnotify mark allocation context may try to close
open files that have dnotify marks of the same group and hit a deadlock
on mark_mutex.

Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim
from allocations under dnotify group lock and use the safe group lock
helpers.

Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/dnotify/dnotify.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index e4779926edf4..190aa717fa32 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -168,7 +168,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id)
 		return;
 	dn_mark = container_of(fsn_mark, struct dnotify_mark, fsn_mark);
 
-	mutex_lock(&dnotify_group->mark_mutex);
+	fsnotify_group_lock(dnotify_group);
 
 	spin_lock(&fsn_mark->lock);
 	prev = &dn_mark->dn;
@@ -191,7 +191,7 @@ void dnotify_flush(struct file *filp, fl_owner_t id)
 		free = true;
 	}
 
-	mutex_unlock(&dnotify_group->mark_mutex);
+	fsnotify_group_unlock(dnotify_group);
 
 	if (free)
 		fsnotify_free_mark(fsn_mark);
@@ -324,7 +324,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 	new_dn_mark->dn = NULL;
 
 	/* this is needed to prevent the fcntl/close race described below */
-	mutex_lock(&dnotify_group->mark_mutex);
+	fsnotify_group_lock(dnotify_group);
 
 	/* add the new_fsn_mark or find an old one. */
 	fsn_mark = fsnotify_find_mark(&inode->i_fsnotify_marks, dnotify_group);
@@ -334,7 +334,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 	} else {
 		error = fsnotify_add_inode_mark_locked(new_fsn_mark, inode, 0);
 		if (error) {
-			mutex_unlock(&dnotify_group->mark_mutex);
+			fsnotify_group_unlock(dnotify_group);
 			goto out_err;
 		}
 		spin_lock(&new_fsn_mark->lock);
@@ -383,7 +383,7 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 
 	if (destroy)
 		fsnotify_detach_mark(fsn_mark);
-	mutex_unlock(&dnotify_group->mark_mutex);
+	fsnotify_group_unlock(dnotify_group);
 	if (destroy)
 		fsnotify_free_mark(fsn_mark);
 	fsnotify_put_mark(fsn_mark);
@@ -401,7 +401,8 @@ static int __init dnotify_init(void)
 					  SLAB_PANIC|SLAB_ACCOUNT);
 	dnotify_mark_cache = KMEM_CACHE(dnotify_mark, SLAB_PANIC|SLAB_ACCOUNT);
 
-	dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops, 0);
+	dnotify_group = fsnotify_alloc_group(&dnotify_fsnotify_ops,
+					     FSNOTIFY_GROUP_NOFS);
 	if (IS_ERR(dnotify_group))
 		panic("unable to allocate fsnotify group for dnotify\n");
 	dnotify_sysctl_init();
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 11/16] fsnotify: allow adding an inode mark without pinning inode
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (9 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 10/16] dnotify: " Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-21 14:54   ` Jan Kara
  2022-04-13  9:09 ` [PATCH v3 12/16] fanotify: create helper fanotify_mark_user_flags() Amir Goldstein
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

fsnotify_add_mark() and variants implicitly take a reference on inode
when attaching a mark to an inode.

Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF.

Instead of taking the inode reference when attaching connector to inode
and dropping the inode reference when detaching connector from inode,
take the inode reference on attach of the first mark that wants to hold
an inode reference and drop the inode reference on detach of the last
mark that wants to hold an inode reference.

Backends can "upgrade" an existing mark to take an inode reference, but
cannot "downgrade" a mark with inode reference to release the refernce.

This leaves the choice to the backend whether or not to pin the inode
when adding an inode mark.

This is intended to be used when adding a mark with ignored mask that is
used for optimization in cases where group can afford getting unneeded
events and reinstate the mark with ignored mask when inode is accessed
again after being evicted.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/mark.c                 | 80 ++++++++++++++++++++++++--------
 include/linux/fsnotify_backend.h |  2 +
 2 files changed, 62 insertions(+), 20 deletions(-)

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 7120918d8251..e38cb241536f 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -116,20 +116,67 @@ __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn)
 	return *fsnotify_conn_mask_p(conn);
 }
 
-static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
+/*
+ * Update the proxy refcount on inode maintainted by connector.
+ *
+ * When it's time to drop the proxy refcount, clear the HAS_IREF flag
+ * and return the inode object.  fsnotify_drop_object() will be resonsible
+ * for doing iput() outside of spinlocks when last mark that wanted iref
+ * is detached.
+ *
+ * Note that the proxy refcount is NOT dropped if backend only sets the
+ * NO_IREF mark flag and does detach the mark!
+ */
+static void fsnotify_get_inode_ref(struct inode *inode)
+{
+	ihold(inode);
+	atomic_long_inc(&inode->i_sb->s_fsnotify_connectors);
+}
+
+static struct inode *fsnotify_update_iref(struct fsnotify_mark_connector *conn,
+					  bool want_iref)
+{
+	bool has_iref = conn->flags & FSNOTIFY_CONN_FLAG_HAS_IREF;
+	struct inode *inode = NULL;
+
+	if (conn->type != FSNOTIFY_OBJ_TYPE_INODE ||
+	    want_iref == has_iref)
+		return NULL;
+
+	if (want_iref) {
+		/* Pin inode if any mark wants inode refcount held */
+		fsnotify_get_inode_ref(fsnotify_conn_inode(conn));
+		conn->flags |= FSNOTIFY_CONN_FLAG_HAS_IREF;
+	} else {
+		/* Unpin inode after detach of last mark that wanted iref */
+		inode = fsnotify_conn_inode(conn);
+		conn->flags &= ~FSNOTIFY_CONN_FLAG_HAS_IREF;
+	}
+
+	return inode;
+}
+
+static void *__fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
 {
 	u32 new_mask = 0;
+	bool want_iref = false;
 	struct fsnotify_mark *mark;
 
 	assert_spin_locked(&conn->lock);
 	/* We can get detached connector here when inode is getting unlinked. */
 	if (!fsnotify_valid_obj_type(conn->type))
-		return;
+		return NULL;
 	hlist_for_each_entry(mark, &conn->list, obj_list) {
-		if (mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED)
-			new_mask |= fsnotify_calc_mask(mark);
+		if (!(mark->flags & FSNOTIFY_MARK_FLAG_ATTACHED))
+			continue;
+		new_mask |= fsnotify_calc_mask(mark);
+		if (conn->type == FSNOTIFY_OBJ_TYPE_INODE &&
+		    !(mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF))
+			want_iref = true;
 	}
 	*fsnotify_conn_mask_p(conn) = new_mask;
+
+	return fsnotify_update_iref(conn, want_iref);
 }
 
 /*
@@ -169,12 +216,6 @@ static void fsnotify_connector_destroy_workfn(struct work_struct *work)
 	}
 }
 
-static void fsnotify_get_inode_ref(struct inode *inode)
-{
-	ihold(inode);
-	atomic_long_inc(&inode->i_sb->s_fsnotify_connectors);
-}
-
 static void fsnotify_put_inode_ref(struct inode *inode)
 {
 	struct super_block *sb = inode->i_sb;
@@ -213,6 +254,10 @@ static void *fsnotify_detach_connector_from_object(
 	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
 		inode = fsnotify_conn_inode(conn);
 		inode->i_fsnotify_mask = 0;
+
+		/* Unpin inode when detaching from connector */
+		if (!(conn->flags & FSNOTIFY_CONN_FLAG_HAS_IREF))
+			inode = NULL;
 	} else if (conn->type == FSNOTIFY_OBJ_TYPE_VFSMOUNT) {
 		fsnotify_conn_mount(conn)->mnt_fsnotify_mask = 0;
 	} else if (conn->type == FSNOTIFY_OBJ_TYPE_SB) {
@@ -223,6 +268,7 @@ static void *fsnotify_detach_connector_from_object(
 	rcu_assign_pointer(*(conn->obj), NULL);
 	conn->obj = NULL;
 	conn->type = FSNOTIFY_OBJ_TYPE_DETACHED;
+	conn->flags = 0;
 
 	return inode;
 }
@@ -274,7 +320,8 @@ void fsnotify_put_mark(struct fsnotify_mark *mark)
 		objp = fsnotify_detach_connector_from_object(conn, &type);
 		free_conn = true;
 	} else {
-		__fsnotify_recalc_mask(conn);
+		objp = __fsnotify_recalc_mask(conn);
+		type = conn->type;
 	}
 	WRITE_ONCE(mark->connector, NULL);
 	spin_unlock(&conn->lock);
@@ -497,7 +544,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
 					       unsigned int obj_type,
 					       __kernel_fsid_t *fsid)
 {
-	struct inode *inode = NULL;
 	struct fsnotify_mark_connector *conn;
 
 	conn = kmem_cache_alloc(fsnotify_mark_connector_cachep, GFP_KERNEL);
@@ -505,6 +551,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
 		return -ENOMEM;
 	spin_lock_init(&conn->lock);
 	INIT_HLIST_HEAD(&conn->list);
+	conn->flags = 0;
 	conn->type = obj_type;
 	conn->obj = connp;
 	/* Cache fsid of filesystem containing the object */
@@ -515,10 +562,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
 		conn->fsid.val[0] = conn->fsid.val[1] = 0;
 		conn->flags = 0;
 	}
-	if (conn->type == FSNOTIFY_OBJ_TYPE_INODE) {
-		inode = fsnotify_conn_inode(conn);
-		fsnotify_get_inode_ref(inode);
-	}
 	fsnotify_get_sb_connectors(conn);
 
 	/*
@@ -527,8 +570,6 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
 	 */
 	if (cmpxchg(connp, NULL, conn)) {
 		/* Someone else created list structure for us */
-		if (inode)
-			fsnotify_put_inode_ref(inode);
 		fsnotify_put_sb_connectors(conn);
 		kmem_cache_free(fsnotify_mark_connector_cachep, conn);
 	}
@@ -690,8 +731,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
 	if (ret)
 		goto err;
 
-	if (mark->mask || mark->ignored_mask)
-		fsnotify_recalc_mask(mark->connector);
+	fsnotify_recalc_mask(mark->connector);
 
 	return ret;
 err:
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index fc6fdbaac797..478a985d13c2 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -457,6 +457,7 @@ struct fsnotify_mark_connector {
 	spinlock_t lock;
 	unsigned short type;	/* Type of object [lock] */
 #define FSNOTIFY_CONN_FLAG_HAS_FSID	0x01
+#define FSNOTIFY_CONN_FLAG_HAS_IREF	0x02
 	unsigned short flags;	/* flags [lock] */
 	__kernel_fsid_t fsid;	/* fsid of filesystem containing object */
 	union {
@@ -512,6 +513,7 @@ struct fsnotify_mark {
 #define FSNOTIFY_MARK_FLAG_IN_ONESHOT		0x0020
 	/* fanotify mark flags */
 #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY	0x0100
+#define FSNOTIFY_MARK_FLAG_NO_IREF		0x0200
 	unsigned int flags;		/* flags [mark->lock] */
 };
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 12/16] fanotify: create helper fanotify_mark_user_flags()
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (10 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 11/16] fsnotify: allow adding an inode mark without pinning inode Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 13/16] fanotify: factor out helper fanotify_mark_update_flags() Amir Goldstein
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

To translate from fsnotify mark flags to user visible flags.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.h | 10 ++++++++++
 fs/notify/fdinfo.c            |  6 ++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index a3d5b751cac5..87142bc0131a 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -490,3 +490,13 @@ static inline unsigned int fanotify_event_hash_bucket(
 {
 	return event->hash & FANOTIFY_HTABLE_MASK;
 }
+
+static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark)
+{
+	unsigned int mflags = 0;
+
+	if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)
+		mflags |= FAN_MARK_IGNORED_SURV_MODIFY;
+
+	return mflags;
+}
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 1f34c5c29fdb..59fb40abe33d 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -14,6 +14,7 @@
 #include <linux/exportfs.h>
 
 #include "inotify/inotify.h"
+#include "fanotify/fanotify.h"
 #include "fdinfo.h"
 #include "fsnotify.h"
 
@@ -103,12 +104,9 @@ void inotify_show_fdinfo(struct seq_file *m, struct file *f)
 
 static void fanotify_fdinfo(struct seq_file *m, struct fsnotify_mark *mark)
 {
-	unsigned int mflags = 0;
+	unsigned int mflags = fanotify_mark_user_flags(mark);
 	struct inode *inode;
 
-	if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)
-		mflags |= FAN_MARK_IGNORED_SURV_MODIFY;
-
 	if (mark->connector->type == FSNOTIFY_OBJ_TYPE_INODE) {
 		inode = igrab(fsnotify_conn_inode(mark->connector));
 		if (!inode)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 13/16] fanotify: factor out helper fanotify_mark_update_flags()
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (11 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 12/16] fanotify: create helper fanotify_mark_user_flags() Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-21 15:00   ` Jan Kara
  2022-04-13  9:09 ` [PATCH v3 14/16] fanotify: implement "evictable" inode marks Amir Goldstein
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Handle FAN_MARK_IGNORED_SURV_MODIFY flag change in a helper that
is called after updating the mark mask.

Move recalc of object mask inside fanotify_mark_add_to_mask() which
makes the code a bit simpler to follow.

Add also helper to translate fsnotify mark flags to user visible
fanotify mark flags.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 41 ++++++++++++++++--------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index bf72856da42e..d8d44a5b37e3 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1081,42 +1081,48 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 				    flags, umask);
 }
 
-static void fanotify_mark_add_ignored_mask(struct fsnotify_mark *fsn_mark,
-					   __u32 mask, unsigned int flags,
-					   __u32 *removed)
+static int fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark,
+				      unsigned int flags, bool *recalc)
 {
-	fsn_mark->ignored_mask |= mask;
-
 	/*
 	 * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to
 	 * the removal of the FS_MODIFY bit in calculated mask if it was set
 	 * because of an ignored mask that is now going to survive FS_MODIFY.
 	 */
 	if ((flags & FAN_MARK_IGNORED_SURV_MODIFY) &&
+	    (flags & FAN_MARK_IGNORED_MASK) &&
 	    !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) {
 		fsn_mark->flags |= FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY;
 		if (!(fsn_mark->mask & FS_MODIFY))
-			*removed = FS_MODIFY;
+			*recalc = true;
 	}
+
+	return 0;
 }
 
-static __u32 fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
-				       __u32 mask, unsigned int flags,
-				       __u32 *removed)
+static int fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
+				     __u32 mask, unsigned int flags)
 {
-	__u32 oldmask, newmask;
+	bool recalc = false;
+	int ret;
 
 	spin_lock(&fsn_mark->lock);
-	oldmask = fsnotify_calc_mask(fsn_mark);
 	if (!(flags & FAN_MARK_IGNORED_MASK)) {
 		fsn_mark->mask |= mask;
 	} else {
-		fanotify_mark_add_ignored_mask(fsn_mark, mask, flags, removed);
+		fsn_mark->ignored_mask |= mask;
 	}
-	newmask = fsnotify_calc_mask(fsn_mark);
+
+	recalc = fsnotify_calc_mask(fsn_mark) &
+		~fsnotify_conn_mask(fsn_mark->connector);
+
+	ret = fanotify_mark_update_flags(fsn_mark, flags, &recalc);
 	spin_unlock(&fsn_mark->lock);
 
-	return newmask & ~oldmask;
+	if (recalc)
+		fsnotify_recalc_mask(fsn_mark->connector);
+
+	return ret;
 }
 
 static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
@@ -1174,8 +1180,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
 			     __kernel_fsid_t *fsid)
 {
 	struct fsnotify_mark *fsn_mark;
-	__u32 added, removed = 0;
-	int ret = 0;
+	int ret;
 
 	mutex_lock(&group->mark_mutex);
 	fsn_mark = fsnotify_find_mark(connp, group);
@@ -1197,9 +1202,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
 			goto out;
 	}
 
-	added = fanotify_mark_add_to_mask(fsn_mark, mask, flags, &removed);
-	if (removed || (added & ~fsnotify_conn_mask(fsn_mark->connector)))
-		fsnotify_recalc_mask(fsn_mark->connector);
+	ret = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
 
 out:
 	mutex_unlock(&group->mark_mutex);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 14/16] fanotify: implement "evictable" inode marks
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (12 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 13/16] fanotify: factor out helper fanotify_mark_update_flags() Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-21 15:40   ` Jan Kara
  2022-04-13  9:09 ` [PATCH v3 15/16] fanotify: use fsnotify group lock helpers Amir Goldstein
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not
pin the marked inode to inode cache, so when inode is evicted from cache
due to memory pressure, the mark will be lost.

When an inode mark with flag FAN_MARK_EVICATBLE is updated without using
this flag, the marked inode is pinned to inode cache.

When an inode mark is updated with flag FAN_MARK_EVICTABLE but an
existing mark already has the inode pinned, the mark update fails with
error EEXIST.

Evictable inode marks can be used to setup inode marks with ignored mask
to suppress events from uninteresting files or directories in a lazy
manner, upon receiving the first event, without having to iterate all
the uninteresting files or directories before hand.

The evictbale inode mark feature allows performing this lazy marks setup
without exhausting the system memory with pinned inodes.

This change does not enable the feature yet.

Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify.h      |  2 ++
 fs/notify/fanotify/fanotify_user.c | 37 +++++++++++++++++++++++++++++-
 include/uapi/linux/fanotify.h      |  1 +
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify.h b/fs/notify/fanotify/fanotify.h
index 87142bc0131a..80e0ec95b113 100644
--- a/fs/notify/fanotify/fanotify.h
+++ b/fs/notify/fanotify/fanotify.h
@@ -497,6 +497,8 @@ static inline unsigned int fanotify_mark_user_flags(struct fsnotify_mark *mark)
 
 	if (mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)
 		mflags |= FAN_MARK_IGNORED_SURV_MODIFY;
+	if (mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF)
+		mflags |= FAN_MARK_EVICTABLE;
 
 	return mflags;
 }
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index d8d44a5b37e3..0b4beba95fa8 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1084,6 +1084,8 @@ static int fanotify_remove_inode_mark(struct fsnotify_group *group,
 static int fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark,
 				      unsigned int flags, bool *recalc)
 {
+	bool want_iref = !(flags & FAN_MARK_EVICTABLE);
+
 	/*
 	 * Setting FAN_MARK_IGNORED_SURV_MODIFY for the first time may lead to
 	 * the removal of the FS_MODIFY bit in calculated mask if it was set
@@ -1097,6 +1099,18 @@ static int fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark,
 			*recalc = true;
 	}
 
+	if (fsn_mark->connector->type != FSNOTIFY_OBJ_TYPE_INODE ||
+	    want_iref == !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF))
+		return 0;
+
+	/*
+	 * NO_IREF may be removed from a mark, but not added.
+	 * When removed, fsnotify_recalc_mask() will take the inode ref.
+	 */
+	WARN_ON_ONCE(!want_iref);
+	fsn_mark->flags &= ~FSNOTIFY_MARK_FLAG_NO_IREF;
+	*recalc = true;
+
 	return 0;
 }
 
@@ -1128,6 +1142,7 @@ static int fanotify_mark_add_to_mask(struct fsnotify_mark *fsn_mark,
 static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 						   fsnotify_connp_t *connp,
 						   unsigned int obj_type,
+						   unsigned int fan_flags,
 						   __kernel_fsid_t *fsid)
 {
 	struct ucounts *ucounts = group->fanotify_data.ucounts;
@@ -1150,6 +1165,9 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
 	}
 
 	fsnotify_init_mark(mark, group);
+	if (fan_flags & FAN_MARK_EVICTABLE)
+		mark->flags |= FSNOTIFY_MARK_FLAG_NO_IREF;
+
 	ret = fsnotify_add_mark_locked(mark, connp, obj_type, fsid);
 	if (ret) {
 		fsnotify_put_mark(mark);
@@ -1185,13 +1203,22 @@ static int fanotify_add_mark(struct fsnotify_group *group,
 	mutex_lock(&group->mark_mutex);
 	fsn_mark = fsnotify_find_mark(connp, group);
 	if (!fsn_mark) {
-		fsn_mark = fanotify_add_new_mark(group, connp, obj_type, fsid);
+		fsn_mark = fanotify_add_new_mark(group, connp, obj_type, flags,
+						 fsid);
 		if (IS_ERR(fsn_mark)) {
 			mutex_unlock(&group->mark_mutex);
 			return PTR_ERR(fsn_mark);
 		}
 	}
 
+	/*
+	 * Non evictable mark cannot be downgraded to evictable mark.
+	 */
+	ret = -EEXIST;
+	if (flags & FAN_MARK_EVICTABLE &&
+	    !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF))
+		goto out;
+
 	/*
 	 * Error events are pre-allocated per group, only if strictly
 	 * needed (i.e. FAN_FS_ERROR was requested).
@@ -1601,6 +1628,14 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
 	    mark_type != FAN_MARK_FILESYSTEM)
 		goto fput_and_out;
 
+	/*
+	 * Evictable is only relevant for inode marks, because only inode object
+	 * can be evicted on memory pressure.
+	 */
+	if (flags & FAN_MARK_EVICTABLE &&
+	     mark_type != FAN_MARK_INODE)
+		goto fput_and_out;
+
 	/*
 	 * Events that do not carry enough information to report
 	 * event->fd require a group that supports reporting fid.  Those
diff --git a/include/uapi/linux/fanotify.h b/include/uapi/linux/fanotify.h
index e8ac38cc2fd6..f1f89132d60e 100644
--- a/include/uapi/linux/fanotify.h
+++ b/include/uapi/linux/fanotify.h
@@ -82,6 +82,7 @@
 #define FAN_MARK_IGNORED_SURV_MODIFY	0x00000040
 #define FAN_MARK_FLUSH		0x00000080
 /* FAN_MARK_FILESYSTEM is	0x00000100 */
+#define FAN_MARK_EVICTABLE	0x00000200
 
 /* These are NOT bitwise flags.  Both bits can be used togther.  */
 #define FAN_MARK_INODE		0x00000000
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 15/16] fanotify: use fsnotify group lock helpers
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (13 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 14/16] fanotify: implement "evictable" inode marks Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-13  9:09 ` [PATCH v3 16/16] fanotify: enable "evictable" inode marks Amir Goldstein
  2022-04-21 15:41 ` [PATCH v3 00/16] Evictable fanotify marks Jan Kara
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Direct reclaim from fanotify mark allocation context may try to evict
inodes with evictable marks of the same group and hit this deadlock:

[<0>] fsnotify_destroy_mark+0x1f/0x3a
[<0>] fsnotify_destroy_marks+0x71/0xd9
[<0>] __destroy_inode+0x24/0x7e
[<0>] destroy_inode+0x2c/0x67
[<0>] dispose_list+0x49/0x68
[<0>] prune_icache_sb+0x5b/0x79
[<0>] super_cache_scan+0x11c/0x16f
[<0>] shrink_slab.constprop.0+0x23e/0x40f
[<0>] shrink_node+0x218/0x3e7
[<0>] do_try_to_free_pages+0x12a/0x2d2
[<0>] try_to_free_pages+0x166/0x242
[<0>] __alloc_pages_slowpath.constprop.0+0x30c/0x903
[<0>] __alloc_pages+0xeb/0x1c7
[<0>] cache_grow_begin+0x6f/0x31e
[<0>] fallback_alloc+0xe0/0x12d
[<0>] ____cache_alloc_node+0x15a/0x17e
[<0>] kmem_cache_alloc_trace+0xa1/0x143
[<0>] fanotify_add_mark+0xd5/0x2b2
[<0>] do_fanotify_mark+0x566/0x5eb
[<0>] __x64_sys_fanotify_mark+0x21/0x24
[<0>] do_syscall_64+0x6d/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

Set the FSNOTIFY_GROUP_NOFS flag to prevent going into direct reclaim
from allocations under fanotify group lock and use the safe group lock
helpers.

Suggested-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 0b4beba95fa8..fe2534043c17 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1035,10 +1035,10 @@ static int fanotify_remove_mark(struct fsnotify_group *group,
 	__u32 removed;
 	int destroy_mark;
 
-	mutex_lock(&group->mark_mutex);
+	fsnotify_group_lock(group);
 	fsn_mark = fsnotify_find_mark(connp, group);
 	if (!fsn_mark) {
-		mutex_unlock(&group->mark_mutex);
+		fsnotify_group_unlock(group);
 		return -ENOENT;
 	}
 
@@ -1048,7 +1048,7 @@ static int fanotify_remove_mark(struct fsnotify_group *group,
 		fsnotify_recalc_mask(fsn_mark->connector);
 	if (destroy_mark)
 		fsnotify_detach_mark(fsn_mark);
-	mutex_unlock(&group->mark_mutex);
+	fsnotify_group_unlock(group);
 	if (destroy_mark)
 		fsnotify_free_mark(fsn_mark);
 
@@ -1200,13 +1200,13 @@ static int fanotify_add_mark(struct fsnotify_group *group,
 	struct fsnotify_mark *fsn_mark;
 	int ret;
 
-	mutex_lock(&group->mark_mutex);
+	fsnotify_group_lock(group);
 	fsn_mark = fsnotify_find_mark(connp, group);
 	if (!fsn_mark) {
 		fsn_mark = fanotify_add_new_mark(group, connp, obj_type, flags,
 						 fsid);
 		if (IS_ERR(fsn_mark)) {
-			mutex_unlock(&group->mark_mutex);
+			fsnotify_group_unlock(group);
 			return PTR_ERR(fsn_mark);
 		}
 	}
@@ -1232,7 +1232,7 @@ static int fanotify_add_mark(struct fsnotify_group *group,
 	ret = fanotify_mark_add_to_mask(fsn_mark, mask, flags);
 
 out:
-	mutex_unlock(&group->mark_mutex);
+	fsnotify_group_unlock(group);
 
 	fsnotify_put_mark(fsn_mark);
 	return ret;
@@ -1386,7 +1386,7 @@ SYSCALL_DEFINE2(fanotify_init, unsigned int, flags, unsigned int, event_f_flags)
 
 	/* fsnotify_alloc_group takes a ref.  Dropped in fanotify_release */
 	group = fsnotify_alloc_group(&fanotify_fsnotify_ops,
-				     FSNOTIFY_GROUP_USER);
+				     FSNOTIFY_GROUP_USER | FSNOTIFY_GROUP_NOFS);
 	if (IS_ERR(group)) {
 		return PTR_ERR(group);
 	}
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH v3 16/16] fanotify: enable "evictable" inode marks
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (14 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 15/16] fanotify: use fsnotify group lock helpers Amir Goldstein
@ 2022-04-13  9:09 ` Amir Goldstein
  2022-04-21 15:41 ` [PATCH v3 00/16] Evictable fanotify marks Jan Kara
  16 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-13  9:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

Now that the direct reclaim path is handled we can enable evictable
inode marks.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fanotify/fanotify_user.c | 2 +-
 include/linux/fanotify.h           | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index fe2534043c17..87756a015be9 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1788,7 +1788,7 @@ static int __init fanotify_user_setup(void)
 
 	BUILD_BUG_ON(FANOTIFY_INIT_FLAGS & FANOTIFY_INTERNAL_GROUP_FLAGS);
 	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_INIT_FLAGS) != 12);
-	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 9);
+	BUILD_BUG_ON(HWEIGHT32(FANOTIFY_MARK_FLAGS) != 10);
 
 	fanotify_mark_cache = KMEM_CACHE(fsnotify_mark,
 					 SLAB_PANIC|SLAB_ACCOUNT);
diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h
index 419cadcd7ff5..edc28555814c 100644
--- a/include/linux/fanotify.h
+++ b/include/linux/fanotify.h
@@ -66,6 +66,7 @@
 				 FAN_MARK_ONLYDIR | \
 				 FAN_MARK_IGNORED_MASK | \
 				 FAN_MARK_IGNORED_SURV_MODIFY | \
+				 FAN_MARK_EVICTABLE | \
 				 FAN_MARK_FLUSH)
 
 /*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 04/16] fsnotify: pass flags argument to fsnotify_add_mark() via mark
  2022-04-13  9:09 ` [PATCH v3 04/16] fsnotify: pass flags argument to fsnotify_add_mark() via mark Amir Goldstein
@ 2022-04-21 14:18   ` Jan Kara
  2022-04-22 10:02     ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2022-04-21 14:18 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Wed 13-04-22 12:09:23, Amir Goldstein wrote:
> Instead of passing the allow_dups argument to fsnotify_add_mark()
> as an argument, define the mark flag FSNOTIFY_MARK_FLAG_ALLOW_DUPS
> to express the old allow_dups meaning and pass this information on the
> mark itself.
> 
> We use mark->flags to pass inotify control flags and will pass more
> control flags in the future so let's make this the standard.
> 
> Although the FSNOTIFY_MARK_FLAG_ALLOW_DUPS flag is not used after the
> call to fsnotify_add_mark(), it does not hurt to leave this information
> on the mark for future use.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

I wanted to comment on this already last time but forgot:
FSNOTIFY_MARK_FLAG_ALLOW_DUPS is IMO more a property of fsnotify_group
than a particular mark (or a particular call to fsnotify_add_mark()). As
such it would make more sense to me to have is as "feature" similarly to
fs-reclaim restrictions you introduce later in the series.

As a bonus, no need for 'flags' argument to
fsnotify_add_inode_mark_locked() or fsnotify_add_inode_mark().

								Honza

> ---
>  fs/notify/fanotify/fanotify_user.c |  2 +-
>  fs/notify/inotify/inotify_user.c   |  4 ++--
>  fs/notify/mark.c                   | 13 ++++++-------
>  include/linux/fsnotify_backend.h   | 19 ++++++++++---------
>  kernel/audit_fsnotify.c            |  3 ++-
>  5 files changed, 21 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
> index 9b32b76a9c30..0f0db1efa379 100644
> --- a/fs/notify/fanotify/fanotify_user.c
> +++ b/fs/notify/fanotify/fanotify_user.c
> @@ -1144,7 +1144,7 @@ static struct fsnotify_mark *fanotify_add_new_mark(struct fsnotify_group *group,
>  	}
>  
>  	fsnotify_init_mark(mark, group);
> -	ret = fsnotify_add_mark_locked(mark, connp, obj_type, 0, fsid);
> +	ret = fsnotify_add_mark_locked(mark, connp, obj_type, fsid);
>  	if (ret) {
>  		fsnotify_put_mark(mark);
>  		goto out_dec_ucounts;
> diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
> index d8907d32a05b..6fc0f598a7aa 100644
> --- a/fs/notify/inotify/inotify_user.c
> +++ b/fs/notify/inotify/inotify_user.c
> @@ -603,7 +603,6 @@ static int inotify_new_watch(struct fsnotify_group *group,
>  
>  	fsnotify_init_mark(&tmp_i_mark->fsn_mark, group);
>  	tmp_i_mark->fsn_mark.mask = inotify_arg_to_mask(inode, arg);
> -	tmp_i_mark->fsn_mark.flags = inotify_arg_to_flags(arg);
>  	tmp_i_mark->wd = -1;
>  
>  	ret = inotify_add_to_idr(idr, idr_lock, tmp_i_mark);
> @@ -618,7 +617,8 @@ static int inotify_new_watch(struct fsnotify_group *group,
>  	}
>  
>  	/* we are on the idr, now get on the inode */
> -	ret = fsnotify_add_inode_mark_locked(&tmp_i_mark->fsn_mark, inode, 0);
> +	ret = fsnotify_add_inode_mark_locked(&tmp_i_mark->fsn_mark, inode,
> +					     inotify_arg_to_flags(arg));
>  	if (ret) {
>  		/* we failed to get on the inode, get off the idr */
>  		inotify_remove_from_idr(group, tmp_i_mark);
> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index c86982be2d50..ea8f557881b1 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -574,7 +574,7 @@ static struct fsnotify_mark_connector *fsnotify_grab_connector(
>  static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
>  				  fsnotify_connp_t *connp,
>  				  unsigned int obj_type,
> -				  int allow_dups, __kernel_fsid_t *fsid)
> +				  __kernel_fsid_t *fsid)
>  {
>  	struct fsnotify_mark *lmark, *last = NULL;
>  	struct fsnotify_mark_connector *conn;
> @@ -633,7 +633,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
>  
>  		if ((lmark->group == mark->group) &&
>  		    (lmark->flags & FSNOTIFY_MARK_FLAG_ATTACHED) &&
> -		    !allow_dups) {
> +		    !(mark->flags & FSNOTIFY_MARK_FLAG_ALLOW_DUPS)) {
>  			err = -EEXIST;
>  			goto out_err;
>  		}
> @@ -668,7 +668,7 @@ static int fsnotify_add_mark_list(struct fsnotify_mark *mark,
>   */
>  int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
>  			     fsnotify_connp_t *connp, unsigned int obj_type,
> -			     int allow_dups, __kernel_fsid_t *fsid)
> +			     __kernel_fsid_t *fsid)
>  {
>  	struct fsnotify_group *group = mark->group;
>  	int ret = 0;
> @@ -688,7 +688,7 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
>  	fsnotify_get_mark(mark); /* for g_list */
>  	spin_unlock(&mark->lock);
>  
> -	ret = fsnotify_add_mark_list(mark, connp, obj_type, allow_dups, fsid);
> +	ret = fsnotify_add_mark_list(mark, connp, obj_type, fsid);
>  	if (ret)
>  		goto err;
>  
> @@ -708,14 +708,13 @@ int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
>  }
>  
>  int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp,
> -		      unsigned int obj_type, int allow_dups,
> -		      __kernel_fsid_t *fsid)
> +		      unsigned int obj_type, __kernel_fsid_t *fsid)
>  {
>  	int ret;
>  	struct fsnotify_group *group = mark->group;
>  
>  	mutex_lock(&group->mark_mutex);
> -	ret = fsnotify_add_mark_locked(mark, connp, obj_type, allow_dups, fsid);
> +	ret = fsnotify_add_mark_locked(mark, connp, obj_type, fsid);
>  	mutex_unlock(&group->mark_mutex);
>  	return ret;
>  }
> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index b1c72edd9784..2ff686882303 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -473,6 +473,7 @@ struct fsnotify_mark {
>  	/* General fsnotify mark flags */
>  #define FSNOTIFY_MARK_FLAG_ALIVE		0x0001
>  #define FSNOTIFY_MARK_FLAG_ATTACHED		0x0002
> +#define FSNOTIFY_MARK_FLAG_ALLOW_DUPS		0x0004
>  	/* inotify mark flags */
>  #define FSNOTIFY_MARK_FLAG_EXCL_UNLINK		0x0010
>  #define FSNOTIFY_MARK_FLAG_IN_ONESHOT		0x0020
> @@ -634,30 +635,30 @@ extern struct fsnotify_mark *fsnotify_find_mark(fsnotify_connp_t *connp,
>  /* Get cached fsid of filesystem containing object */
>  extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn,
>  				  __kernel_fsid_t *fsid);
> +
>  /* attach the mark to the object */
>  extern int fsnotify_add_mark(struct fsnotify_mark *mark,
>  			     fsnotify_connp_t *connp, unsigned int obj_type,
> -			     int allow_dups, __kernel_fsid_t *fsid);
> +			     __kernel_fsid_t *fsid);
>  extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark,
>  				    fsnotify_connp_t *connp,
> -				    unsigned int obj_type, int allow_dups,
> +				    unsigned int obj_type,
>  				    __kernel_fsid_t *fsid);
>  
>  /* attach the mark to the inode */
>  static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark,
> -					  struct inode *inode,
> -					  int allow_dups)
> +					  struct inode *inode, int flags)
>  {
> +	mark->flags = flags;
>  	return fsnotify_add_mark(mark, &inode->i_fsnotify_marks,
> -				 FSNOTIFY_OBJ_TYPE_INODE, allow_dups, NULL);
> +				 FSNOTIFY_OBJ_TYPE_INODE, NULL);
>  }
>  static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark,
> -						 struct inode *inode,
> -						 int allow_dups)
> +						 struct inode *inode, int flags)
>  {
> +	mark->flags = flags;
>  	return fsnotify_add_mark_locked(mark, &inode->i_fsnotify_marks,
> -					FSNOTIFY_OBJ_TYPE_INODE, allow_dups,
> -					NULL);
> +					FSNOTIFY_OBJ_TYPE_INODE, NULL);
>  }
>  
>  /* given a group and a mark, flag mark to be freed when all references are dropped */
> diff --git a/kernel/audit_fsnotify.c b/kernel/audit_fsnotify.c
> index 02348b48447c..3c35649bc7f5 100644
> --- a/kernel/audit_fsnotify.c
> +++ b/kernel/audit_fsnotify.c
> @@ -100,7 +100,8 @@ struct audit_fsnotify_mark *audit_alloc_mark(struct audit_krule *krule, char *pa
>  	audit_update_mark(audit_mark, dentry->d_inode);
>  	audit_mark->rule = krule;
>  
> -	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true);
> +	ret = fsnotify_add_inode_mark(&audit_mark->mark, inode,
> +				      FSNOTIFY_MARK_FLAG_ALLOW_DUPS);
>  	if (ret < 0) {
>  		fsnotify_put_mark(&audit_mark->mark);
>  		audit_mark = ERR_PTR(ret);
> -- 
> 2.35.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 05/16] fsnotify: pass flags argument to fsnotify_alloc_group()
  2022-04-13  9:09 ` [PATCH v3 05/16] fsnotify: pass flags argument to fsnotify_alloc_group() Amir Goldstein
@ 2022-04-21 14:34   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2022-04-21 14:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Wed 13-04-22 12:09:24, Amir Goldstein wrote:
> Add flags argument to fsnotify_alloc_group(), define and use the flag
> FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper
> fsnotify_alloc_user_group() to indicate user allocation.
> 
> Although the flag FSNOTIFY_GROUP_USER is currently not used after group
> allocation, we store the flags argument in the group struct for future
> use of other group flags.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Looks good, just one nit:

> diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
> index 2ff686882303..2057ae4bf8e9 100644
> --- a/include/linux/fsnotify_backend.h
> +++ b/include/linux/fsnotify_backend.h
> @@ -210,6 +210,11 @@ struct fsnotify_group {
>  	unsigned int priority;
>  	bool shutdown;		/* group is being shut down, don't queue more events */
>  
> +#define FSNOTIFY_GROUP_USER	0x01 /* user allocated group */
> +#define FSNOTIFY_GROUP_FLAG(group, flag) \

Is the FSNOTIFY_GROUP_FLAG() macro that useful? It isn't shorter than
"group->flags & FSNOTIFY_GROUP_xxx" and it only obfuscates things.

								Honza

-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 11/16] fsnotify: allow adding an inode mark without pinning inode
  2022-04-13  9:09 ` [PATCH v3 11/16] fsnotify: allow adding an inode mark without pinning inode Amir Goldstein
@ 2022-04-21 14:54   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2022-04-21 14:54 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Wed 13-04-22 12:09:30, Amir Goldstein wrote:
> fsnotify_add_mark() and variants implicitly take a reference on inode
> when attaching a mark to an inode.
> 
> Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF.
> 
> Instead of taking the inode reference when attaching connector to inode
> and dropping the inode reference when detaching connector from inode,
> take the inode reference on attach of the first mark that wants to hold
> an inode reference and drop the inode reference on detach of the last
> mark that wants to hold an inode reference.
> 
> Backends can "upgrade" an existing mark to take an inode reference, but
> cannot "downgrade" a mark with inode reference to release the refernce.
> 
> This leaves the choice to the backend whether or not to pin the inode
> when adding an inode mark.
> 
> This is intended to be used when adding a mark with ignored mask that is
> used for optimization in cases where group can afford getting unneeded
> events and reinstate the mark with ignored mask when inode is accessed
> again after being evicted.
> 
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Looks good. Just two nits below.

> diff --git a/fs/notify/mark.c b/fs/notify/mark.c
> index 7120918d8251..e38cb241536f 100644
> --- a/fs/notify/mark.c
> +++ b/fs/notify/mark.c
> @@ -116,20 +116,67 @@ __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn)
>  	return *fsnotify_conn_mask_p(conn);
>  }
>  
> -static void __fsnotify_recalc_mask(struct fsnotify_mark_connector *conn)
> +/*
> + * Update the proxy refcount on inode maintainted by connector.
> + *
> + * When it's time to drop the proxy refcount, clear the HAS_IREF flag
> + * and return the inode object.  fsnotify_drop_object() will be resonsible
> + * for doing iput() outside of spinlocks when last mark that wanted iref
> + * is detached.
> + *
> + * Note that the proxy refcount is NOT dropped if backend only sets the
> + * NO_IREF mark flag and does detach the mark!
> + */

This comment seems outdated - still speaking about proxy refcount which
does not exist anymore...

> +static void fsnotify_get_inode_ref(struct inode *inode)
> +{
> +	ihold(inode);
> +	atomic_long_inc(&inode->i_sb->s_fsnotify_connectors);
> +}
> +
> @@ -505,6 +551,7 @@ static int fsnotify_attach_connector_to_object(fsnotify_connp_t *connp,
>  		return -ENOMEM;
>  	spin_lock_init(&conn->lock);
>  	INIT_HLIST_HEAD(&conn->list);
> +	conn->flags = 0;

Why this? We init conn->flags just a bit later...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 13/16] fanotify: factor out helper fanotify_mark_update_flags()
  2022-04-13  9:09 ` [PATCH v3 13/16] fanotify: factor out helper fanotify_mark_update_flags() Amir Goldstein
@ 2022-04-21 15:00   ` Jan Kara
  0 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2022-04-21 15:00 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Wed 13-04-22 12:09:32, Amir Goldstein wrote:
> Handle FAN_MARK_IGNORED_SURV_MODIFY flag change in a helper that
> is called after updating the mark mask.
> 
> Move recalc of object mask inside fanotify_mark_add_to_mask() which
> makes the code a bit simpler to follow.
> 
> Add also helper to translate fsnotify mark flags to user visible
> fanotify mark flags.

This bit got moved to another commit. Otherwise changes look good.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 14/16] fanotify: implement "evictable" inode marks
  2022-04-13  9:09 ` [PATCH v3 14/16] fanotify: implement "evictable" inode marks Amir Goldstein
@ 2022-04-21 15:40   ` Jan Kara
  2022-04-22 10:47     ` Amir Goldstein
  0 siblings, 1 reply; 25+ messages in thread
From: Jan Kara @ 2022-04-21 15:40 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

On Wed 13-04-22 12:09:33, Amir Goldstein wrote:
> When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not
> pin the marked inode to inode cache, so when inode is evicted from cache
> due to memory pressure, the mark will be lost.
> 
> When an inode mark with flag FAN_MARK_EVICATBLE is updated without using
> this flag, the marked inode is pinned to inode cache.
> 
> When an inode mark is updated with flag FAN_MARK_EVICTABLE but an
> existing mark already has the inode pinned, the mark update fails with
> error EEXIST.
> 
> Evictable inode marks can be used to setup inode marks with ignored mask
> to suppress events from uninteresting files or directories in a lazy
> manner, upon receiving the first event, without having to iterate all
> the uninteresting files or directories before hand.
> 
> The evictbale inode mark feature allows performing this lazy marks setup
> without exhausting the system memory with pinned inodes.
> 
> This change does not enable the feature yet.
> 
> Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>

Just one nit below...

> @@ -1097,6 +1099,18 @@ static int fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark,
>  			*recalc = true;
>  	}
>  
> +	if (fsn_mark->connector->type != FSNOTIFY_OBJ_TYPE_INODE ||
> +	    want_iref == !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF))
> +		return 0;
> +
> +	/*
> +	 * NO_IREF may be removed from a mark, but not added.
> +	 * When removed, fsnotify_recalc_mask() will take the inode ref.
> +	 */
> +	WARN_ON_ONCE(!want_iref);
> +	fsn_mark->flags &= ~FSNOTIFY_MARK_FLAG_NO_IREF;
> +	*recalc = true;
> +
>  	return 0;
>  }

Since we always return 0 from this function, we may as well just drop the
'recalc' argument and return whether mask recalc is needed?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 00/16] Evictable fanotify marks
  2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
                   ` (15 preceding siblings ...)
  2022-04-13  9:09 ` [PATCH v3 16/16] fanotify: enable "evictable" inode marks Amir Goldstein
@ 2022-04-21 15:41 ` Jan Kara
  16 siblings, 0 replies; 25+ messages in thread
From: Jan Kara @ 2022-04-21 15:41 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Matthew Bobrowski, linux-fsdevel

Hi Amir!

On Wed 13-04-22 12:09:19, Amir Goldstein wrote:
> Following v3 patch set addresses your review comments on v2 [2].
> 
> Please see LTP test [3] and man page draft [4] for evictable marks.

Thanks for the patches! I've found just a few smaller issues so once those
are fixed, I'll queue the patches to my tree for the next merge window.

								Honza

> 
> Thanks,
> Amir.
> 
> Changes since v2 [2]:
> - Simplify group lock helpers (Jan)
> - Move FSNOTIFY_GROUP_NOFS flag to group object (Jan)
> - Split patch of fanotify_mark_user_flags() (Jan)
> - Fix bug in case of EEXIST
> - Drop ioctl for debugging
> - Rebased and tested on v5.18-rc1
> 
> Changes since v1 [1]:
> - Fixes for direct reclaim deadlock
> - Add ioctl for direct reclaim test
> - Rebrand as FAN_MARK_EVICTABLE
> - Remove FAN_MARK_CREATE and allow clearing FAN_MARK_EVICTABLE
> - Replace connector proxy_iref with HAS_IREF flag
> - Take iref in fsnotify_reclac_mark() rather than on add mark to list
> - Remove fsnotify_add_mark() allow_dups/flags argument
> - Remove pr_debug() prints
> 
> [1] https://lore.kernel.org/r/20220307155741.1352405-1-amir73il@gmail.com/
> [2] https://lore.kernel.org/r/20220329074904.2980320-1-amir73il@gmail.com/
> [3] https://github.com/amir73il/ltp/commits/fan_evictable
> [4] https://github.com/amir73il/man-pages/commits/fan_evictable
> 
> Amir Goldstein (16):
>   inotify: show inotify mask flags in proc fdinfo
>   inotify: move control flags from mask to mark flags
>   fsnotify: fix wrong lockdep annotations
>   fsnotify: pass flags argument to fsnotify_add_mark() via mark
>   fsnotify: pass flags argument to fsnotify_alloc_group()
>   fsnotify: create helpers for group mark_mutex lock
>   inotify: use fsnotify group lock helpers
>   audit: use fsnotify group lock helpers
>   nfsd: use fsnotify group lock helpers
>   dnotify: use fsnotify group lock helpers
>   fsnotify: allow adding an inode mark without pinning inode
>   fanotify: create helper fanotify_mark_user_flags()
>   fanotify: factor out helper fanotify_mark_update_flags()
>   fanotify: implement "evictable" inode marks
>   fanotify: use fsnotify group lock helpers
>   fanotify: enable "evictable" inode marks
> 
>  fs/nfsd/filecache.c                  |  14 ++--
>  fs/notify/dnotify/dnotify.c          |  13 +--
>  fs/notify/fanotify/fanotify.h        |  12 +++
>  fs/notify/fanotify/fanotify_user.c   |  95 +++++++++++++++-------
>  fs/notify/fdinfo.c                   |  21 ++---
>  fs/notify/fsnotify.c                 |   4 +-
>  fs/notify/group.c                    |  32 +++++---
>  fs/notify/inotify/inotify.h          |  19 +++++
>  fs/notify/inotify/inotify_fsnotify.c |   2 +-
>  fs/notify/inotify/inotify_user.c     |  49 ++++++-----
>  fs/notify/mark.c                     | 117 ++++++++++++++++++---------
>  include/linux/fanotify.h             |   1 +
>  include/linux/fsnotify_backend.h     |  75 ++++++++++++-----
>  include/uapi/linux/fanotify.h        |   1 +
>  kernel/audit_fsnotify.c              |   6 +-
>  kernel/audit_tree.c                  |  34 ++++----
>  kernel/audit_watch.c                 |   2 +-
>  17 files changed, 330 insertions(+), 167 deletions(-)
> 
> -- 
> 2.35.1
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 04/16] fsnotify: pass flags argument to fsnotify_add_mark() via mark
  2022-04-21 14:18   ` Jan Kara
@ 2022-04-22 10:02     ` Amir Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-22 10:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Thu, Apr 21, 2022 at 5:35 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 13-04-22 12:09:23, Amir Goldstein wrote:
> > Instead of passing the allow_dups argument to fsnotify_add_mark()
> > as an argument, define the mark flag FSNOTIFY_MARK_FLAG_ALLOW_DUPS
> > to express the old allow_dups meaning and pass this information on the
> > mark itself.
> >
> > We use mark->flags to pass inotify control flags and will pass more
> > control flags in the future so let's make this the standard.
> >
> > Although the FSNOTIFY_MARK_FLAG_ALLOW_DUPS flag is not used after the
> > call to fsnotify_add_mark(), it does not hurt to leave this information
> > on the mark for future use.
> >
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> I wanted to comment on this already last time but forgot:
> FSNOTIFY_MARK_FLAG_ALLOW_DUPS is IMO more a property of fsnotify_group
> than a particular mark (or a particular call to fsnotify_add_mark()). As
> such it would make more sense to me to have is as "feature" similarly to
> fs-reclaim restrictions you introduce later in the series.

That's a good idea. I'll do that.

>
> As a bonus, no need for 'flags' argument to
> fsnotify_add_inode_mark_locked() or fsnotify_add_inode_mark().

I prefer to avoid collecting this bonus and leave a flags argument
for future use.

The reason is that I intend to try and convince you to take the patch
for FSNOTIFY_ADD_MARK_UPDATE_MASKS in a future patch set,
so for the chance that I am able to convince you, let's avoid the churn
for now. We can always cleanup the unneeded flags argument later.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3 14/16] fanotify: implement "evictable" inode marks
  2022-04-21 15:40   ` Jan Kara
@ 2022-04-22 10:47     ` Amir Goldstein
  0 siblings, 0 replies; 25+ messages in thread
From: Amir Goldstein @ 2022-04-22 10:47 UTC (permalink / raw)
  To: Jan Kara; +Cc: Matthew Bobrowski, linux-fsdevel

On Thu, Apr 21, 2022 at 6:40 PM Jan Kara <jack@suse.cz> wrote:
>
> On Wed 13-04-22 12:09:33, Amir Goldstein wrote:
> > When an inode mark is created with flag FAN_MARK_EVICTABLE, it will not
> > pin the marked inode to inode cache, so when inode is evicted from cache
> > due to memory pressure, the mark will be lost.
> >
> > When an inode mark with flag FAN_MARK_EVICATBLE is updated without using
> > this flag, the marked inode is pinned to inode cache.
> >
> > When an inode mark is updated with flag FAN_MARK_EVICTABLE but an
> > existing mark already has the inode pinned, the mark update fails with
> > error EEXIST.
> >
> > Evictable inode marks can be used to setup inode marks with ignored mask
> > to suppress events from uninteresting files or directories in a lazy
> > manner, upon receiving the first event, without having to iterate all
> > the uninteresting files or directories before hand.
> >
> > The evictbale inode mark feature allows performing this lazy marks setup
> > without exhausting the system memory with pinned inodes.
> >
> > This change does not enable the feature yet.
> >
> > Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxiRDpuS=2uA6+ZUM7yG9vVU-u212tkunBmSnP_u=mkv=Q@mail.gmail.com/
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
>
> Just one nit below...
>
> > @@ -1097,6 +1099,18 @@ static int fanotify_mark_update_flags(struct fsnotify_mark *fsn_mark,
> >                       *recalc = true;
> >       }
> >
> > +     if (fsn_mark->connector->type != FSNOTIFY_OBJ_TYPE_INODE ||
> > +         want_iref == !(fsn_mark->flags & FSNOTIFY_MARK_FLAG_NO_IREF))
> > +             return 0;
> > +
> > +     /*
> > +      * NO_IREF may be removed from a mark, but not added.
> > +      * When removed, fsnotify_recalc_mask() will take the inode ref.
> > +      */
> > +     WARN_ON_ONCE(!want_iref);
> > +     fsn_mark->flags &= ~FSNOTIFY_MARK_FLAG_NO_IREF;
> > +     *recalc = true;
> > +
> >       return 0;
> >  }
>
> Since we always return 0 from this function, we may as well just drop the
> 'recalc' argument and return whether mask recalc is needed?
>

I agree, but in this case, I rather also return recalc from
fanotify_mark_add_to_mask() and keep fsnotify_recalc_mask()
in fanotify_add_mark() and it is now.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2022-04-22 10:47 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-13  9:09 [PATCH v3 00/16] Evictable fanotify marks Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 01/16] inotify: show inotify mask flags in proc fdinfo Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 02/16] inotify: move control flags from mask to mark flags Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 03/16] fsnotify: fix wrong lockdep annotations Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 04/16] fsnotify: pass flags argument to fsnotify_add_mark() via mark Amir Goldstein
2022-04-21 14:18   ` Jan Kara
2022-04-22 10:02     ` Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 05/16] fsnotify: pass flags argument to fsnotify_alloc_group() Amir Goldstein
2022-04-21 14:34   ` Jan Kara
2022-04-13  9:09 ` [PATCH v3 06/16] fsnotify: create helpers for group mark_mutex lock Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 07/16] inotify: use fsnotify group lock helpers Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 08/16] audit: " Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 09/16] nfsd: " Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 10/16] dnotify: " Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 11/16] fsnotify: allow adding an inode mark without pinning inode Amir Goldstein
2022-04-21 14:54   ` Jan Kara
2022-04-13  9:09 ` [PATCH v3 12/16] fanotify: create helper fanotify_mark_user_flags() Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 13/16] fanotify: factor out helper fanotify_mark_update_flags() Amir Goldstein
2022-04-21 15:00   ` Jan Kara
2022-04-13  9:09 ` [PATCH v3 14/16] fanotify: implement "evictable" inode marks Amir Goldstein
2022-04-21 15:40   ` Jan Kara
2022-04-22 10:47     ` Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 15/16] fanotify: use fsnotify group lock helpers Amir Goldstein
2022-04-13  9:09 ` [PATCH v3 16/16] fanotify: enable "evictable" inode marks Amir Goldstein
2022-04-21 15:41 ` [PATCH v3 00/16] Evictable fanotify marks Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.