All of lore.kernel.org
 help / color / mirror / Atom feed
* Potential regression after fsnotify_nameremove() rework in 5.3
@ 2022-01-15  3:11 Ivan Delalande
  2022-01-15 19:50 ` Amir Goldstein
  2022-01-16 10:16 ` Thorsten Leemhuis
  0 siblings, 2 replies; 12+ messages in thread
From: Ivan Delalande @ 2022-01-15  3:11 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, Jan Kara

Hi,

Sorry to bring this up so late but we might have found a regression
introduced by your "Sort out fsnotify_nameremove() mess" patch series
merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
reproduced on v5.16.

Some of our processes use inotify to watch for IN_DELETE events (for
files on tmpfs mostly), and relied on the fact that once such events are
received, the files they refer to have actually been unlinked and can't
be open/read. So if and once open() succeeds then it is a new version of
the file that has been recreated with new content.

This was true and working reliably before 5.3, but changed after
49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") specifically. There is now a time window where a process
receiving one of those IN_DELETE events may still be able to open the
file and read its old content before it's really unlinked from the FS.

I'm not very familiar with the VFS and fsnotify internals, would you
consider this a regression, or was there never any intentional guarantee
for that behavior and it's best we work around this change in userspace?

Thanks a lot,

-- 
Ivan Delalande
Arista Networks

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-15  3:11 Potential regression after fsnotify_nameremove() rework in 5.3 Ivan Delalande
@ 2022-01-15 19:50 ` Amir Goldstein
  2022-01-16  1:20   ` Ivan Delalande
  2022-01-16 10:16 ` Thorsten Leemhuis
  1 sibling, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2022-01-15 19:50 UTC (permalink / raw)
  To: Ivan Delalande; +Cc: linux-fsdevel, Jan Kara

On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@arista.com> wrote:
>
> Hi,
>
> Sorry to bring this up so late but we might have found a regression
> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> reproduced on v5.16.
>
> Some of our processes use inotify to watch for IN_DELETE events (for
> files on tmpfs mostly), and relied on the fact that once such events are
> received, the files they refer to have actually been unlinked and can't
> be open/read. So if and once open() succeeds then it is a new version of
> the file that has been recreated with new content.
>
> This was true and working reliably before 5.3, but changed after
> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> d_delete()") specifically. There is now a time window where a process
> receiving one of those IN_DELETE events may still be able to open the
> file and read its old content before it's really unlinked from the FS.

This is a bit surprising to me.
Do you have a reproducer?

>
> I'm not very familiar with the VFS and fsnotify internals, would you
> consider this a regression, or was there never any intentional guarantee
> for that behavior and it's best we work around this change in userspace?

It may be a regression if applications depend on behavior that changed,
but if are open for changes in your application please describe in more details
what it tries to accomplish using IN_DELETE and the ekernel your application
is running on and then I may be able to recommend a more reliable method.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-15 19:50 ` Amir Goldstein
@ 2022-01-16  1:20   ` Ivan Delalande
  2022-01-16 10:14     ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Ivan Delalande @ 2022-01-16  1:20 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, Jan Kara

On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
> On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@arista.com> wrote:
>> Sorry to bring this up so late but we might have found a regression
>> introduced by your "Sort out fsnotify_nameremove() mess" patch series
>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
>> reproduced on v5.16.
>>
>> Some of our processes use inotify to watch for IN_DELETE events (for
>> files on tmpfs mostly), and relied on the fact that once such events are
>> received, the files they refer to have actually been unlinked and can't
>> be open/read. So if and once open() succeeds then it is a new version of
>> the file that has been recreated with new content.
>>
>> This was true and working reliably before 5.3, but changed after
>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
>> d_delete()") specifically. There is now a time window where a process
>> receiving one of those IN_DELETE events may still be able to open the
>> file and read its old content before it's really unlinked from the FS.
> 
> This is a bit surprising to me.
> Do you have a reproducer?

Yeah, I was using the following one to bisect this. It will print a
message every time it succeeds to read the file after receiving a
IN_DELETE event when run with something like `mkdir /tmp/foo;
./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
on various systems after 5.3, even for different #define-parameters.

>> I'm not very familiar with the VFS and fsnotify internals, would you
>> consider this a regression, or was there never any intentional guarantee
>> for that behavior and it's best we work around this change in userspace?
> 
> It may be a regression if applications depend on behavior that changed,
> but if are open for changes in your application please describe in more details
> what it tries to accomplish using IN_DELETE and the ekernel your application
> is running on and then I may be able to recommend a more reliable method.

I can discuss it with our team and get more details on this but it may
be pretty complicated and costly to change. My understanding is that
these watched files are used as ID/version references for in-memory
objects shared between multiple processes to synchronize state, and
resynchronize when there are crashes or restarts. So they can be
recreated in place with the same or a different content, and so their
inode number or mtime etc. may not be useable as additonnal check.

I think we can also have a very large number of these objects and files
on some configurations, so waiting to see if we have a following
IN_CREATE, or adding more checks/synchronization logic will probably
have a significant impact at scale.

And we're targeting 5.10 and building it with our own config.

Thanks for your help and insight on this,

-- 
Ivan Delalande
Arista Networks

# ------------------------ >8 ------------------------

#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHURN_FILES 9999
#define CHURN_SLEEP 2
#define BATCH_UNLINK 0
#define EVENTS_BUF 32

void churn(char* base)
{
	static size_t iter = 0;
	char path[128];
	int ret;

	for (int i = 0; i <= CHURN_FILES; ++i) {
		snprintf(path, sizeof(path), "%s/%d", base, i);
		FILE* f = fopen(path, "w");
		if (!f)
			exit(1);
		ret = fprintf(f, "content, iter %lu\n", iter);
		if (ret < 16)
			exit(1);
		fclose(f);
#if BATCH_UNLINK
	}
	for (int i = 0; i <= CHURN_FILES; ++i) {
		snprintf(path, sizeof(path), "%s/%d", base, i);
#endif
		ret = unlink(path);
		if (ret < 0)
			exit(1);
	}
	++iter;
}

int main(int argc, char* argv[])
{
	char events_buf[EVENTS_BUF * sizeof(struct inotify_event)];
	struct pollfd pfd = { .events = POLLIN };
	const struct inotify_event *event;
	char path[128];
	char buf[128];
	int ret;

	if (argc != 2 || access(argv[1], R_OK|W_OK)) {
		printf("%s TESTDIR\n", argv[0]);
		exit(1);
	}

	int pid = fork();
	if (pid == 0) {
		while (1) {
			churn(argv[1]);
			sleep(CHURN_SLEEP);
		}
		return 0;
	}
	else if (pid < 0)
		exit(1);

	pfd.fd = inotify_init();
	if (pfd.fd < 0)
		goto out;
	ret = inotify_add_watch(pfd.fd, argv[1], IN_DELETE);
	if (ret < 0)
		goto out;

	while (1) {
		ret = poll(&pfd, 1, -1);
		if (ret < 0)
			goto out;
		if (!(pfd.revents & POLLIN))
			continue;

		ssize_t len = read(pfd.fd, events_buf, sizeof(events_buf));
		if (len < 0)
			goto out;
		for (char *ptr = events_buf; ptr < events_buf + len;
		     ptr += sizeof(struct inotify_event) + event->len) {
			event = (const struct inotify_event *)ptr;
			if (!(event->mask & IN_DELETE))
				continue;
			snprintf(path, sizeof(path), "%s/%s", argv[1],
				 event->name);
			int f = open(path, O_RDONLY);
			if (f < 0)
				continue;
			ret = read(f, buf, sizeof(buf));
			if (ret >= 0)
				printf("Read %d from %.*s: %.*s", ret,
				       sizeof(path), path, ret, buf);
			close(f);
		}
	}

out:
	kill(pid, SIGTERM);
	wait(NULL);
	return 0;
}

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-16  1:20   ` Ivan Delalande
@ 2022-01-16 10:14     ` Amir Goldstein
  2022-01-17  2:34       ` Ivan Delalande
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2022-01-16 10:14 UTC (permalink / raw)
  To: Ivan Delalande; +Cc: linux-fsdevel, Jan Kara

[-- Attachment #1: Type: text/plain, Size: 4511 bytes --]

On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@arista.com> wrote:
>
> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
> > On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@arista.com> wrote:
> >> Sorry to bring this up so late but we might have found a regression
> >> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> >> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> >> reproduced on v5.16.
> >>
> >> Some of our processes use inotify to watch for IN_DELETE events (for
> >> files on tmpfs mostly), and relied on the fact that once such events are
> >> received, the files they refer to have actually been unlinked and can't
> >> be open/read. So if and once open() succeeds then it is a new version of
> >> the file that has been recreated with new content.
> >>
> >> This was true and working reliably before 5.3, but changed after
> >> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> >> d_delete()") specifically. There is now a time window where a process
> >> receiving one of those IN_DELETE events may still be able to open the
> >> file and read its old content before it's really unlinked from the FS.
> >
> > This is a bit surprising to me.
> > Do you have a reproducer?
>
> Yeah, I was using the following one to bisect this. It will print a
> message every time it succeeds to read the file after receiving a
> IN_DELETE event when run with something like `mkdir /tmp/foo;
> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
> on various systems after 5.3, even for different #define-parameters.
>

I see yes, it's a race between fsnotify_unlink() and d_delete()
fsnotify_unlink() in explicitly required to be called before d_delete(), so
it has the d_inode information and that leaves a windows for opening
the file from cached dentry before d_delete().

I would rather that we try to address this not as a regression until
there is proof of more users that expect the behavior you mentioned.
I would like to provide you an API to opt-in for this behavior, because
fixing it for everyone may cause other workloads to break.

Please test the attached patch on top of v5.16 and use
IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing.

I am assuming that it would be possible for you to modify the application
and add the IN_EXCL_UNLINK flag and that your application does not
care about getting IN_OPEN events on unlinked files?

My patch overloads the existing flag IN_EXCL_UNLINK with a new
meaning. It's a bit of a hack and we can use some other flag if we need to
but it actually makes some sense that an application that does not care for
events on d_unlinked() files will be guaranteed to not get those events
after getting an IN_DELETE event. It is another form of the race that you
described.

Will that solution work out for you?

> >> I'm not very familiar with the VFS and fsnotify internals, would you
> >> consider this a regression, or was there never any intentional guarantee
> >> for that behavior and it's best we work around this change in userspace?
> >
> > It may be a regression if applications depend on behavior that changed,
> > but if are open for changes in your application please describe in more details
> > what it tries to accomplish using IN_DELETE and the ekernel your application
> > is running on and then I may be able to recommend a more reliable method.
>
> I can discuss it with our team and get more details on this but it may
> be pretty complicated and costly to change. My understanding is that
> these watched files are used as ID/version references for in-memory
> objects shared between multiple processes to synchronize state, and
> resynchronize when there are crashes or restarts. So they can be
> recreated in place with the same or a different content, and so their
> inode number or mtime etc. may not be useable as additonnal check.
>
> I think we can also have a very large number of these objects and files
> on some configurations, so waiting to see if we have a following
> IN_CREATE, or adding more checks/synchronization logic will probably
> have a significant impact at scale.
>
> And we're targeting 5.10 and building it with our own config.
>

I am not convinced that this is needed as a regression fix for stable
kernels, but since you are building your own kernel I can help you do the
backport - attached *untested* patch for v5.10 that also backport of
"fsnotify: pass dentry instead of inode data".

Thanks,
Amir.

[-- Attachment #2: inotify-optionally-invalidate-dcache-before-IN_DELETE-5.10.patch --]
[-- Type: text/x-patch, Size: 3376 bytes --]

From 73c137d6a4cab445dec76c731b2c763d45af068e Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@gmail.com>
Date: Thu, 4 Feb 2021 09:11:11 +0200
Subject: [PATCH] inotify: optionally invalidate dcache before IN_DELETE event

Define a new data type to pass for event - FSNOTIFY_EVENT_DENTRY.
Use it to pass the dentry instead of d_inode, to allow d_drop()
before IN_DELETE event.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c             |  2 ++
 include/linux/fsnotify.h         |  4 ++--
 include/linux/fsnotify_backend.h | 16 ++++++++++++++++
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 8d3ad5ef2925..55fb96c44449 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -260,6 +260,8 @@ static int fsnotify_handle_event(struct fsnotify_group *group, __u32 mask,
 		child_mark = NULL;
 		dir = NULL;
 		name = NULL;
+	} else if (inode_mark->mask & FS_EXCL_UNLINK && mask & FS_DELETE) {
+		d_drop(fsnotify_data_dentry(data, data_type));
 	}
 
 	ret = ops->handle_inode_event(inode_mark, mask, inode, dir, name);
diff --git a/include/linux/fsnotify.h b/include/linux/fsnotify.h
index f8acddcf54fb..aa35f261cc3d 100644
--- a/include/linux/fsnotify.h
+++ b/include/linux/fsnotify.h
@@ -36,7 +36,7 @@ static inline void fsnotify_name(struct inode *dir, __u32 mask,
 static inline void fsnotify_dirent(struct inode *dir, struct dentry *dentry,
 				   __u32 mask)
 {
-	fsnotify_name(dir, mask, d_inode(dentry), &dentry->d_name, 0);
+	fsnotify(mask, dentry, FSNOTIFY_EVENT_DENTRY, dir, &dentry->d_name, NULL, 0);
 }
 
 static inline void fsnotify_inode(struct inode *inode, __u32 mask)
@@ -77,7 +77,7 @@ static inline int fsnotify_parent(struct dentry *dentry, __u32 mask,
  */
 static inline void fsnotify_dentry(struct dentry *dentry, __u32 mask)
 {
-	fsnotify_parent(dentry, mask, d_inode(dentry), FSNOTIFY_EVENT_INODE);
+	fsnotify_parent(dentry, mask, dentry, FSNOTIFY_EVENT_DENTRY);
 }
 
 static inline int fsnotify_file(struct file *file, __u32 mask)
diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h
index f8529a3a2923..c67d0d74254e 100644
--- a/include/linux/fsnotify_backend.h
+++ b/include/linux/fsnotify_backend.h
@@ -250,6 +250,7 @@ enum fsnotify_data_type {
 	FSNOTIFY_EVENT_NONE,
 	FSNOTIFY_EVENT_PATH,
 	FSNOTIFY_EVENT_INODE,
+	FSNOTIFY_EVENT_DENTRY,
 };
 
 static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
@@ -257,6 +258,8 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
 	switch (data_type) {
 	case FSNOTIFY_EVENT_INODE:
 		return (struct inode *)data;
+	case FSNOTIFY_EVENT_DENTRY:
+		return d_inode(data);
 	case FSNOTIFY_EVENT_PATH:
 		return d_inode(((const struct path *)data)->dentry);
 	default:
@@ -264,6 +267,19 @@ static inline struct inode *fsnotify_data_inode(const void *data, int data_type)
 	}
 }
 
+static inline struct dentry *fsnotify_data_dentry(const void *data, int data_type)
+{
+	switch (data_type) {
+	case FSNOTIFY_EVENT_DENTRY:
+		/* Non const is needed for dget() */
+		return (struct dentry *)data;
+	case FSNOTIFY_EVENT_PATH:
+		return ((const struct path *)data)->dentry;
+	default:
+		return NULL;
+	}
+}
+
 static inline const struct path *fsnotify_data_path(const void *data,
 						    int data_type)
 {
-- 
2.16.5


[-- Attachment #3: inotify-optionally-invalidate-dcache-before-IN_DELETE-5.16.patch --]
[-- Type: text/x-patch, Size: 2498 bytes --]

From d4d91d5b46c13def48b36bb7734b5fca8e1c2127 Mon Sep 17 00:00:00 2001
From: Amir Goldstein <amir73il@gmail.com>
Date: Sun, 16 Jan 2022 10:11:45 +0200
Subject: [PATCH] inotify: optionally invalidate dcache before IN_DELETE event

Apparently, there are some applications that use IN_DELETE event as an
invalidation mechanism and expect that if they try to open a file with
the name reported with the delete event, that it should not contain the
content of the deleted file.

Commit 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
d_delete()") moved the fsnotify delete hook before d_delete()
intentionally, so fsnotify will have access to a positive dentry.

This allowed a race where opening the deleted file via cached dentry
is now possible after receiving the IN_DELETE event.

In order to provide the "invalidate" functionality without changing
behavior for other workload, overload the original meaning of the
IN_EXCL_UNLINK flag with the additional meaning that the file should
not be accessed via that name after receiving an IN_DELETE event and
drop dentry from cache before IN_DELETE is reported.

This will allow applications to opt-in for the "invalidate" behavior.

Link: https://lore.kernel.org/linux-fsdevel/YeNyzoDM5hP5LtGW@visor/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---
 fs/notify/fsnotify.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 4034ca566f95..86337220f25c 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -255,9 +255,21 @@ static int fsnotify_handle_inode_event(struct fsnotify_group *group,
 	if (WARN_ON_ONCE(!inode && !dir))
 		return 0;
 
-	if ((inode_mark->mask & FS_EXCL_UNLINK) &&
-	    path && d_unlinked(path->dentry))
-		return 0;
+	if (inode_mark->mask & FS_EXCL_UNLINK) {
+		if (path && d_unlinked(path->dentry))
+			return 0;
+
+		/*
+		 * File may be unlinked in filesystem, but still accesible via
+		 * dcache, so user may be able to observe and open the file
+		 * after receiving an IN_DELETE event.
+		 * Overload the original meaning of the IN_EXCL_UNLINK flag
+		 * with the additional meaning that the file cannot be accessed
+		 * via that name after receiving an IN_DELETE event.
+		 */
+		if (mask & FS_DELETE)
+			d_drop(fsnotify_data_dentry(data, data_type));
+	}
 
 	/* Check interest of this mark in case event was sent with two marks */
 	if (!(mask & inode_mark->mask & ALL_FSNOTIFY_EVENTS))
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-15  3:11 Potential regression after fsnotify_nameremove() rework in 5.3 Ivan Delalande
  2022-01-15 19:50 ` Amir Goldstein
@ 2022-01-16 10:16 ` Thorsten Leemhuis
  2022-01-28 14:09   ` Potential regression after fsnotify_nameremove() rework in 5.3 #forregzbot Thorsten Leemhuis
  1 sibling, 1 reply; 12+ messages in thread
From: Thorsten Leemhuis @ 2022-01-16 10:16 UTC (permalink / raw)
  To: Ivan Delalande, Amir Goldstein; +Cc: linux-fsdevel, Jan Kara, regressions

[TLDR: I'm adding this regression to regzbot, the Linux kernel
regression tracking bot; most text you find below is compiled from a few
templates paragraphs some of you might have seen already.]

Hi, this is your Linux kernel regression tracker speaking.

Adding the regression mailing list to the list of recipients, as it
should be in the loop for all regressions, as explained here:
https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html

On 15.01.22 04:11, Ivan Delalande wrote:
> Hi,
> 
> Sorry to bring this up so late but we might have found a regression
> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> reproduced on v5.16.
> 
> Some of our processes use inotify to watch for IN_DELETE events (for
> files on tmpfs mostly), and relied on the fact that once such events are
> received, the files they refer to have actually been unlinked and can't
> be open/read. So if and once open() succeeds then it is a new version of
> the file that has been recreated with new content.
> 
> This was true and working reliably before 5.3, but changed after
> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> d_delete()") specifically. There is now a time window where a process
> receiving one of those IN_DELETE events may still be able to open the
> file and read its old content before it's really unlinked from the FS.
> 
> I'm not very familiar with the VFS and fsnotify internals, would you
> consider this a regression, or was there never any intentional guarantee
> for that behavior and it's best we work around this change in userspace?

Thanks for the report.

To be sure this issue doesn't fall through the cracks unnoticed, I'm
adding it to regzbot, my Linux kernel regression tracking bot:

#regzbot ^introduced 116b9731ad76..7377f5bec133
#regzbot title fsnotify: regression due to the fsnotify_nameremove()
rework in 5.3
#regzbot ignore-activity

Reminder: when fixing the issue, please add a 'Link:' tag with the URL
to the report (the parent of this mail) using the kernel.org redirector,
as explained in 'Documentation/process/submitting-patches.rst'. Regzbot
then will automatically mark the regression as resolved once the fix
lands in the appropriate tree. For more details about regzbot see footer.

Sending this to everyone that got the initial report, to make all aware
of the tracking. I also hope that messages like this motivate people to
directly get at least the regression mailing list and ideally even
regzbot involved when dealing with regressions, as messages like this
wouldn't be needed then.

Don't worry, I'll send further messages wrt to this regression just to
the lists (with a tag in the subject so people can filter them away), as
long as they are intended just for regzbot. With a bit of luck no such
messages will be needed anyway.

Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)

P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
on my table. I can only look briefly into most of them. Unfortunately
therefore I sometimes will get things wrong or miss something important.
I hope that's not the case here; if you think it is, don't hesitate to
tell me about it in a public reply, that's in everyone's interest.

BTW, I have no personal interest in this issue, which is tracked using
regzbot, my Linux kernel regression tracking bot
(https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
this mail to get things rolling again and hence don't need to be CC on
all further activities wrt to this regression.

---
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting start guide, and/or the references documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if your want to.

Hint for reporters: when reporting a regression it's in your interest to
tell #regzbot about it in the report, as that will ensure the regression
gets on the radar of regzbot and the regression tracker. That's in your
interest, as they will make sure the report won't fall through the
cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would, just remember to
include a 'Link:' tag to the report in the commit message, as explained
in Documentation/process/submitting-patches.rst
That aspect was recently was made more explicit in commit 1f57bd42b77c:
https://git.kernel.org/linus/1f57bd42b77c


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-16 10:14     ` Amir Goldstein
@ 2022-01-17  2:34       ` Ivan Delalande
  2022-01-17 13:14         ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Ivan Delalande @ 2022-01-17  2:34 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: linux-fsdevel, Jan Kara

On Sun, Jan 16, 2022 at 12:14:01PM +0200, Amir Goldstein wrote:
> On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@arista.com> wrote:
>> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
>>> On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@arista.com> wrote:
>>>> Sorry to bring this up so late but we might have found a regression
>>>> introduced by your "Sort out fsnotify_nameremove() mess" patch series
>>>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
>>>> reproduced on v5.16.
>>>>
>>>> Some of our processes use inotify to watch for IN_DELETE events (for
>>>> files on tmpfs mostly), and relied on the fact that once such events are
>>>> received, the files they refer to have actually been unlinked and can't
>>>> be open/read. So if and once open() succeeds then it is a new version of
>>>> the file that has been recreated with new content.
>>>>
>>>> This was true and working reliably before 5.3, but changed after
>>>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
>>>> d_delete()") specifically. There is now a time window where a process
>>>> receiving one of those IN_DELETE events may still be able to open the
>>>> file and read its old content before it's really unlinked from the FS.
>>>
>>> This is a bit surprising to me.
>>> Do you have a reproducer?
>>
>> Yeah, I was using the following one to bisect this. It will print a
>> message every time it succeeds to read the file after receiving a
>> IN_DELETE event when run with something like `mkdir /tmp/foo;
>> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
>> on various systems after 5.3, even for different #define-parameters.
>>
> 
> I see yes, it's a race between fsnotify_unlink() and d_delete()
> fsnotify_unlink() in explicitly required to be called before d_delete(), so
> it has the d_inode information and that leaves a windows for opening
> the file from cached dentry before d_delete().
> 
> I would rather that we try to address this not as a regression until
> there is proof of more users that expect the behavior you mentioned.
> I would like to provide you an API to opt-in for this behavior, because
> fixing it for everyone may cause other workloads to break.
> 
> Please test the attached patch on top of v5.16 and use
> IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing.
> 
> I am assuming that it would be possible for you to modify the application
> and add the IN_EXCL_UNLINK flag and that your application does not
> care about getting IN_OPEN events on unlinked files?
> 
> My patch overloads the existing flag IN_EXCL_UNLINK with a new
> meaning. It's a bit of a hack and we can use some other flag if we need to
> but it actually makes some sense that an application that does not care for
> events on d_unlinked() files will be guaranteed to not get those events
> after getting an IN_DELETE event. It is another form of the race that you
> described.
> 
> Will that solution work out for you?

Yeah, sounds perfect for us, and adding IN_EXCL_UNLINK to our
applications is fine indeed. I've tested the 5.16 patch on my laptop
with the reproducer and can't reproduce the issue. I've also tried the
5.10 patch on our products and also stop seeing the issue both with
the reproducer but also with our internal applications and test cases
that made us look into this initially. So this looks like a good fix on
our side at least.

>>>> I'm not very familiar with the VFS and fsnotify internals, would you
>>>> consider this a regression, or was there never any intentional guarantee
>>>> for that behavior and it's best we work around this change in userspace?
>>>
>>> It may be a regression if applications depend on behavior that changed,
>>> but if are open for changes in your application please describe in more details
>>> what it tries to accomplish using IN_DELETE and the ekernel your application
>>> is running on and then I may be able to recommend a more reliable method.
>>
>> I can discuss it with our team and get more details on this but it may
>> be pretty complicated and costly to change. My understanding is that
>> these watched files are used as ID/version references for in-memory
>> objects shared between multiple processes to synchronize state, and
>> resynchronize when there are crashes or restarts. So they can be
>> recreated in place with the same or a different content, and so their
>> inode number or mtime etc. may not be useable as additonnal check.
>>
>> I think we can also have a very large number of these objects and files
>> on some configurations, so waiting to see if we have a following
>> IN_CREATE, or adding more checks/synchronization logic will probably
>> have a significant impact at scale.
>>
>> And we're targeting 5.10 and building it with our own config.
>>
> 
> I am not convinced that this is needed as a regression fix for stable
> kernels, but since you are building your own kernel I can help you do the
> backport - attached *untested* patch for v5.10 that also backport of
> "fsnotify: pass dentry instead of inode data".

Thanks a lot for making the backport too, actually later 5.10.y have
your 950cc0d2bef0 and fecc4559780d changes, so I think we should just
use the same fsnotify.c patch hunk as your 5.16 patch (but yeah still
need the fsnotify_data_dentry bit). That's what I tested on our product,
and as mentionned above seems to be working great.

And we do carry a few custom patches and mainline backports so it's
perfectly fine if you prefer not to submit it to stable.

Thanks again,

-- 
Ivan Delalande
Arista Networks

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-17  2:34       ` Ivan Delalande
@ 2022-01-17 13:14         ` Amir Goldstein
  2022-01-17 14:21           ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2022-01-17 13:14 UTC (permalink / raw)
  To: Ivan Delalande; +Cc: linux-fsdevel, Jan Kara

On Mon, Jan 17, 2022 at 4:34 AM Ivan Delalande <colona@arista.com> wrote:
>
> On Sun, Jan 16, 2022 at 12:14:01PM +0200, Amir Goldstein wrote:
> > On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@arista.com> wrote:
> >> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
> >>> On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@arista.com> wrote:
> >>>> Sorry to bring this up so late but we might have found a regression
> >>>> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> >>>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> >>>> reproduced on v5.16.
> >>>>
> >>>> Some of our processes use inotify to watch for IN_DELETE events (for
> >>>> files on tmpfs mostly), and relied on the fact that once such events are
> >>>> received, the files they refer to have actually been unlinked and can't
> >>>> be open/read. So if and once open() succeeds then it is a new version of
> >>>> the file that has been recreated with new content.
> >>>>
> >>>> This was true and working reliably before 5.3, but changed after
> >>>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> >>>> d_delete()") specifically. There is now a time window where a process
> >>>> receiving one of those IN_DELETE events may still be able to open the
> >>>> file and read its old content before it's really unlinked from the FS.
> >>>
> >>> This is a bit surprising to me.
> >>> Do you have a reproducer?
> >>
> >> Yeah, I was using the following one to bisect this. It will print a
> >> message every time it succeeds to read the file after receiving a
> >> IN_DELETE event when run with something like `mkdir /tmp/foo;
> >> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
> >> on various systems after 5.3, even for different #define-parameters.
> >>
> >
> > I see yes, it's a race between fsnotify_unlink() and d_delete()
> > fsnotify_unlink() in explicitly required to be called before d_delete(), so
> > it has the d_inode information and that leaves a windows for opening
> > the file from cached dentry before d_delete().
> >
> > I would rather that we try to address this not as a regression until
> > there is proof of more users that expect the behavior you mentioned.
> > I would like to provide you an API to opt-in for this behavior, because
> > fixing it for everyone may cause other workloads to break.
> >
> > Please test the attached patch on top of v5.16 and use
> > IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing.
> >
> > I am assuming that it would be possible for you to modify the application
> > and add the IN_EXCL_UNLINK flag and that your application does not
> > care about getting IN_OPEN events on unlinked files?
> >
> > My patch overloads the existing flag IN_EXCL_UNLINK with a new
> > meaning. It's a bit of a hack and we can use some other flag if we need to
> > but it actually makes some sense that an application that does not care for
> > events on d_unlinked() files will be guaranteed to not get those events
> > after getting an IN_DELETE event. It is another form of the race that you
> > described.
> >
> > Will that solution work out for you?
>
> Yeah, sounds perfect for us, and adding IN_EXCL_UNLINK to our
> applications is fine indeed. I've tested the 5.16 patch on my laptop
> with the reproducer and can't reproduce the issue. I've also tried the
> 5.10 patch on our products and also stop seeing the issue both with
> the reproducer but also with our internal applications and test cases
> that made us look into this initially. So this looks like a good fix on
> our side at least.
>

I am glad the patch addresses your issue.
However, I am not sure if I should even post it upstream,
unless more people ask for it.

My point of view is that IN_DELETE does not have enough
information for an "invalidate file" message.
FAN_DELETE, otoh, with recently merged FAN_REPORT_TARGET_FID
includes an information record with the unique and non-reusable file id of the
unlinked inode.

That should allow your application to correctly invalidate the state files
that it accesses on kernel >= v5.17.

Jan, do you have a different opinion?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-17 13:14         ` Amir Goldstein
@ 2022-01-17 14:21           ` Jan Kara
  2022-01-17 19:09             ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Jan Kara @ 2022-01-17 14:21 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Ivan Delalande, linux-fsdevel, Jan Kara

On Mon 17-01-22 15:14:53, Amir Goldstein wrote:
> On Mon, Jan 17, 2022 at 4:34 AM Ivan Delalande <colona@arista.com> wrote:
> >
> > On Sun, Jan 16, 2022 at 12:14:01PM +0200, Amir Goldstein wrote:
> > > On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@arista.com> wrote:
> > >> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
> > >>> On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@arista.com> wrote:
> > >>>> Sorry to bring this up so late but we might have found a regression
> > >>>> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> > >>>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> > >>>> reproduced on v5.16.
> > >>>>
> > >>>> Some of our processes use inotify to watch for IN_DELETE events (for
> > >>>> files on tmpfs mostly), and relied on the fact that once such events are
> > >>>> received, the files they refer to have actually been unlinked and can't
> > >>>> be open/read. So if and once open() succeeds then it is a new version of
> > >>>> the file that has been recreated with new content.
> > >>>>
> > >>>> This was true and working reliably before 5.3, but changed after
> > >>>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> > >>>> d_delete()") specifically. There is now a time window where a process
> > >>>> receiving one of those IN_DELETE events may still be able to open the
> > >>>> file and read its old content before it's really unlinked from the FS.
> > >>>
> > >>> This is a bit surprising to me.
> > >>> Do you have a reproducer?
> > >>
> > >> Yeah, I was using the following one to bisect this. It will print a
> > >> message every time it succeeds to read the file after receiving a
> > >> IN_DELETE event when run with something like `mkdir /tmp/foo;
> > >> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
> > >> on various systems after 5.3, even for different #define-parameters.
> > >>
> > >
> > > I see yes, it's a race between fsnotify_unlink() and d_delete()
> > > fsnotify_unlink() in explicitly required to be called before d_delete(), so
> > > it has the d_inode information and that leaves a windows for opening
> > > the file from cached dentry before d_delete().
> > >
> > > I would rather that we try to address this not as a regression until
> > > there is proof of more users that expect the behavior you mentioned.
> > > I would like to provide you an API to opt-in for this behavior, because
> > > fixing it for everyone may cause other workloads to break.
> > >
> > > Please test the attached patch on top of v5.16 and use
> > > IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing.
> > >
> > > I am assuming that it would be possible for you to modify the application
> > > and add the IN_EXCL_UNLINK flag and that your application does not
> > > care about getting IN_OPEN events on unlinked files?
> > >
> > > My patch overloads the existing flag IN_EXCL_UNLINK with a new
> > > meaning. It's a bit of a hack and we can use some other flag if we need to
> > > but it actually makes some sense that an application that does not care for
> > > events on d_unlinked() files will be guaranteed to not get those events
> > > after getting an IN_DELETE event. It is another form of the race that you
> > > described.
> > >
> > > Will that solution work out for you?
> >
> > Yeah, sounds perfect for us, and adding IN_EXCL_UNLINK to our
> > applications is fine indeed. I've tested the 5.16 patch on my laptop
> > with the reproducer and can't reproduce the issue. I've also tried the
> > 5.10 patch on our products and also stop seeing the issue both with
> > the reproducer but also with our internal applications and test cases
> > that made us look into this initially. So this looks like a good fix on
> > our side at least.
> >
> 
> I am glad the patch addresses your issue.
> However, I am not sure if I should even post it upstream,
> unless more people ask for it.
> 
> My point of view is that IN_DELETE does not have enough
> information for an "invalidate file" message.
> FAN_DELETE, otoh, with recently merged FAN_REPORT_TARGET_FID
> includes an information record with the unique and non-reusable file id of the
> unlinked inode.
> 
> That should allow your application to correctly invalidate the state files
> that it accesses on kernel >= v5.17.
> 
> Jan, do you have a different opinion?

Yeah, I was thinking about this. I don't quite like your hack with inotify
flag. Firstly, it requires cooperation from userspace (setting the flag),
secondly, d_drop() in fsnotify code is unexpected and ugly on the kernel
side, and overall adding yet another special case to fsnotify code is not
very compelling either.

I agree transitioning to fanotify may be a nice solution for the
application but I'm not sure how viable that is short term (requiring very
new kernel, maybe non-trivial cost of porting the application to fanotify).
Since this fully lies within the "we do not regress userspace" boundaries -
I'm not surprised the application does not expect to see a file for which
it got IN_DELETE event - I guess we should solve this transparently within
the kernel if we can. So far we've got only one report but I'd say there
are other applications like this out there, just they didn't transition to
new enough kernel yet or were lucky enough to not hit the problem yet.

One possibility I can see is: Add fsnotify primitive to create the event,
just not queue it in the notification queue yet (essentially we would
cut-short the event handling before calling fsnotify_insert_event() /
fsnotify_add_event()), only return it. Then another primitive would be for
queueing already prepared event. Then the sequence for unlink would be:

	LIST_HEAD(event_list);

	fsnotify_events_prepare(&event_list, ...);
	d_delete(dentry);
	fsnotify_events_report(&event_list);

And we can optionally wrap this inside d_delete_notify() to make it easier
on the callers. What do you think?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-17 14:21           ` Jan Kara
@ 2022-01-17 19:09             ` Amir Goldstein
  2022-01-17 22:02               ` Amir Goldstein
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2022-01-17 19:09 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ivan Delalande, linux-fsdevel

On Mon, Jan 17, 2022 at 4:21 PM Jan Kara <jack@suse.cz> wrote:
>
> On Mon 17-01-22 15:14:53, Amir Goldstein wrote:
> > On Mon, Jan 17, 2022 at 4:34 AM Ivan Delalande <colona@arista.com> wrote:
> > >
> > > On Sun, Jan 16, 2022 at 12:14:01PM +0200, Amir Goldstein wrote:
> > > > On Sun, Jan 16, 2022 at 3:20 AM Ivan Delalande <colona@arista.com> wrote:
> > > >> On Sat, Jan 15, 2022 at 09:50:20PM +0200, Amir Goldstein wrote:
> > > >>> On Sat, Jan 15, 2022 at 5:11 AM Ivan Delalande <colona@arista.com> wrote:
> > > >>>> Sorry to bring this up so late but we might have found a regression
> > > >>>> introduced by your "Sort out fsnotify_nameremove() mess" patch series
> > > >>>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
> > > >>>> reproduced on v5.16.
> > > >>>>
> > > >>>> Some of our processes use inotify to watch for IN_DELETE events (for
> > > >>>> files on tmpfs mostly), and relied on the fact that once such events are
> > > >>>> received, the files they refer to have actually been unlinked and can't
> > > >>>> be open/read. So if and once open() succeeds then it is a new version of
> > > >>>> the file that has been recreated with new content.
> > > >>>>
> > > >>>> This was true and working reliably before 5.3, but changed after
> > > >>>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
> > > >>>> d_delete()") specifically. There is now a time window where a process
> > > >>>> receiving one of those IN_DELETE events may still be able to open the
> > > >>>> file and read its old content before it's really unlinked from the FS.
> > > >>>
> > > >>> This is a bit surprising to me.
> > > >>> Do you have a reproducer?
> > > >>
> > > >> Yeah, I was using the following one to bisect this. It will print a
> > > >> message every time it succeeds to read the file after receiving a
> > > >> IN_DELETE event when run with something like `mkdir /tmp/foo;
> > > >> ./indelchurn /tmp/foo`. It seems to hit pretty frequently and reliably
> > > >> on various systems after 5.3, even for different #define-parameters.
> > > >>
> > > >
> > > > I see yes, it's a race between fsnotify_unlink() and d_delete()
> > > > fsnotify_unlink() in explicitly required to be called before d_delete(), so
> > > > it has the d_inode information and that leaves a windows for opening
> > > > the file from cached dentry before d_delete().
> > > >
> > > > I would rather that we try to address this not as a regression until
> > > > there is proof of more users that expect the behavior you mentioned.
> > > > I would like to provide you an API to opt-in for this behavior, because
> > > > fixing it for everyone may cause other workloads to break.
> > > >
> > > > Please test the attached patch on top of v5.16 and use
> > > > IN_DELETE|IN_EXCL_UNLINK as the watch mask for testing.
> > > >
> > > > I am assuming that it would be possible for you to modify the application
> > > > and add the IN_EXCL_UNLINK flag and that your application does not
> > > > care about getting IN_OPEN events on unlinked files?
> > > >
> > > > My patch overloads the existing flag IN_EXCL_UNLINK with a new
> > > > meaning. It's a bit of a hack and we can use some other flag if we need to
> > > > but it actually makes some sense that an application that does not care for
> > > > events on d_unlinked() files will be guaranteed to not get those events
> > > > after getting an IN_DELETE event. It is another form of the race that you
> > > > described.
> > > >
> > > > Will that solution work out for you?
> > >
> > > Yeah, sounds perfect for us, and adding IN_EXCL_UNLINK to our
> > > applications is fine indeed. I've tested the 5.16 patch on my laptop
> > > with the reproducer and can't reproduce the issue. I've also tried the
> > > 5.10 patch on our products and also stop seeing the issue both with
> > > the reproducer but also with our internal applications and test cases
> > > that made us look into this initially. So this looks like a good fix on
> > > our side at least.
> > >
> >
> > I am glad the patch addresses your issue.
> > However, I am not sure if I should even post it upstream,
> > unless more people ask for it.
> >
> > My point of view is that IN_DELETE does not have enough
> > information for an "invalidate file" message.
> > FAN_DELETE, otoh, with recently merged FAN_REPORT_TARGET_FID
> > includes an information record with the unique and non-reusable file id of the
> > unlinked inode.
> >
> > That should allow your application to correctly invalidate the state files
> > that it accesses on kernel >= v5.17.
> >
> > Jan, do you have a different opinion?
>
> Yeah, I was thinking about this. I don't quite like your hack with inotify
> flag. Firstly, it requires cooperation from userspace (setting the flag),
> secondly, d_drop() in fsnotify code is unexpected and ugly on the kernel
> side, and overall adding yet another special case to fsnotify code is not
> very compelling either.
>
> I agree transitioning to fanotify may be a nice solution for the
> application but I'm not sure how viable that is short term (requiring very
> new kernel, maybe non-trivial cost of porting the application to fanotify).
> Since this fully lies within the "we do not regress userspace" boundaries -
> I'm not surprised the application does not expect to see a file for which
> it got IN_DELETE event - I guess we should solve this transparently within
> the kernel if we can. So far we've got only one report but I'd say there
> are other applications like this out there, just they didn't transition to
> new enough kernel yet or were lucky enough to not hit the problem yet.
>
> One possibility I can see is: Add fsnotify primitive to create the event,
> just not queue it in the notification queue yet (essentially we would
> cut-short the event handling before calling fsnotify_insert_event() /
> fsnotify_add_event()), only return it. Then another primitive would be for
> queueing already prepared event. Then the sequence for unlink would be:
>
>         LIST_HEAD(event_list);
>
>         fsnotify_events_prepare(&event_list, ...);
>         d_delete(dentry);
>         fsnotify_events_report(&event_list);
>
> And we can optionally wrap this inside d_delete_notify() to make it easier
> on the callers. What do you think?
>

I think it sounds like the "correct" design, but also sounds like a
big change that
is not so practical for backporting.

Given that this is a regression that goes way back, backportability
plays a role.
Also, a big change like this needs developer time, which I myself don't have
at the moment.

For a simpler backportable solution, instead of preparing the event
perhaps it is enough that we ihold() the inode until after fsnotify_unlink()
and pass it as an argument very similar to fsnotify_link().

The question is how to ihold() the inode only if we are going to queue
an IN_DELETE event? Maybe send an internal FS_PRE_DELETE
event?

I am currently out of better ideas.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-17 19:09             ` Amir Goldstein
@ 2022-01-17 22:02               ` Amir Goldstein
  2022-01-18 10:06                 ` Jan Kara
  0 siblings, 1 reply; 12+ messages in thread
From: Amir Goldstein @ 2022-01-17 22:02 UTC (permalink / raw)
  To: Jan Kara; +Cc: Ivan Delalande, linux-fsdevel

> > One possibility I can see is: Add fsnotify primitive to create the event,
> > just not queue it in the notification queue yet (essentially we would
> > cut-short the event handling before calling fsnotify_insert_event() /
> > fsnotify_add_event()), only return it. Then another primitive would be for
> > queueing already prepared event. Then the sequence for unlink would be:
> >
> >         LIST_HEAD(event_list);
> >
> >         fsnotify_events_prepare(&event_list, ...);
> >         d_delete(dentry);
> >         fsnotify_events_report(&event_list);
> >
> > And we can optionally wrap this inside d_delete_notify() to make it easier
> > on the callers. What do you think?
> >
>
> I think it sounds like the "correct" design, but also sounds like a
> big change that
> is not so practical for backporting.
>
> Given that this is a regression that goes way back, backportability
> plays a role.
> Also, a big change like this needs developer time, which I myself don't have
> at the moment.
>
> For a simpler backportable solution, instead of preparing the event
> perhaps it is enough that we ihold() the inode until after fsnotify_unlink()
> and pass it as an argument very similar to fsnotify_link().
>
> The question is how to ihold() the inode only if we are going to queue
> an IN_DELETE event? Maybe send an internal FS_PRE_DELETE
> event?
>

Actually, seeing that do_unlinkat() already iholds the inode outside
vfs_unlink()
anyway, is it so bad that vfs_unlink() will elevate refcount as well, so we can
call fsnotify_unlink() with the inode arg after d_delete()?

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3
  2022-01-17 22:02               ` Amir Goldstein
@ 2022-01-18 10:06                 ` Jan Kara
  0 siblings, 0 replies; 12+ messages in thread
From: Jan Kara @ 2022-01-18 10:06 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Jan Kara, Ivan Delalande, linux-fsdevel

On Tue 18-01-22 00:02:38, Amir Goldstein wrote:
> > > One possibility I can see is: Add fsnotify primitive to create the event,
> > > just not queue it in the notification queue yet (essentially we would
> > > cut-short the event handling before calling fsnotify_insert_event() /
> > > fsnotify_add_event()), only return it. Then another primitive would be for
> > > queueing already prepared event. Then the sequence for unlink would be:
> > >
> > >         LIST_HEAD(event_list);
> > >
> > >         fsnotify_events_prepare(&event_list, ...);
> > >         d_delete(dentry);
> > >         fsnotify_events_report(&event_list);
> > >
> > > And we can optionally wrap this inside d_delete_notify() to make it easier
> > > on the callers. What do you think?
> > >
> >
> > I think it sounds like the "correct" design, but also sounds like a
> > big change that
> > is not so practical for backporting.
> >
> > Given that this is a regression that goes way back, backportability
> > plays a role.
> > Also, a big change like this needs developer time, which I myself don't have
> > at the moment.

Agreed. I'm for simplicity as well.

> > For a simpler backportable solution, instead of preparing the event
> > perhaps it is enough that we ihold() the inode until after fsnotify_unlink()
> > and pass it as an argument very similar to fsnotify_link().
> >
> > The question is how to ihold() the inode only if we are going to queue
> > an IN_DELETE event? Maybe send an internal FS_PRE_DELETE
> > event?
> >
> 
> Actually, seeing that do_unlinkat() already iholds the inode outside
> vfs_unlink()
> anyway, is it so bad that vfs_unlink() will elevate refcount as well, so we can
> call fsnotify_unlink() with the inode arg after d_delete()?

No, that's what I wanted to suggest :) ihold() should be pretty cheap as,
as you have noticed, the cacheline should be dirty and hot anyway. I think
this is by far cheaper than trying to find out whether the event will be
sent or not.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Potential regression after fsnotify_nameremove() rework in 5.3 #forregzbot
  2022-01-16 10:16 ` Thorsten Leemhuis
@ 2022-01-28 14:09   ` Thorsten Leemhuis
  0 siblings, 0 replies; 12+ messages in thread
From: Thorsten Leemhuis @ 2022-01-28 14:09 UTC (permalink / raw)
  To: regressions

#regzbot monitor:
https://lore.kernel.org/all/20220120215305.282577-1-amir73il@gmail.com/

TWIMC: this mail is primarily send for documentation purposes and for
regzbot, my Linux kernel regression tracking bot. These mails usually
contain '#forregzbot' in the subject, to make them easy to spot and filter.

On 16.01.22 11:16, Thorsten Leemhuis wrote:
> [TLDR: I'm adding this regression to regzbot, the Linux kernel
> regression tracking bot; most text you find below is compiled from a few
> templates paragraphs some of you might have seen already.]
> 
> Hi, this is your Linux kernel regression tracker speaking.
> 
> Adding the regression mailing list to the list of recipients, as it
> should be in the loop for all regressions, as explained here:
> https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
> 
> On 15.01.22 04:11, Ivan Delalande wrote:
>> Hi,
>>
>> Sorry to bring this up so late but we might have found a regression
>> introduced by your "Sort out fsnotify_nameremove() mess" patch series
>> merged in 5.3 (116b9731ad76..7377f5bec133), and that can still be
>> reproduced on v5.16.
>>
>> Some of our processes use inotify to watch for IN_DELETE events (for
>> files on tmpfs mostly), and relied on the fact that once such events are
>> received, the files they refer to have actually been unlinked and can't
>> be open/read. So if and once open() succeeds then it is a new version of
>> the file that has been recreated with new content.
>>
>> This was true and working reliably before 5.3, but changed after
>> 49246466a989 ("fsnotify: move fsnotify_nameremove() hook out of
>> d_delete()") specifically. There is now a time window where a process
>> receiving one of those IN_DELETE events may still be able to open the
>> file and read its old content before it's really unlinked from the FS.
>>
>> I'm not very familiar with the VFS and fsnotify internals, would you
>> consider this a regression, or was there never any intentional guarantee
>> for that behavior and it's best we work around this change in userspace?
> 
> Thanks for the report.
> 
> To be sure this issue doesn't fall through the cracks unnoticed, I'm
> adding it to regzbot, my Linux kernel regression tracking bot:
> 
> #regzbot ^introduced 116b9731ad76..7377f5bec133
> #regzbot title fsnotify: regression due to the fsnotify_nameremove()
> rework in 5.3
> #regzbot ignore-activity
> 
> Reminder: when fixing the issue, please add a 'Link:' tag with the URL
> to the report (the parent of this mail) using the kernel.org redirector,
> as explained in 'Documentation/process/submitting-patches.rst'. Regzbot
> then will automatically mark the regression as resolved once the fix
> lands in the appropriate tree. For more details about regzbot see footer.
> 
> Sending this to everyone that got the initial report, to make all aware
> of the tracking. I also hope that messages like this motivate people to
> directly get at least the regression mailing list and ideally even
> regzbot involved when dealing with regressions, as messages like this
> wouldn't be needed then.
> 
> Don't worry, I'll send further messages wrt to this regression just to
> the lists (with a tag in the subject so people can filter them away), as
> long as they are intended just for regzbot. With a bit of luck no such
> messages will be needed anyway.
> 
> Ciao, Thorsten (wearing his 'Linux kernel regression tracker' hat)
> 
> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
> on my table. I can only look briefly into most of them. Unfortunately
> therefore I sometimes will get things wrong or miss something important.
> I hope that's not the case here; if you think it is, don't hesitate to
> tell me about it in a public reply, that's in everyone's interest.
> 
> BTW, I have no personal interest in this issue, which is tracked using
> regzbot, my Linux kernel regression tracking bot
> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
> this mail to get things rolling again and hence don't need to be CC on
> all further activities wrt to this regression.
> 
> ---
> Additional information about regzbot:
> 
> If you want to know more about regzbot, check out its web-interface, the
> getting start guide, and/or the references documentation:
> 
> https://linux-regtracking.leemhuis.info/regzbot/
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
> https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md
> 
> The last two documents will explain how you can interact with regzbot
> yourself if your want to.
> 
> Hint for reporters: when reporting a regression it's in your interest to
> tell #regzbot about it in the report, as that will ensure the regression
> gets on the radar of regzbot and the regression tracker. That's in your
> interest, as they will make sure the report won't fall through the
> cracks unnoticed.
> 
> Hint for developers: you normally don't need to care about regzbot once
> it's involved. Fix the issue as you normally would, just remember to
> include a 'Link:' tag to the report in the commit message, as explained
> in Documentation/process/submitting-patches.rst
> That aspect was recently was made more explicit in commit 1f57bd42b77c:
> https://git.kernel.org/linus/1f57bd42b77c
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-01-28 14:09 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-01-15  3:11 Potential regression after fsnotify_nameremove() rework in 5.3 Ivan Delalande
2022-01-15 19:50 ` Amir Goldstein
2022-01-16  1:20   ` Ivan Delalande
2022-01-16 10:14     ` Amir Goldstein
2022-01-17  2:34       ` Ivan Delalande
2022-01-17 13:14         ` Amir Goldstein
2022-01-17 14:21           ` Jan Kara
2022-01-17 19:09             ` Amir Goldstein
2022-01-17 22:02               ` Amir Goldstein
2022-01-18 10:06                 ` Jan Kara
2022-01-16 10:16 ` Thorsten Leemhuis
2022-01-28 14:09   ` Potential regression after fsnotify_nameremove() rework in 5.3 #forregzbot Thorsten Leemhuis

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.