linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fuse: update size attr before doing IO
@ 2024-03-07 15:08 Sweet Tea Dorminy
  2024-03-07 16:09 ` Josef Bacik
  2024-03-11 10:01 ` Miklos Szeredi
  0 siblings, 2 replies; 8+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-07 15:08 UTC (permalink / raw)
  To: miklos, linux-fsdevel, kernel-team, josef, amir73il; +Cc: Sweet Tea Dorminy

All calls into generic vfs functions need to make sure that the inode
attributes used by those functions are up to date, by calling
fuse_update_attributes() as appropriate.

generic_write_checks() accesses inode size in order to get the
appropriate file offset for files opened with O_APPEND. Currently, in
some cases, fuse_update_attributes() is not called before
generic_write_checks(), potentially resulting in corruption/overwrite of
previously appended data if i_size is out of date in the cached inode.

Therefore,  make sure fuse_update_attributes() is always
called before generic_write_checks(), and add a note about why it's not
necessary for some llseek calls.

Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
---
 fs/fuse/file.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 1b0b9f2a4fbf..d69b2dd0168c 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -1408,13 +1408,16 @@ static ssize_t fuse_cache_write_iter(struct kiocb *iocb, struct iov_iter *from)
 	ssize_t err, count;
 	struct fuse_conn *fc = get_fuse_conn(inode);
 
-	if (fc->writeback_cache) {
-		/* Update size (EOF optimization) and mode (SUID clearing) */
-		err = fuse_update_attributes(mapping->host, file,
-					     STATX_SIZE | STATX_MODE);
-		if (err)
-			return err;
+	/*
+	 * Update size (EOF optimization, and O_APPEND correctness) and
+	 * mode (SUID clearing)
+	 */
+	err = fuse_update_attributes(mapping->host, file,
+				     STATX_SIZE | STATX_MODE);
+	if (err)
+		return err;
 
+	if (fc->writeback_cache) {
 		if (fc->handle_killpriv_v2 &&
 		    setattr_should_drop_suidgid(&nop_mnt_idmap,
 						file_inode(file))) {
@@ -1666,10 +1669,19 @@ static ssize_t fuse_direct_read_iter(struct kiocb *iocb, struct iov_iter *to)
 
 static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 {
-	struct inode *inode = file_inode(iocb->ki_filp);
+	struct file *file = iocb->ki_filp;
+	struct inode *inode = file_inode(file);
 	struct fuse_io_priv io = FUSE_IO_PRIV_SYNC(iocb);
 	ssize_t res;
 	bool exclusive;
+	/*
+	 * For O_APPEND files, we need to make sure the size is right before
+	 * generic_write_checks() grabs it.
+	 */
+	res = fuse_update_attributes(file->f_mapping->host, file, STATX_SIZE);
+	if (res)
+		return res;
+
 
 	fuse_dio_lock(iocb, from, &exclusive);
 	res = generic_write_checks(iocb, from);
@@ -2815,7 +2827,11 @@ static loff_t fuse_file_llseek(struct file *file, loff_t offset, int whence)
 	switch (whence) {
 	case SEEK_SET:
 	case SEEK_CUR:
-		 /* No i_mutex protection necessary for SEEK_CUR and SEEK_SET */
+		 /*
+		  * No i_mutex protection necessary for SEEK_CUR and SEEK_SET.
+		  * Even if we seek to a point outside the currently known size,
+		  * read and write will update the attributes before doing IO
+		  */
 		retval = generic_file_llseek(file, offset, whence);
 		break;
 	case SEEK_END:

base-commit: cdf6ac2a03d253f05d3e798f60f23dea1b176b92
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fuse: update size attr before doing IO
  2024-03-07 15:08 [PATCH] fuse: update size attr before doing IO Sweet Tea Dorminy
@ 2024-03-07 16:09 ` Josef Bacik
  2024-03-07 19:39   ` Bernd Schubert
  2024-03-11 10:01 ` Miklos Szeredi
  1 sibling, 1 reply; 8+ messages in thread
From: Josef Bacik @ 2024-03-07 16:09 UTC (permalink / raw)
  To: Sweet Tea Dorminy; +Cc: miklos, linux-fsdevel, kernel-team, amir73il

On Thu, Mar 07, 2024 at 10:08:13AM -0500, Sweet Tea Dorminy wrote:
> All calls into generic vfs functions need to make sure that the inode
> attributes used by those functions are up to date, by calling
> fuse_update_attributes() as appropriate.
> 
> generic_write_checks() accesses inode size in order to get the
> appropriate file offset for files opened with O_APPEND. Currently, in
> some cases, fuse_update_attributes() is not called before
> generic_write_checks(), potentially resulting in corruption/overwrite of
> previously appended data if i_size is out of date in the cached inode.
> 
> Therefore,  make sure fuse_update_attributes() is always
> called before generic_write_checks(), and add a note about why it's not
> necessary for some llseek calls.
> 
> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>

I had to ask questions and go look at the code, mostly because I'm not a FUSE
developer.  fuse_update_attributes() doesn't actually do anything if the stats
aren't invalidated, I was concerned we were suddenly adding a lot of overhead
for every write call.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

I have a question for the normal FUSE developers, how would one test this?
There doesn't appear to be a mechanism for writing stub FUSE fs's to exercise a
case like this in fstests.  Is there some other way you guys would test this or
is this something we need to build out ourselves?  Thanks,

Josef

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fuse: update size attr before doing IO
  2024-03-07 16:09 ` Josef Bacik
@ 2024-03-07 19:39   ` Bernd Schubert
  0 siblings, 0 replies; 8+ messages in thread
From: Bernd Schubert @ 2024-03-07 19:39 UTC (permalink / raw)
  To: Josef Bacik, Sweet Tea Dorminy
  Cc: miklos, linux-fsdevel, kernel-team, amir73il



On 3/7/24 17:09, Josef Bacik wrote:
> On Thu, Mar 07, 2024 at 10:08:13AM -0500, Sweet Tea Dorminy wrote:
>> All calls into generic vfs functions need to make sure that the inode
>> attributes used by those functions are up to date, by calling
>> fuse_update_attributes() as appropriate.
>>
>> generic_write_checks() accesses inode size in order to get the
>> appropriate file offset for files opened with O_APPEND. Currently, in
>> some cases, fuse_update_attributes() is not called before
>> generic_write_checks(), potentially resulting in corruption/overwrite of
>> previously appended data if i_size is out of date in the cached inode.
>>
>> Therefore,  make sure fuse_update_attributes() is always
>> called before generic_write_checks(), and add a note about why it's not
>> necessary for some llseek calls.
>>
>> Signed-off-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
> 
> I had to ask questions and go look at the code, mostly because I'm not a FUSE
> developer.  fuse_update_attributes() doesn't actually do anything if the stats
> aren't invalidated, I was concerned we were suddenly adding a lot of overhead
> for every write call.

Unless the timeout is set to 0?

> 
> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
> 
> I have a question for the normal FUSE developers, how would one test this?
> There doesn't appear to be a mechanism for writing stub FUSE fs's to exercise a
> case like this in fstests.  Is there some other way you guys would test this or
> is this something we need to build out ourselves?  Thanks,

You mean for xfstests? I'm testing fuse all the time with it.



Thanks,
Bernd

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fuse: update size attr before doing IO
  2024-03-07 15:08 [PATCH] fuse: update size attr before doing IO Sweet Tea Dorminy
  2024-03-07 16:09 ` Josef Bacik
@ 2024-03-11 10:01 ` Miklos Szeredi
  2024-03-12 18:18   ` Sweet Tea Dorminy
  1 sibling, 1 reply; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-11 10:01 UTC (permalink / raw)
  To: Sweet Tea Dorminy; +Cc: linux-fsdevel, kernel-team, josef, amir73il

On Thu, 7 Mar 2024 at 16:10, Sweet Tea Dorminy
<sweettea-kernel@dorminy.me> wrote:
>
> All calls into generic vfs functions need to make sure that the inode
> attributes used by those functions are up to date, by calling
> fuse_update_attributes() as appropriate.
>
> generic_write_checks() accesses inode size in order to get the
> appropriate file offset for files opened with O_APPEND. Currently, in
> some cases, fuse_update_attributes() is not called before
> generic_write_checks(), potentially resulting in corruption/overwrite of
> previously appended data if i_size is out of date in the cached inode.

While this all sounds good, I don't think it makes sense.

Why?  Because doing cached O_APPEND writes without any sort of
exclusion with remote writes is just not going to work.

Either the server ignores the current size and writes at the offset
that the kernel supplied (which will be the cached size of the file)
and executes the write at that position, or it appends the write to
the current EOF.  In the former case the cache will be consistent, but
append semantics are not observed, while in the latter case the append
semantics are observed, but the cache will be inconsistent.

Solution: either exclude remote writes or don't use the cache.

Updating the file size before the write does not prevent the race,
only makes the window smaller.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fuse: update size attr before doing IO
  2024-03-11 10:01 ` Miklos Szeredi
@ 2024-03-12 18:18   ` Sweet Tea Dorminy
  2024-03-12 21:55     ` Bernd Schubert
  2024-03-14 10:26     ` Miklos Szeredi
  0 siblings, 2 replies; 8+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-12 18:18 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel, kernel-team, josef, amir73il



On 3/11/24 06:01, Miklos Szeredi wrote:
> On Thu, 7 Mar 2024 at 16:10, Sweet Tea Dorminy
> <sweettea-kernel@dorminy.me> wrote:
>>
>> All calls into generic vfs functions need to make sure that the inode
>> attributes used by those functions are up to date, by calling
>> fuse_update_attributes() as appropriate.
>>
>> generic_write_checks() accesses inode size in order to get the
>> appropriate file offset for files opened with O_APPEND. Currently, in
>> some cases, fuse_update_attributes() is not called before
>> generic_write_checks(), potentially resulting in corruption/overwrite of
>> previously appended data if i_size is out of date in the cached inode.
> 
> While this all sounds good, I don't think it makes sense.
> 
> Why?  Because doing cached O_APPEND writes without any sort of
> exclusion with remote writes is just not going to work.
> 
> Either the server ignores the current size and writes at the offset
> that the kernel supplied (which will be the cached size of the file)
> and executes the write at that position, or it appends the write to
> the current EOF.  In the former case the cache will be consistent, but
> append semantics are not observed, while in the latter case the append
> semantics are observed, but the cache will be inconsistent.
> 
> Solution: either exclude remote writes or don't use the cache.
> 
> Updating the file size before the write does not prevent the race,
> only makes the window smaller.

Definitely agree with you.

The usecase at hand is a sort of NFS-like network filesystem, where 
there's exclusion of remote writes while the file is open, but no 
problem with remote writes while the file is closed.

The alternative we considered was to add a fuse_update_attributes() call 
to open.

We thought about doing so during d_revalidate/lookup_fast(). But as far 
as I understand, lookup_fast() is not just called during open, and will 
use the cached inode if the dentry timeout hasn't expired. We tried 
setting dentry timeout to 0, but that lost too much performance. So that 
didn't seem to work.

But updating attributes after giving the filesystem a chance to 
invalidate them during open() would work, I think?

That would also conveniently fix the issue where copy_file_range() 
currently checks the size before calling into fuse at all, which I'd 
been building a more elaborate changeset for.

How does that sound?

Thanks!

Sweet Tea

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fuse: update size attr before doing IO
  2024-03-12 18:18   ` Sweet Tea Dorminy
@ 2024-03-12 21:55     ` Bernd Schubert
  2024-03-12 23:08       ` Sweet Tea Dorminy
  2024-03-14 10:26     ` Miklos Szeredi
  1 sibling, 1 reply; 8+ messages in thread
From: Bernd Schubert @ 2024-03-12 21:55 UTC (permalink / raw)
  To: Sweet Tea Dorminy, Miklos Szeredi
  Cc: linux-fsdevel, kernel-team, josef, amir73il



On 3/12/24 19:18, Sweet Tea Dorminy wrote:
> 
> 
> On 3/11/24 06:01, Miklos Szeredi wrote:
>> On Thu, 7 Mar 2024 at 16:10, Sweet Tea Dorminy
>> <sweettea-kernel@dorminy.me> wrote:
>>>
>>> All calls into generic vfs functions need to make sure that the inode
>>> attributes used by those functions are up to date, by calling
>>> fuse_update_attributes() as appropriate.
>>>
>>> generic_write_checks() accesses inode size in order to get the
>>> appropriate file offset for files opened with O_APPEND. Currently, in
>>> some cases, fuse_update_attributes() is not called before
>>> generic_write_checks(), potentially resulting in corruption/overwrite of
>>> previously appended data if i_size is out of date in the cached inode.
>>
>> While this all sounds good, I don't think it makes sense.
>>
>> Why?  Because doing cached O_APPEND writes without any sort of
>> exclusion with remote writes is just not going to work.
>>
>> Either the server ignores the current size and writes at the offset
>> that the kernel supplied (which will be the cached size of the file)
>> and executes the write at that position, or it appends the write to
>> the current EOF.  In the former case the cache will be consistent, but
>> append semantics are not observed, while in the latter case the append
>> semantics are observed, but the cache will be inconsistent.
>>
>> Solution: either exclude remote writes or don't use the cache.
>>
>> Updating the file size before the write does not prevent the race,
>> only makes the window smaller.
> 
> Definitely agree with you.
> 
> The usecase at hand is a sort of NFS-like network filesystem, where
> there's exclusion of remote writes while the file is open, but no
> problem with remote writes while the file is closed.
> 
> The alternative we considered was to add a fuse_update_attributes() call
> to open.
> 
> We thought about doing so during d_revalidate/lookup_fast(). But as far
> as I understand, lookup_fast() is not just called during open, and will
> use the cached inode if the dentry timeout hasn't expired. We tried
> setting dentry timeout to 0, but that lost too much performance. So that
> didn't seem to work.
> 
> But updating attributes after giving the filesystem a chance to
> invalidate them during open() would work, I think?

You mean something like this?

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index d19cbf34c634..2723270323d9 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -204,7 +204,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
        if (inode && fuse_is_bad(inode))
                goto invalid;
        else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
-                (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET))) {
+                (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET | LOOKUP_OPEN))) {
                struct fuse_entry_out outarg;
                FUSE_ARGS(args);
                struct fuse_forget_link *forget;


I think this would make sense and could be caught by the atomic-open/revalidate
(once I get back to it).



> 
> That would also conveniently fix the issue where copy_file_range()
> currently checks the size before calling into fuse at all, which I'd
> been building a more elaborate changeset for.
> 
> How does that sound?
> 
> Thanks!
> 
> Sweet Tea
> 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] fuse: update size attr before doing IO
  2024-03-12 21:55     ` Bernd Schubert
@ 2024-03-12 23:08       ` Sweet Tea Dorminy
  0 siblings, 0 replies; 8+ messages in thread
From: Sweet Tea Dorminy @ 2024-03-12 23:08 UTC (permalink / raw)
  To: Bernd Schubert, Miklos Szeredi
  Cc: linux-fsdevel, kernel-team, josef, amir73il



On 3/12/24 17:55, Bernd Schubert wrote:
> 
> 
> On 3/12/24 19:18, Sweet Tea Dorminy wrote:
>>
>>
>> On 3/11/24 06:01, Miklos Szeredi wrote:
>>> On Thu, 7 Mar 2024 at 16:10, Sweet Tea Dorminy
>>> <sweettea-kernel@dorminy.me> wrote:
>>>>
>>>> All calls into generic vfs functions need to make sure that the inode
>>>> attributes used by those functions are up to date, by calling
>>>> fuse_update_attributes() as appropriate.
>>>>
>>>> generic_write_checks() accesses inode size in order to get the
>>>> appropriate file offset for files opened with O_APPEND. Currently, in
>>>> some cases, fuse_update_attributes() is not called before
>>>> generic_write_checks(), potentially resulting in corruption/overwrite of
>>>> previously appended data if i_size is out of date in the cached inode.
>>>
>>> While this all sounds good, I don't think it makes sense.
>>>
>>> Why?  Because doing cached O_APPEND writes without any sort of
>>> exclusion with remote writes is just not going to work.
>>>
>>> Either the server ignores the current size and writes at the offset
>>> that the kernel supplied (which will be the cached size of the file)
>>> and executes the write at that position, or it appends the write to
>>> the current EOF.  In the former case the cache will be consistent, but
>>> append semantics are not observed, while in the latter case the append
>>> semantics are observed, but the cache will be inconsistent.
>>>
>>> Solution: either exclude remote writes or don't use the cache.
>>>
>>> Updating the file size before the write does not prevent the race,
>>> only makes the window smaller.
>>
>> Definitely agree with you.
>>
>> The usecase at hand is a sort of NFS-like network filesystem, where
>> there's exclusion of remote writes while the file is open, but no
>> problem with remote writes while the file is closed.
>>
>> The alternative we considered was to add a fuse_update_attributes() call
>> to open.
>>
>> We thought about doing so during d_revalidate/lookup_fast(). But as far
>> as I understand, lookup_fast() is not just called during open, and will
>> use the cached inode if the dentry timeout hasn't expired. We tried
>> setting dentry timeout to 0, but that lost too much performance. So that
>> didn't seem to work.
>>
>> But updating attributes after giving the filesystem a chance to
>> invalidate them during open() would work, I think?
> 
> You mean something like this?
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index d19cbf34c634..2723270323d9 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -204,7 +204,7 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
>          if (inode && fuse_is_bad(inode))
>                  goto invalid;
>          else if (time_before64(fuse_dentry_time(entry), get_jiffies_64()) ||
> -                (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET))) {
> +                (flags & (LOOKUP_EXCL | LOOKUP_REVAL | LOOKUP_RENAME_TARGET | LOOKUP_OPEN))) {
>                  struct fuse_entry_out outarg;
>                  FUSE_ARGS(args);
>                  struct fuse_forget_link *forget;
> 
> 
> I think this would make sense and could be caught by the atomic-open/revalidate
> (once I get back to it).
> 
> 

That's an idea I like a lot, although I think that leaves open a window 
for a race and I can't see how to fix it easily.

If we passed the lookup_open flag down to the filesystem daemon, for it 
to exclusively lock the file, then we'd need a call to 
un-exclusively-lock it on an error later in the open process.

But if we don't do that and exclusively locked during fuse_send_open(), 
the window between lookup and fuse_send_open() would allow stat updates 
still before the file was exclusively locked.

Thanks,

Sweet Tea

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] fuse: update size attr before doing IO
  2024-03-12 18:18   ` Sweet Tea Dorminy
  2024-03-12 21:55     ` Bernd Schubert
@ 2024-03-14 10:26     ` Miklos Szeredi
  1 sibling, 0 replies; 8+ messages in thread
From: Miklos Szeredi @ 2024-03-14 10:26 UTC (permalink / raw)
  To: Sweet Tea Dorminy
  Cc: linux-fsdevel, kernel-team, josef, amir73il, Bernd Schubert

On Tue, 12 Mar 2024 at 19:18, Sweet Tea Dorminy
<sweettea-kernel@dorminy.me> wrote:

> The alternative we considered was to add a fuse_update_attributes() call
> to open.

I think the atomic open patchset from Bernd should allow updating
attributes together with open.  Which should solve this case.

I also have a patch introducing the concept of leases to fuse, which
would make the concept of who "owns" the data more formal.  I'll work
on refreshing it.  The interesting part will be interaction with
passthrough.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-03-14 10:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-07 15:08 [PATCH] fuse: update size attr before doing IO Sweet Tea Dorminy
2024-03-07 16:09 ` Josef Bacik
2024-03-07 19:39   ` Bernd Schubert
2024-03-11 10:01 ` Miklos Szeredi
2024-03-12 18:18   ` Sweet Tea Dorminy
2024-03-12 21:55     ` Bernd Schubert
2024-03-12 23:08       ` Sweet Tea Dorminy
2024-03-14 10:26     ` Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).