Re: [fuse-devel] 512 byte aligned write + O_DIRECT for xfstests

* Re: [fuse-devel] 512 byte aligned write + O_DIRECT for xfstests
       [not found]         ` <87sgensmsk.fsf@vostro.rath.org>
@ 2020-06-22  6:37           ` Amir Goldstein
  2020-06-22  7:35             ` Nikolaus Rath
  0 siblings, 1 reply; 5+ messages in thread
From: Amir Goldstein @ 2020-06-22  6:37 UTC
  To: fuse-devel
  Cc: linux-fsdevel, Miklos Szeredi, Nikolaus Rath, Matthew Wilcox,
	Dave Chinner

[+CC fsdevel folks]

On Mon, Jun 22, 2020 at 8:33 AM Nikolaus Rath <Nikolaus@rath.org> wrote:
>
> On Jun 21 2020, Miklos Szeredi <miklos@szeredi.hu> wrote:
> >> I am not sure that is correct. At step 6, the write() request from
> >> userspace is still being processed. I don't think that it is reasonable
> >> to expect that the write() request is atomic, i.e. you can't expect to
> >> see none or all of the data that is *currently being written*.
> >
> > Apparently the standard is quite clear on this:
> >
> >   "All of the following functions shall be atomic with respect to each
> > other in the effects specified in POSIX.1-2017 when they operate on
> > regular files or symbolic links:
> >
> > [...]
> > pread()
> > read()
> > readv()
> > pwrite()
> > write()
> > writev()
> > [...]
> >
> > If two threads each call one of these functions, each call shall
> > either see all of the specified effects of the other call, or none of
> > them."[1]
> >
> > Thanks,
> > Miklos
> >
> > [1]
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_09_07
>
> Thanks for digging this up, I did not know about this.
>
> That leaves FUSE in a rather uncomfortable place though, doesn't it?
> What does the kernel do when userspace issues a write request that's
> bigger than the FUSE userspace pipe? It sounds like either the request
> must be split (so it becomes non-atomic), or you'd have to return a short
> write (which IIRC is not supposed to happen for local filesystems).
>

What makes you say that short writes are not supposed to happen?
And what is the definition of "local filesystem" in that claim?

FYI, a similar discussion is also happening about XFS "atomic rw" behavior [1].
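
To make the "all or none" wording quoted above concrete, here is a minimal
sketch of how a test could detect a torn read. It assumes a hypothetical
test setup (not from this thread) in which each writer fills the whole
region with one repeated byte value using a single pwrite(), so any read
that returns a mix of values saw only part of a write. The names is_torn()
and region_is_torn() are made up for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if buf holds more than one distinct byte value, else 0.
 * Assumption: every writer fills the region with a single repeated byte,
 * so a mixed buffer means the read observed a partially applied write. */
static int is_torn(const unsigned char *buf, size_t len)
{
    for (size_t i = 1; i < len; i++)
        if (buf[i] != buf[0])
            return 1;
    return 0;
}

/* Read [off, off + len) with a single pread() and classify the result.
 * Returns 1 if torn, 0 if uniform, -1 on error. Only the bytes actually
 * returned by pread() are examined, so a short read is not misreported. */
static int region_is_torn(int fd, off_t off, size_t len)
{
    unsigned char *buf = malloc(len);
    ssize_t n;
    int torn;

    if (buf == NULL)
        return -1;
    n = pread(fd, buf, len, off);
    torn = (n < 0) ? -1 : is_torn(buf, (size_t)n);
    free(buf);
    return torn;
}
```

A reader thread would call region_is_torn() in a loop while writer threads
alternate between two fill bytes. A return value of 1 only shows that a
partial write was visible on that run; never seeing one proves nothing.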

Seems like the options for FUSE are:
- Take the shared i_rwsem lock on read, as XFS does, and regress the
  performance of mixed rw workloads
- Do the above only for non-direct IO and writeback_cache, to minimize
  the damage potential
- Return a short read/write for direct IO if the request is bigger than
  the FUSE buffer size
- Add a FUSE mode that implements direct IO internally as something like
  RWF_UNCACHED [2]. This is a relaxed version of "no caching" in the
  client, or a stricter version of "cache write-through", in the sense
  that during an ongoing large write operation, only reads of those
  freshly written bytes are served from the client cache copy and not
  from the server.
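
For the short read/write option, the burden moves to the caller. Below is a
minimal sketch of the usual retry loop an application would need. The
helper name pwrite_all() is hypothetical, and nothing here is FUSE
specific; it uses only plain pwrite().

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Write len bytes from buf at offset off, retrying after short writes.
 * Returns 0 once everything is written, -1 on error (errno is set). */
static int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any byte was written */
            return -1;
        }
        if (n == 0) {
            errno = EIO;    /* no progress; bail out instead of spinning */
            return -1;
        }
        /* Short write: advance past the bytes that were accepted. */
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}
```

Note that each pwrite() in the loop is a separate call, so a concurrent
reader can observe the data between iterations. Short writes keep every
individual call all-or-nothing, but the caller's logical write as a whole
is not.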

Thanks,
Amir.

[1] https://lore.kernel.org/linux-fsdevel/20200622010234.GD2040@dread.disaster.area/
[2] https://lore.kernel.org/linux-fsdevel/20191217143948.26380-1-axboe@kernel.dk/
