linux-fsdevel.vger.kernel.org archive mirror
* [fuse] interaction between O_APPEND and writeback cache
@ 2017-08-04 19:10 Nikolaus Rath
  2017-08-04 19:59 ` [fuse-devel] " Miklos Szeredi
  0 siblings, 1 reply; 5+ messages in thread
From: Nikolaus Rath @ 2017-08-04 19:10 UTC
  To: fuse-devel, Miklos Szeredi, Maxim Patlasov, linux-fsdevel

Hello,

I am confused about how O_APPEND is supposed to interact with the
writeback cache.

As far as I can tell, the O_APPEND flag is currently passed to the
filesystem process, so my expectation is that the filesystem process is
responsible for ignoring any offset in write requests and instead writing
at the current end of the file[1].
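
To make that expectation concrete, here is a minimal sketch of such a
write handler. It is not libfuse code: handle_write() and demo() are
hypothetical names, and the backing file is assumed to be a local file
that was itself opened without O_APPEND. Note that the size lookup and
the write are two separate steps, so this sketch is not atomic.

```python
import os
import tempfile


def handle_write(fd, data, offset, open_flags):
    """Hypothetical write handler of a userspace filesystem process.

    fd         -- descriptor of the backing file (opened without O_APPEND)
    open_flags -- flags the filesystem saw in the original open request
    """
    if open_flags & os.O_APPEND:
        # Ignore the requested offset and write at the current end of file.
        offset = os.fstat(fd).st_size
    return os.pwrite(fd, data, offset)


def demo():
    # Illustration only: a 5-byte file, then a write "at offset 0" on a
    # handle that was opened with O_APPEND.
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        os.pwrite(fd, b"hello", 0)
        handle_write(fd, b"XY", 0, os.O_WRONLY | os.O_APPEND)
        return os.pread(fd, 100, 0)
```

Here demo() returns b"helloXY": the offset 0 in the request is ignored
because the handle carries O_APPEND.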

However, with writeback cache enabled the filesystem process cannot tell
which data is "new" (it came from userspace and should be appended) and
which data is old and just made a round-trip through the kernel. So it seems
to me that the filesystem process should probably leave the handling of
O_APPEND to the kernel. But then, shouldn't the kernel filter out this
flag when sending the open request?
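
A toy model of the problem, under the assumption that writeback flushes
the whole dirty page, old bytes included, at its original offset. None
of this is kernel or libfuse code; writeback_request(), apply_write()
and simulate() are made-up names for illustration.

```python
def writeback_request(cached_page):
    """Toy model: the flush carries the whole dirty page from offset 0."""
    return 0, cached_page


def apply_write(backing, offset, data, treat_as_append):
    """Apply one write request to the backing store (a bytearray)."""
    if treat_as_append:
        # Filesystem honours O_APPEND itself and ignores the offset.
        offset = len(backing)
    end = offset + len(data)
    if end > len(backing):
        backing.extend(b"\0" * (end - len(backing)))
    backing[offset:end] = data


def simulate(treat_as_append):
    backing = bytearray(b"AAAA")      # file content before the append
    page = bytes(backing) + b"BB"     # cache page after appending "BB"
    offset, data = writeback_request(page)
    apply_write(backing, offset, data, treat_as_append)
    return bytes(backing)
```

In this model simulate(False) yields b"AAAABB", while simulate(True)
yields b"AAAAAAAABB": the old bytes that only made a round-trip through
the cache get appended a second time.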

On the other hand, when the kernel handles O_APPEND, the append is no
longer atomic (think of a network fuse filesystem).


It seems to me something is not right here. Either the kernel should
enforce O_APPEND when writeback caching is enabled and filter out the
flag (sacrificing atomicity), or the kernel should pass O_APPEND to the
filesystem but either disable writeback caching or somehow keep track of
what data is actually new.

Am I missing something?


(At first I thought that maybe the filesystem's write is supposed to
ignore data that comes before the current end of the file if writeback
caching is active. But that wouldn't work for a network filesystem
either).


Best,
-Nikolaus

[1]: Actually, if the offset is bigger than the size of the file, should
it be ignored or should the file be extended?

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: [fuse-devel] [fuse] interaction between O_APPEND and writeback cache
  2017-08-04 19:10 [fuse] interaction between O_APPEND and writeback cache Nikolaus Rath
@ 2017-08-04 19:59 ` Miklos Szeredi
  2017-08-05 20:36   ` Nikolaus Rath
  2017-08-05 20:45   ` Nikolaus Rath
  0 siblings, 2 replies; 5+ messages in thread
From: Miklos Szeredi @ 2017-08-04 19:59 UTC
  To: fuse-devel, Miklos Szeredi, Maxim Patlasov, linux-fsdevel

On Fri, Aug 4, 2017 at 9:10 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
> Hello,
>
> I am confused about how O_APPEND is supposed to interact with the
> writeback cache.
>
> As far as I can tell, the O_APPEND flag is currently passed to the
> filesystem process, so my expectation is that the filesystem process is
> responsible for ignoring any offset in write requests and instead write
> at the current end of the file[1].
>
> However, with writeback cache enabled the filesystem process cannot tell
> which data is "new" and came from userspace, should be appended, and
> which data is old and just made a round-trip to the kernel. So it seems
> to me that the filesystem process should probably leave the handling of
> O_APPEND to the kernel. But then, shouldn't the kernel filter out this
> flag when sending the open request?

Indeed, when writing back the cache the kernel should definitely not
set O_APPEND.

>
> On the other hand, when the kernel handles O_APPEND, then it is no
> longer atomic (think of a network fuse filesystem).

Yes, a network filesystem generally needs to handle consistency of
caches across nodes, and O_APPEND is no exception (i.e. you cannot have
two nodes writing O_APPEND to cache at the same time, because that
will not work).
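
The two-node case can be sketched with a toy model. The Server class
and both helper functions are hypothetical; the model assumes each node
resolves "end of file" from the size it saw before either cache was
flushed.

```python
class Server:
    """Toy stand-in for the shared backing store of a network filesystem."""

    def __init__(self, initial):
        self.data = bytearray(initial)

    def write_at(self, offset, payload):
        end = offset + len(payload)
        if end > len(self.data):
            self.data.extend(b"\0" * (end - len(self.data)))
        self.data[offset:end] = payload

    def append(self, payload):
        # The server picks the offset, so appends cannot collide.
        self.data.extend(payload)


def client_side_append(initial):
    server = Server(initial)
    # Both nodes look up the size before either one has flushed its cache.
    size_seen_by_a = len(server.data)
    size_seen_by_b = len(server.data)
    server.write_at(size_seen_by_a, b"11")
    server.write_at(size_seen_by_b, b"22")
    return bytes(server.data)


def server_side_append(initial):
    server = Server(initial)
    server.append(b"11")
    server.append(b"22")
    return bytes(server.data)
```

With an initial b"AAAA", client_side_append() ends up with b"AAAA22"
(node A's data is silently overwritten), whereas server_side_append()
gives b"AAAA1122".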

Thanks,
Miklos


* Re: [fuse-devel] [fuse] interaction between O_APPEND and writeback cache
  2017-08-04 19:59 ` [fuse-devel] " Miklos Szeredi
  2017-08-05 20:36   ` Nikolaus Rath
@ 2017-08-05 20:45   ` Nikolaus Rath
  2017-08-07 23:59     ` Maxim Patlasov
  1 sibling, 1 reply; 5+ messages in thread
From: Nikolaus Rath @ 2017-08-05 20:45 UTC
  To: fuse-devel, linux-fsdevel

On Aug 04 2017, Miklos Szeredi <miklos@szeredi.hu> wrote:
> On Fri, Aug 4, 2017 at 9:10 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>> Hello,
>>
>> I am confused about how O_APPEND is supposed to interact with the
>> writeback cache.
>>
>> As far as I can tell, the O_APPEND flag is currently passed to the
>> filesystem process, so my expectation is that the filesystem process is
>> responsible for ignoring any offset in write requests and instead write
>> at the current end of the file[1].
>>
>> However, with writeback cache enabled the filesystem process cannot tell
>> which data is "new" and came from userspace, should be appended, and
>> which data is old and just made a round-trip to the kernel. So it seems
>> to me that the filesystem process should probably leave the handling of
>> O_APPEND to the kernel. But then, shouldn't the kernel filter out this
>> flag when sending the open request?
>
> Indeed, when writing back the cache the kernel should definitely not
> set O_APPEND.

Well, 4.9 certainly does it though. Should I try to make a patch, or are
you or Maxim going to do that shortly anyway?

Do you think it makes sense to filter out O_APPEND in libfuse as well
(to work around the issue for present day kernels)?

>> On the other hand, when the kernel handles O_APPEND, then it is no
>> longer atomic (think of a network fuse filesystem).
>
> Yes, network filesystem generally needs to handle consistency of
> caches across nodes and O_APPEND is no exception (i.e. you cannot have
> two nodes writing O_APPEND to cache at the same time, because that
> will not work).

This poses a bit of a problem though. So a network filesystem either
cannot use writeback caching or O_APPEND will (silently) not work.

With the current behavior (O_APPEND being passed to open() when
writeback is enabled) the filesystem would at least have a chance to
return an error, i.e. instead of a silent failure there would be a noisy
error. With that in mind, maybe the current behavior isn't so bad? We'd
just have to document that if writeback cache is enabled and O_APPEND
is received, the filesystem has to decide if it is fine with the kernel
handling O_APPEND (and in that case ignore the flag for subsequent
writes) or return an error.
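
A minimal sketch of that decision, not tied to any libfuse API. The
function name handle_open(), the kernel_append_ok switch and the choice
of EOPNOTSUPP as the error are all assumptions made for illustration.

```python
import errno
import os


def handle_open(flags, writeback_cache, kernel_append_ok):
    """Hypothetical open handler; returns (error, effective_flags).

    writeback_cache  -- whether writeback caching is enabled for the mount
    kernel_append_ok -- whether this filesystem accepts the kernel
                        resolving the append offset (non-atomic)
    """
    if writeback_cache and (flags & os.O_APPEND):
        if not kernel_append_ok:
            # Fail loudly instead of silently losing append semantics.
            # EOPNOTSUPP is an assumed choice of error code.
            return errno.EOPNOTSUPP, None
        # Kernel handles the append; ignore the flag for later writes.
        flags &= ~os.O_APPEND
    return 0, flags
```

A local filesystem would typically pass kernel_append_ok=True and get
its flags back with O_APPEND cleared; a network filesystem that needs
atomic appends would pass False and reject the open.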


Best,
-Nikolaus


-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: [fuse-devel] [fuse] interaction between O_APPEND and writeback cache
  2017-08-05 20:45   ` Nikolaus Rath
@ 2017-08-07 23:59     ` Maxim Patlasov
  0 siblings, 0 replies; 5+ messages in thread
From: Maxim Patlasov @ 2017-08-07 23:59 UTC
  To: Nikolaus Rath; +Cc: linux-fsdevel, fuse-devel

Hi Nikolaus,


On 08/05/2017 01:45 PM, Nikolaus Rath wrote:

> On Aug 04 2017, Miklos Szeredi <miklos@szeredi.hu> wrote:
>> On Fri, Aug 4, 2017 at 9:10 PM, Nikolaus Rath <Nikolaus@rath.org> wrote:
>>> Hello,
>>>
>>> I am confused about how O_APPEND is supposed to interact with the
>>> writeback cache.
>>>
>>> As far as I can tell, the O_APPEND flag is currently passed to the
>>> filesystem process, so my expectation is that the filesystem process is
>>> responsible for ignoring any offset in write requests and instead write
>>> at the current end of the file[1].
>>>
>>> However, with writeback cache enabled the filesystem process cannot tell
>>> which data is "new" and came from userspace, should be appended, and
>>> which data is old and just made a round-trip to the kernel. So it seems
>>> to me that the filesystem process should probably leave the handling of
>>> O_APPEND to the kernel. But then, shouldn't the kernel filter out this
>>> flag when sending the open request?
>> Indeed, when writing back the cache the kernel should definitely not
>> set O_APPEND.
> Well, 4.9 certainly does it though. Should I try to make a patch, or are
> you or Maxim going to do that shortly anyway?
>
> Do you think it makes sense to filter out O_APPEND in libfuse as well
> (to work around the issue for present day kernels)?

I think it's up to the filesystem how to handle O_APPEND. The kernel
shouldn't filter it out.

>
>>> On the other hand, when the kernel handles O_APPEND, then it is no
>>> longer atomic (think of a network fuse filesystem).
>> Yes, network filesystem generally needs to handle consistency of
>> caches across nodes and O_APPEND is no exception (i.e. you cannot have
>> two nodes writing O_APPEND to cache at the same time, because that
>> will not work).
> This poses a bit of a problem though. So a network filesystem either
> cannot use writeback caching or O_APPEND will (silently) not work.
>
> With the current behavior (O_APPEND being passed to open() when
> writeback is enabled) the filesystem would at least have a chance to
> return an error, i.e. instead of a silent failure there would be a noisy
> error. With that in mind, maybe the current behavior isn't so bad? We'd
> just have to document that if writeback cache is enabled and O_APPEND
> is received, the filesystem has to decide if it is fine with the kernel
> handling O_APPEND (and in that case ignore the flag for subsequent
> writes) or return an error.

Yes, I agree. For some filesystems O_APPEND is problematic, for others 
not. Let them decide.

Thanks,
Maxim

