From: Amir Goldstein <amir73il@gmail.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Greg Kurz <groug@kaod.org>, Miklos Szeredi <miklos@szeredi.hu>,
	virtualization@lists.linux-foundation.org,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	virtio-fs-list <virtio-fs@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Max Reitz <mreitz@redhat.com>, Robert Krawitz <rlk@redhat.com>
Subject: Re: [PATCH v4 5/5] virtiofs: propagate sync() to file server
Date: Mon, 16 Aug 2021 22:46:10 +0300	[thread overview]
Message-ID: <CAOQ4uxiezabg0U3tGmpH8J4u7OUbB2KMXexZ7yQzujj0vAWirw@mail.gmail.com> (raw)
In-Reply-To: <YRq4W1zcsAZh3FLX@redhat.com>

On Mon, Aug 16, 2021 at 10:11 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Mon, Aug 16, 2021 at 09:57:08PM +0300, Amir Goldstein wrote:
> > On Mon, Aug 16, 2021 at 6:29 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Sun, Aug 15, 2021 at 05:14:06PM +0300, Amir Goldstein wrote:
> > > > Hi Greg,
> > > >
> > > > Sorry for the late reply, I have some questions about this change...
> > > >
> > > > On Fri, May 21, 2021 at 9:12 AM Greg Kurz <groug@kaod.org> wrote:
> > > > >
> > > > > Even if POSIX doesn't mandate it, Linux users legitimately expect
> > > > > sync() to flush all data and metadata to physical storage when it
> > > > > is located on the same system. This isn't happening with virtiofs,
> > > > > though: sync() inside the guest returns right away even though
> > > > > data still needs to be flushed from the host page cache.
> > > > >
> > > > > This is easily demonstrated by doing the following in the guest:
> > > > >
> > > > > $ dd if=/dev/zero of=/mnt/foo bs=1M count=5K ; strace -T -e sync sync
> > > > > 5120+0 records in
> > > > > 5120+0 records out
> > > > > 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 5.22224 s, 1.0 GB/s
> > > > > sync()                                  = 0 <0.024068>
> > > > > +++ exited with 0 +++
> > > > >
> > > > > and start the following in the host when the 'dd' command completes
> > > > > in the guest:
> > > > >
> > > > > $ strace -T -e fsync /usr/bin/sync virtiofs/foo
> > > > > fsync(3)                                = 0 <10.371640>
> > > > > +++ exited with 0 +++
> > > > >
> > > > > There is actually no good reason not to honor the expected behavior
> > > > > of sync(): it gives an unrealistic impression that virtiofs is
> > > > > super fast and that data has safely landed on HW, which obviously
> > > > > isn't the case.
> > > > >
> > > > > Implement a ->sync_fs() superblock operation that sends a new
> > > > > FUSE_SYNCFS request type for this purpose. Provision a 64-bit
> > > > > placeholder for possible future extensions. Since the file
> > > > > server cannot handle the wait == 0 case, we skip it to avoid a
> > > > > gratuitous roundtrip. Note that this is per-superblock: a
> > > > > FUSE_SYNCFS is sent for the root mount and for each submount.
> > > > >
> > > > > Like with FUSE_FSYNC and FUSE_FSYNCDIR, lack of support for
> > > > > FUSE_SYNCFS in the file server is treated as permanent success.
> > > > > This ensures compatibility with older file servers: the client
> > > > > will get the current behavior of sync() not being propagated to
> > > > > the file server.
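
(For context, the on-the-wire payload this adds is essentially just that
placeholder; the struct below is my reading of the commit message, not
copied from the actual header, so take the field name as a guess:)

  #include <stdint.h>

  /* My reading of the FUSE_SYNCFS request payload: a single 64-bit
   * placeholder reserved for possible future extensions. */
  struct fuse_syncfs_in {
          uint64_t padding;
  };
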
> > > >
> > > > I wonder: even if the server does not support SYNCFS, or if the kernel
> > > > does not trust the server with SYNCFS, couldn't fuse_sync_fs() wait
> > > > until all requests pending up to this call have completed, either
> > > > before or after submitting the SYNCFS request?
> > >
> > > >
> > > > Does virtiofsd track all requests prior to the SYNCFS request, to make
> > > > sure that they were executed on the host filesystem before calling
> > > > syncfs() on the host filesystem?
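
To illustrate the kind of thing I have in mind on the client side, a
hand-wavy sketch (not actual fuse code, all names are made up):

  #include <linux/atomic.h>
  #include <linux/types.h>
  #include <linux/wait.h>

  /*
   * Illustration only: count requests as they are submitted and
   * completed, and have SYNCFS wait until everything that was
   * submitted before it has completed.
   */
  struct fuse_req_counters {
          atomic64_t submitted;
          atomic64_t completed;
          wait_queue_head_t waitq;
  };

  static void fuse_wait_prior_requests(struct fuse_req_counters *c)
  {
          /* Snapshot how many requests were submitted before this point... */
          s64 target = atomic64_read(&c->submitted);

          /* ...and sleep until at least that many have completed. */
          wait_event(c->waitq, atomic64_read(&c->completed) >= target);
  }

  static void fuse_request_done(struct fuse_req_counters *c)
  {
          /* Hypothetical completion-side bookkeeping. */
          atomic64_inc(&c->completed);
          wake_up(&c->waitq);
  }
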
> > >
> > > Hi Amir,
> > >
> > > I don't think virtiofsd has any such notion. I would think that
> > > the client should make sure all pending writes have completed and
> > > then send the SYNCFS request.
> > >
> > > Looking at sync_filesystem(), I am assuming the VFS will take care
> > > of flushing out all dirty pages and then call ->sync_fs().
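
Right, that matches my understanding. Roughly, sync_filesystem() does the
following (a simplified sketch from memory, not a verbatim copy of
fs/sync.c, which also syncs the backing block device):

  #include <linux/fs.h>
  #include <linux/writeback.h>

  static int sync_filesystem_sketch(struct super_block *sb)
  {
          int ret = 0;

          /* First pass: kick off writeback of dirty pages, don't wait. */
          writeback_inodes_sb(sb, WB_REASON_SYNC);
          if (sb->s_op->sync_fs)
                  sb->s_op->sync_fs(sb, 0);

          /* Second pass: wait for writeback, then ->sync_fs(wait == 1). */
          sync_inodes_sb(sb);
          if (sb->s_op->sync_fs)
                  ret = sb->s_op->sync_fs(sb, 1);

          return ret;
  }
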
> > >
> > > Having said that, I think fuse queues writeback requests internally
> > > and signals completion of writeback to the mm (end_page_writeback()).
> > > And that's why fuse_fsync() has a notion of waiting for all pending
> > > writes on an inode to finish (fuse_sync_writes()).
> > >
> > > So I think you have raised a good point. That is, if there are pending
> > > writes at the time of syncfs(), we don't seem to have a notion of
> > > first waiting for all these writes to finish before we send the
> > > FUSE_SYNCFS request to the server.
> >
> > Maybe, but I was not referring to inode writeback requests.
> > I had assumed that those were handled correctly.
> > I was referring to pending metadata requests.
> >
> > ->sync_fs() in a local fs also takes care of flushing metadata
> > (e.g. the journal). I assumed that virtiofsd implements the
> > FUSE_SYNCFS request by calling syncfs() on the host fs,
>
> Yes, virtiofsd calls syncfs() on the host fs.
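
(On the host side that presumably boils down to something like the sketch
below. It is simplified, and the way the server obtains an fd on the
exported directory is my assumption, not virtiofsd's actual code.)

  #define _GNU_SOURCE
  #include <unistd.h>
  #include <errno.h>

  /*
   * Simplified illustration of a server-side FUSE_SYNCFS handler.
   * shared_dir_fd is assumed to be an fd the server keeps open on the
   * exported directory; syncfs() then flushes the whole host filesystem
   * backing that fd.
   */
  static int handle_syncfs(int shared_dir_fd)
  {
          if (syncfs(shared_dir_fd) == -1)
                  return -errno;

          return 0;
  }
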
>
> > but if it does that, then
> > there is no guarantee that all metadata requests have reached the
> > host fs from virtiofs unless the client or server takes care of waiting
> > for all pending metadata requests before issuing FUSE_SYNCFS.
>
> We don't have any journal in virtiofs. In fact we don't seem to
> cache any metadata, except probably in the "-o writeback" case,
> where we can trust local timestamps.
>
> If "-o writeback" is not enabled, I am not sure what metadata
> we would be caching that we need to worry about. Do you have
> something specific in mind? (At least from the virtiofs point of
> view, I can't think of any metadata we are caching that we need
> to worry about.)

No, I don't see a problem.
I guess I was confused by the semantics.
Thanks for clarifying.

Amir.

Thread overview: 83+ messages

2021-05-20 15:46 [PATCH v4 0/5] virtiofs: propagate sync() to file server Greg Kurz
2021-05-20 15:46 ` [PATCH v4 1/5] fuse: Fix leak in fuse_dentry_automount() error path Greg Kurz
2021-05-20 19:45   ` Al Viro
2021-05-21  7:54     ` Miklos Szeredi
2021-05-21  8:15       ` Greg Kurz
2021-05-21  8:23         ` Miklos Szeredi
2021-05-21  8:08     ` Greg Kurz
2021-05-20 15:46 ` [PATCH v4 2/5] fuse: Call vfs_get_tree() for submounts Greg Kurz
2021-05-21  8:19   ` Miklos Szeredi
2021-05-21  8:28     ` Greg Kurz
2021-05-22 17:50   ` kernel test robot
2021-05-22 20:12   ` kernel test robot
2021-05-20 15:46 ` [PATCH v4 3/5] fuse: Make fuse_fill_super_submount() static Greg Kurz
2021-05-20 15:46 ` [PATCH v4 4/5] virtiofs: Skip submounts in sget_fc() Greg Kurz
2021-05-21  8:26   ` Miklos Szeredi
2021-05-21  8:39     ` Greg Kurz
2021-05-21  8:50       ` Miklos Szeredi
2021-05-21 10:06         ` Greg Kurz
2021-05-21 12:37           ` Miklos Szeredi
2021-05-21 13:36             ` Greg Kurz
2021-05-20 15:46 ` [PATCH v4 5/5] virtiofs: propagate sync() to file server Greg Kurz
2021-05-21 10:08   ` Greg Kurz
2021-05-21 12:51     ` Miklos Szeredi
2021-08-15 14:14   ` Amir Goldstein
2021-08-16 15:29     ` Vivek Goyal
2021-08-16 18:57       ` Amir Goldstein
2021-08-16 19:11         ` Vivek Goyal
2021-08-16 19:46           ` Amir Goldstein [this message]
2021-08-28 15:21       ` Miklos Szeredi
2021-08-30 17:01         ` Vivek Goyal
2021-08-30 17:36           ` Miklos Szeredi
