* [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out
@ 2020-03-20 18:34 Liu Bo
  2020-03-20 18:34 ` [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode Liu Bo
  2020-03-20 18:59 ` [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out Dr. David Alan Gilbert
  0 siblings, 2 replies; 6+ messages in thread
From: Liu Bo @ 2020-03-20 18:34 UTC (permalink / raw)
  To: virtio-fs

It'd be helpful to know the exact values of arg's offset, size
and flags.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index a34a611..ca2056f 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1224,8 +1224,8 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
     }
 
     if (fuse_buf_size(pbufv) != arg->size) {
-        fuse_log(FUSE_LOG_ERR,
-                 "fuse: do_write_buf: buffer size doesn't match arg->size\n");
+        fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size %lu doesn't match arg->size %u offset %lu flags %u\n",
+                 fuse_buf_size(pbufv), arg->size, arg->offset, arg->write_flags);
         fuse_reply_err(req, EIO);
         return;
     }
-- 
1.8.3.1




* [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode
  2020-03-20 18:34 [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out Liu Bo
@ 2020-03-20 18:34 ` Liu Bo
  2020-03-20 20:16   ` Dr. David Alan Gilbert
  2020-03-20 18:59 ` [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out Dr. David Alan Gilbert
  1 sibling, 1 reply; 6+ messages in thread
From: Liu Bo @ 2020-03-20 18:34 UTC (permalink / raw)
  To: virtio-fs

When a file's size is not aligned to PAGE_SIZE, an mmap write to it may
fail with -EIO (visible in virtiofsd's log) because the buffer size differs
from the size recorded in struct fuse_write_in.  The difference arises
because mmap uses writeback IO: at EOF the guest kernel crops
fuse_write_in's size at the inode size, while the buffer length remains
PAGE_SIZE aligned.

Handle this special mmap case by truncating the last buf's size.

Fixes: Commit 469f9d2f ("virtiofsd: Plumb fuse_bufvec through do_write_buf")
Reported-by: Yiqun Leng <yqleng@linux.alibaba.com>
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
---
 tools/virtiofsd/fuse_lowlevel.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
index ca2056f..4f8bfb6 100644
--- a/tools/virtiofsd/fuse_lowlevel.c
+++ b/tools/virtiofsd/fuse_lowlevel.c
@@ -1221,6 +1221,23 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
          * and the data in the rest, we need to skip that first element
          */
         ibufv->buf[0].size = 0;
+
+        /*
+         * In case of mmap, fuse_buf_size(pbufv) may need to truncate if
+         * arg->size has been cropped by inode size inside guest.  The
+         * diff can only be (0, PAGE_SIZE) because inode size must be
+         * overlapped with the last buf.
+         */
+        if (arg->write_flags & FUSE_WRITE_CACHE) {
+                size_t total = fuse_buf_size(pbufv);
+                int last = ibufv->count - 1;
+
+                if (total > arg->size) {
+                        size_t diff = total - arg->size;
+                        if (diff < ibufv->buf[last].size)
+                                ibufv->buf[last].size -= diff;
+                }
+        }
     }
 
     if (fuse_buf_size(pbufv) != arg->size) {
-- 
1.8.3.1
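
The size mismatch described in the commit message above can be worked
through with concrete numbers.  The following is a minimal sketch, not
code from virtiofsd or the guest kernel: SKETCH_PAGE_SIZE,
cropped_write_size() and excess_bytes() are hypothetical names, and a
4096-byte page is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for illustration; a real guest may use another value. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Size a guest would record in fuse_write_in.size when writing back one
 * full page that starts at `offset` in a file of `inode_size` bytes:
 * the write is cropped so that it ends at EOF.
 */
static uint32_t cropped_write_size(uint64_t inode_size, uint64_t offset)
{
    uint64_t end = offset + SKETCH_PAGE_SIZE;

    if (offset >= inode_size) {
        return 0;               /* page lies entirely past EOF */
    }
    if (end > inode_size) {
        end = inode_size;       /* crop at EOF */
    }
    return (uint32_t)(end - offset);
}

/* Bytes by which the page-aligned data buffer exceeds arg->size. */
static size_t excess_bytes(size_t buf_len, uint32_t arg_size)
{
    return buf_len > arg_size ? buf_len - arg_size : 0;
}
```

For a 5000-byte file, the last page starts at offset 4096, so
cropped_write_size(5000, 4096) is 904 while the data buffer is still
4096 bytes long; excess_bytes(4096, 904) is 3192, which is the amount
the last buffer would need to be shortened by.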




* Re: [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out
  2020-03-20 18:34 [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out Liu Bo
  2020-03-20 18:34 ` [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode Liu Bo
@ 2020-03-20 18:59 ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 6+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-20 18:59 UTC (permalink / raw)
  To: Liu Bo; +Cc: virtio-fs

* Liu Bo (bo.liu@linux.alibaba.com) wrote:
> It'd be helpful to know the exact values of arg's offset, size
> and flags.
> 
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>

Thanks,

> ---
>  tools/virtiofsd/fuse_lowlevel.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index a34a611..ca2056f 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -1224,8 +1224,8 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
>      }
>  
>      if (fuse_buf_size(pbufv) != arg->size) {
> -        fuse_log(FUSE_LOG_ERR,
> -                 "fuse: do_write_buf: buffer size doesn't match arg->size\n");
> +        fuse_log(FUSE_LOG_ERR, "fuse: do_write_buf: buffer size %lu doesn't match arg->size %u offset %lu flags %u\n",
> +                 fuse_buf_size(pbufv), arg->size, arg->offset, arg->write_flags);

Please use %zu for size_t's (i.e. fuse_buf_size) and %llu with a cast to
(unsigned long long) for uint64_t's; also it's gone over the maximum
line limit of 80 characters for qemu.

Dave

>          fuse_reply_err(req, EIO);
>          return;
>      }
> -- 
> 1.8.3.1
> 
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
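
The format-specifier advice in the review above can be sketched as a
small standalone helper.  This is an illustration only, not the revised
patch: format_mismatch() is a hypothetical function that uses
snprintf() in place of fuse_log(), and the parameter names merely mirror
the fields printed by the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format the mismatch message portably:
 *   - %zu for size_t values,
 *   - %llu with a cast to unsigned long long for uint64_t values,
 *   - %u with a cast to unsigned int for uint32_t values.
 * The string literal is split so that no source line exceeds 80 columns.
 */
static int format_mismatch(char *out, size_t outlen, size_t buf_size,
                           uint32_t arg_size, uint64_t offset,
                           uint32_t write_flags)
{
    return snprintf(out, outlen,
                    "do_write_buf: buffer size %zu doesn't match "
                    "arg->size %u offset %llu flags %u",
                    buf_size, (unsigned int)arg_size,
                    (unsigned long long)offset,
                    (unsigned int)write_flags);
}
```

The casts matter because %lu only matches size_t and uint64_t on
platforms where both happen to be unsigned long; %zu and the explicit
unsigned long long cast are correct regardless of the host's type
widths.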



* Re: [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode
  2020-03-20 18:34 ` [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode Liu Bo
@ 2020-03-20 20:16   ` Dr. David Alan Gilbert
  2020-03-20 22:33     ` Liu Bo
  0 siblings, 1 reply; 6+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-20 20:16 UTC (permalink / raw)
  To: Liu Bo; +Cc: virtio-fs

* Liu Bo (bo.liu@linux.alibaba.com) wrote:
> When a file's size is not aligned to PAGE_SIZE, an mmap write to it may
> fail with -EIO (visible in virtiofsd's log) because the buffer size differs
> from the size recorded in struct fuse_write_in.  The difference arises
> because mmap uses writeback IO: at EOF the guest kernel crops
> fuse_write_in's size at the inode size, while the buffer length remains
> PAGE_SIZE aligned.
> 
> Handle this special mmap case by truncating the last buf's size.

Thanks,

> Fixes: Commit 469f9d2f ("virtiofsd: Plumb fuse_bufvec through do_write_buf")
> Reported-by: Yiqun Leng <yqleng@linux.alibaba.com>
> Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> ---
>  tools/virtiofsd/fuse_lowlevel.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> index ca2056f..4f8bfb6 100644
> --- a/tools/virtiofsd/fuse_lowlevel.c
> +++ b/tools/virtiofsd/fuse_lowlevel.c
> @@ -1221,6 +1221,23 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
>           * and the data in the rest, we need to skip that first element
>           */
>          ibufv->buf[0].size = 0;
> +
> +        /*
> +         * In case of mmap, fuse_buf_size(pbufv) may need to truncate if
> +         * arg->size has been cropped by inode size inside guest.  The
> +         * diff can only be (0, PAGE_SIZE) because inode size must be
> +         * overlapped with the last buf.
> +         */
> +        if (arg->write_flags & FUSE_WRITE_CACHE) {

Does this need to only do it in the WRITE_CACHE case - or should we just
always truncate the write to arg->size?
Or is this just simpler?

> +                size_t total = fuse_buf_size(pbufv);
> +                int last = ibufv->count - 1;
> +
> +                if (total > arg->size) {
> +                        size_t diff = total - arg->size;
> +                        if (diff < ibufv->buf[last].size)
> +                                ibufv->buf[last].size -= diff;

I think that needs to modify pbufv->buf[last].size not ibufv
because the two are only the same in some cases (although it's possible
in this case the guest we try at the moment always falls in this side).

We should also do something in the else case - probably fail?

> +                }
> +        }
>      }
>  
>      if (fuse_buf_size(pbufv) != arg->size) {

If we know that pbufv is now always shrunk to size,
then we only need to check for the case where pbufv is too small.

Dave

> -- 
> 1.8.3.1
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode
  2020-03-20 20:16   ` Dr. David Alan Gilbert
@ 2020-03-20 22:33     ` Liu Bo
  2020-03-24 20:09       ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 6+ messages in thread
From: Liu Bo @ 2020-03-20 22:33 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-fs

On Fri, Mar 20, 2020 at 08:16:15PM +0000, Dr. David Alan Gilbert wrote:
> * Liu Bo (bo.liu@linux.alibaba.com) wrote:
> > When a file's size is not aligned to PAGE_SIZE, an mmap write to it may
> > fail with -EIO (visible in virtiofsd's log) because the buffer size differs
> > from the size recorded in struct fuse_write_in.  The difference arises
> > because mmap uses writeback IO: at EOF the guest kernel crops
> > fuse_write_in's size at the inode size, while the buffer length remains
> > PAGE_SIZE aligned.
> > 
> > Handle this special mmap case by truncating the last buf's size.
> 
> Thanks,
> 
> > Fixes: Commit 469f9d2f ("virtiofsd: Plumb fuse_bufvec through do_write_buf")
> > Reported-by: Yiqun Leng <yqleng@linux.alibaba.com>
> > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > ---
> >  tools/virtiofsd/fuse_lowlevel.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> > 
> > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > index ca2056f..4f8bfb6 100644
> > --- a/tools/virtiofsd/fuse_lowlevel.c
> > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > @@ -1221,6 +1221,23 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
> >           * and the data in the rest, we need to skip that first element
> >           */
> >          ibufv->buf[0].size = 0;
> > +
> > +        /*
> > +         * In case of mmap, fuse_buf_size(pbufv) may need to truncate if
> > +         * arg->size has been cropped by inode size inside guest.  The
> > +         * diff can only be (0, PAGE_SIZE) because inode size must be
> > +         * overlapped with the last buf.
> > +         */
> > +        if (arg->write_flags & FUSE_WRITE_CACHE) {
> 
> Does this need to only do it in the WRITE_CACHE case - or should we just
> always truncate the write to arg->size?
> Or is this just simpler?

For non-mmap IO, AFAICS, it's all synchronous IO (not using
writepages) where the data part's length should be equal to arg->size
here.  So I think there's no harm in doing it for both.

> 
> > +                size_t total = fuse_buf_size(pbufv);
> > +                int last = ibufv->count - 1;
> > +
> > +                if (total > arg->size) {
> > +                        size_t diff = total - arg->size;
> > +                        if (diff < ibufv->buf[last].size)
> > +                                ibufv->buf[last].size -= diff;
> 
> I think that needs to modify pbufv->buf[last].size not ibufv
> because the two are only the same in some cases (although it's possible
> in this case the guest we try at the moment always falls in this side).
>

OK.

> We should also do something in the else case - probably fail?
>

If it fails, it then gets to the following check and reports -EIO.

> > +                }
> > +        }
> >      }
> >  
> >      if (fuse_buf_size(pbufv) != arg->size) {
> 
> If we know that pbufv is now always shrunk to size,
> then we only need to check for the case where pbufv is too small.
>

From my understanding about both mmap IO and nonmmap IO, I think it's
arg->size that is always <= pbufv size.

thanks,
-liubo




* Re: [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode
  2020-03-20 22:33     ` Liu Bo
@ 2020-03-24 20:09       ` Dr. David Alan Gilbert
  0 siblings, 0 replies; 6+ messages in thread
From: Dr. David Alan Gilbert @ 2020-03-24 20:09 UTC (permalink / raw)
  To: Liu Bo; +Cc: virtio-fs

* Liu Bo (bo.liu@linux.alibaba.com) wrote:
> On Fri, Mar 20, 2020 at 08:16:15PM +0000, Dr. David Alan Gilbert wrote:
> > * Liu Bo (bo.liu@linux.alibaba.com) wrote:
> > > When a file's size is not aligned to PAGE_SIZE, an mmap write to it may
> > > fail with -EIO (visible in virtiofsd's log) because the buffer size differs
> > > from the size recorded in struct fuse_write_in.  The difference arises
> > > because mmap uses writeback IO: at EOF the guest kernel crops
> > > fuse_write_in's size at the inode size, while the buffer length remains
> > > PAGE_SIZE aligned.
> > > 
> > > Handle this special mmap case by truncating the last buf's size.
> > 
> > Thanks,
> > 
> > > Fixes: Commit 469f9d2f ("virtiofsd: Plumb fuse_bufvec through do_write_buf")
> > > Reported-by: Yiqun Leng <yqleng@linux.alibaba.com>
> > > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > > ---
> > >  tools/virtiofsd/fuse_lowlevel.c | 17 +++++++++++++++++
> > >  1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/tools/virtiofsd/fuse_lowlevel.c b/tools/virtiofsd/fuse_lowlevel.c
> > > index ca2056f..4f8bfb6 100644
> > > --- a/tools/virtiofsd/fuse_lowlevel.c
> > > +++ b/tools/virtiofsd/fuse_lowlevel.c
> > > @@ -1221,6 +1221,23 @@ static void do_write_buf(fuse_req_t req, fuse_ino_t nodeid,
> > >           * and the data in the rest, we need to skip that first element
> > >           */
> > >          ibufv->buf[0].size = 0;
> > > +
> > > +        /*
> > > +         * In case of mmap, fuse_buf_size(pbufv) may need to truncate if
> > > +         * arg->size has been cropped by inode size inside guest.  The
> > > +         * diff can only be (0, PAGE_SIZE) because inode size must be
> > > +         * overlapped with the last buf.
> > > +         */
> > > +        if (arg->write_flags & FUSE_WRITE_CACHE) {
> > 
> > Does this need to only do it in the WRITE_CACHE case - or should we just
> > always truncate the write to arg->size?
> > Or is this just simpler?
> 
> For non-mmap IO, AFAICS, it's all synchronous IO (not using
> writepages) where the data part's length should be equal to arg->size
> here.  So I think there's no harm in doing it for both.

OK, good; that would simplify it.

> > 
> > > +                size_t total = fuse_buf_size(pbufv);
> > > +                int last = ibufv->count - 1;
> > > +
> > > +                if (total > arg->size) {
> > > +                        size_t diff = total - arg->size;
> > > +                        if (diff < ibufv->buf[last].size)
> > > +                                ibufv->buf[last].size -= diff;
> > 
> > I think that needs to modify pbufv->buf[last].size not ibufv
> > because the two are only the same in some cases (although it's possible
> > in this case the guest we try at the moment always falls in this side).
> >
> 
> OK.
> 
> > We should also do something in the else case - probably fail?
> >
> 
> If it fails, it then gets to the following check and reports -EIO.

Yes, but that error doesn't tell us why; so we'll see that error and not
realise that it was because we failed to do this truncation.

> > > +                }
> > > +        }
> > >      }
> > >  
> > >      if (fuse_buf_size(pbufv) != arg->size) {
> > 
> > If we know that pbufv is now always shrunk to size,
> > then we only need to check for the case where pbufv is too small.
> >
> 
> From my understanding about both mmap IO and nonmmap IO, I think it's
> arg->size that is always <= pbufv size.

It should do, but this is a sanity check - remember we don't trust the
guest.

Dave

> thanks,
> -liubo
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
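
The direction the review converges on (truncate unconditionally, report
a distinct failure when the truncation cannot be done, and still reject
a buffer that is too small because the guest is not trusted) might look
roughly like the sketch below.  It is not the code that was eventually
merged: struct sketch_buf, struct sketch_bufvec, sketch_buf_size() and
trim_to_arg_size() are hypothetical stand-ins for the real fuse_bufvec
types and helpers, and the return codes are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for struct fuse_buf / fuse_bufvec. */
struct sketch_buf {
    size_t size;
};

struct sketch_bufvec {
    size_t count;
    struct sketch_buf buf[4];
};

/* Sum of all element sizes, analogous in spirit to fuse_buf_size(). */
static size_t sketch_buf_size(const struct sketch_bufvec *bv)
{
    size_t total = 0;

    for (size_t i = 0; i < bv->count; i++) {
        total += bv->buf[i].size;
    }
    return total;
}

/*
 * Shrink the last element so the vector holds exactly arg_size bytes.
 * Returns 0 on success, -1 if the data is shorter than arg_size, and
 * -2 if the excess does not fit inside the last element, so that the
 * caller can log which of the two checks failed.
 */
static int trim_to_arg_size(struct sketch_bufvec *bv, uint32_t arg_size)
{
    size_t total = sketch_buf_size(bv);

    if (total < arg_size) {
        return -1;              /* untrusted guest sent too little data */
    }
    if (total > arg_size) {
        /* total > 0 here, so the vector has at least one element. */
        size_t diff = total - arg_size;
        size_t last = bv->count - 1;

        if (diff >= bv->buf[last].size) {
            return -2;          /* excess spans more than the last buf */
        }
        bv->buf[last].size -= diff;
    }
    return 0;
}
```

Keeping the two failure paths distinct addresses the point raised above
that a single generic -EIO message would not reveal that the truncation
step was what failed.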



Thread overview: 6+ messages
2020-03-20 18:34 [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out Liu Bo
2020-03-20 18:34 ` [Virtio-fs] [PATCH v2 2/2] virtiofsd: fix mmap write under nondax mode Liu Bo
2020-03-20 20:16   ` Dr. David Alan Gilbert
2020-03-20 22:33     ` Liu Bo
2020-03-24 20:09       ` Dr. David Alan Gilbert
2020-03-20 18:59 ` [Virtio-fs] [PATCH v2 1/2] virtiofsd: print more verbose information when bailing out Dr. David Alan Gilbert
