All of lore.kernel.org
 help / color / mirror / Atom feed
* [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev
@ 2019-08-09  0:50 piaojun
  2019-08-09  0:51 ` [Virtio-fs] [PATCH v4 1/2] virtiofsd: add definition of fuse_buf_writev() piaojun
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: piaojun @ 2019-08-09  0:50 UTC (permalink / raw)
  To: virtio-fs

>From my test, write bandwidth will be improved greatly by replacing
pwrite with pwritev, and the test result as below:

---
pwrite:
# fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
...
fio-2.13
Starting 16 processes
Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/886.0MB/0KB /s] [0/886/0 iops] [eta 00m:00s]
file: (groupid=0, jobs=16): err= 0: pid=5799: Tue Aug 6 18:48:26 2019
write: io=26881MB, bw=916988KB/s, iops=895, runt= 30018msec

pwritev:
# fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
...
fio-2.13
Starting 16 processes
Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/1793MB/0KB /s] [0/1793/0 iops] [eta 00m:00s]
file: (groupid=0, jobs=16): err= 0: pid=6328: Tue Aug 6 18:22:17 2019
write: io=52775MB, bw=1758.7MB/s, iops=1758, runt= 30009msec
---

This patch introduces writev and pwritev for lo_write_buf(). I tried my
best not doing harm to the origin code construct, and there will be
some *useless* branches in fuse_buf_copy_one() which are hard to judge
if they will be useful in the future. So I just leave them alone
safely. If the cleanup work is necessary, please let me know.

v2
  - Split into two patches
  - Add the lost flags support, such as FUSE_BUF_PHYS_ADDR

v3
  - use git send-email to make the patch set in one thread
  - move fuse_buf_writev() into fuse_buf_copy()
  - use writev for the src buffers when they're alread already mapped by the daemon process
  - use calloc to replace malloc
  - set res 0 if writev() returns 0

v4
  - iterate from in_buf->buf[0] rather than buf[1]
  - optimize the code to make it more elegant

Jun Piao (2):
  add definition of fuse_buf_writev().
  use fuse_buf_writev to replace fuse_buf_write for better performance

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 buffer.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
 fuse_common.h |   14 ++++++++++++++
 seccomp.c     |    2 ++
 3 files changed, 64 insertions(+)
--


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Virtio-fs] [PATCH v4 1/2] virtiofsd: add definition of fuse_buf_writev()
  2019-08-09  0:50 [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev piaojun
@ 2019-08-09  0:51 ` piaojun
  2019-08-14 17:07   ` Dr. David Alan Gilbert
  2019-08-09  0:53 ` [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance piaojun
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: piaojun @ 2019-08-09  0:51 UTC (permalink / raw)
  To: virtio-fs

Define fuse_buf_writev() which use pwritev and writev to improve io
bandwidth.

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/buffer.c  | 31 +++++++++++++++++++++++++++++++
 contrib/virtiofsd/seccomp.c |  2 ++
 2 files changed, 33 insertions(+)

diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
index 655be137..cec762f 100644
--- a/contrib/virtiofsd/buffer.c
+++ b/contrib/virtiofsd/buffer.c
@@ -15,6 +15,7 @@
 #include <unistd.h>
 #include <errno.h>
 #include <assert.h>
+#include <stdlib.h>

 size_t fuse_buf_size(const struct fuse_bufvec *bufv)
 {
@@ -71,6 +72,36 @@ static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
 	return copied;
 }

+static ssize_t fuse_buf_writev(fuse_req_t req,
+			     struct fuse_buf *out_buf,
+			     struct fuse_bufvec *in_buf)
+{
+	ssize_t res, i;
+	size_t iovcnt = in_buf->count;
+	struct iovec * iov;
+	int fd = out_buf->fd;
+
+	iov = calloc(iovcnt, sizeof(struct iovec));
+	if (!iov)
+		return -ENOMEM;
+
+	for (i = 0; i < iovcnt; i++) {
+		iov[i].iov_base = in_buf->buf[i].mem;
+		iov[i].iov_len = in_buf->buf[i].size;
+	}
+
+	if (out_buf->flags & FUSE_BUF_FD_SEEK)
+		res = pwritev(fd, iov, iovcnt, out_buf->pos);
+	else
+		res = writev(fd, iov, iovcnt);
+
+	if (res == -1)
+		res = -errno;
+
+	free(iov);
+	return res;
+}
+
 static ssize_t fuse_buf_read(const struct fuse_buf *dst, size_t dst_off,
 			     const struct fuse_buf *src, size_t src_off,
 			     size_t len)
diff --git a/contrib/virtiofsd/seccomp.c b/contrib/virtiofsd/seccomp.c
index 5f1c873..4bba30b 100644
--- a/contrib/virtiofsd/seccomp.c
+++ b/contrib/virtiofsd/seccomp.c
@@ -60,6 +60,7 @@ static const int syscall_whitelist[] = {
 	SCMP_SYS(ppoll),
 	SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
 	SCMP_SYS(preadv),
+	SCMP_SYS(pwritev),
 	SCMP_SYS(pwrite64),
 	SCMP_SYS(read),
 	SCMP_SYS(readlinkat),
@@ -78,6 +79,7 @@ static const int syscall_whitelist[] = {
 	SCMP_SYS(unlinkat),
 	SCMP_SYS(utimensat),
 	SCMP_SYS(write),
+	SCMP_SYS(writev),
 };

 /* Syscalls used when --syslog is enabled */
-- 


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-08-09  0:50 [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev piaojun
  2019-08-09  0:51 ` [Virtio-fs] [PATCH v4 1/2] virtiofsd: add definition of fuse_buf_writev() piaojun
@ 2019-08-09  0:53 ` piaojun
  2019-08-14 12:52   ` Stefan Hajnoczi
  2019-08-11  2:06 ` [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev Eric Ren
  2019-08-14  6:53 ` piaojun
  3 siblings, 1 reply; 17+ messages in thread
From: piaojun @ 2019-08-09  0:53 UTC (permalink / raw)
  To: virtio-fs

fuse_buf_writev() only handles the normal write in which src is buffer
and dest is fd. Specially if src buffer represents guest physical
address that can't be mapped by the daemon process, IO must be bounced
back to the VMM to do it by fuse_buf_copy().

Signed-off-by: Jun Piao <piaojun@huawei.com>
Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 contrib/virtiofsd/buffer.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
index cec762f..d127c45 100644
--- a/contrib/virtiofsd/buffer.c
+++ b/contrib/virtiofsd/buffer.c
@@ -318,11 +318,22 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
 ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
 		      enum fuse_buf_copy_flags flags)
 {
+	size_t i;
 	size_t copied = 0;

 	if (dstv == srcv)
 		return fuse_buf_size(dstv);

+	/* use writev to improve bandwidth when all the
+	 * src buffers already mapped by the daemon
+	 * process */
+	for (i = 0; i < srcv->count; i++) {
+		if (srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR)
+			break;
+	}
+	if (i == srcv->count)
+		return fuse_buf_writev(req, &dstv->buf[0], srcv);
+
 	for (;;) {
 		const struct fuse_buf *src = fuse_bufvec_current(srcv);
 		const struct fuse_buf *dst = fuse_bufvec_current(dstv);
-- 


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev
  2019-08-09  0:50 [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev piaojun
  2019-08-09  0:51 ` [Virtio-fs] [PATCH v4 1/2] virtiofsd: add definition of fuse_buf_writev() piaojun
  2019-08-09  0:53 ` [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance piaojun
@ 2019-08-11  2:06 ` Eric Ren
  2019-08-11  2:46   ` piaojun
  2019-08-14  6:53 ` piaojun
  3 siblings, 1 reply; 17+ messages in thread
From: Eric Ren @ 2019-08-11  2:06 UTC (permalink / raw)
  To: piaojun; +Cc: virtio-fs

Hi jun,

On Fri, Aug 09, 2019 at 08:50:12AM +0800, piaojun wrote:
> >From my test, write bandwidth will be improved greatly by replacing
> pwrite with pwritev, and the test result as below:

Could you share more information about this testing?

- args for qemu: cache size?
- args for virtiofsd: which cache mode?
- DAX is used, right?

- which kind of disk are you using, what's then IOPS/BW limit?

I tried this patch with HDD disk - IOPS:5000, BW: 140MB/s.

- VM: 4 vcpus, 8G mem
- cache=always, cache-size=8G, DAX
- fio job

```
[global]
fsync=0
name=virtiofs-test
filename=fio-test
directory=$mntdir   # share dir for test sitting on the disk
rw=randwrite
bs=4K
direct=1
numjobs=1
time_based=0

[file1]
size=2G
io_size=40M
ioengine=libaio
iodepth=128
```
- without this patch

```
file1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
fio-3.1
Starting 1 process

file1: (groupid=0, jobs=1): err= 0: pid=11: Sat Aug 10 11:33:29 2019
  write: IOPS=985, BW=3942KiB/s (4037kB/s)(40.0MiB/10390msec)
```

- with this patch applied

```
file1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
fio-3.1
Starting 1 process

file1: (groupid=0, jobs=1): err= 0: pid=10: Sat Aug 10 15:31:57 2019
  write: IOPS=1056, BW=4224KiB/s (4326kB/s)(40.0MiB/9696msec)

```

the number is even worse than 9pfs.

Thanks,
Eric

> 
> ---
> pwrite:
> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
> ...
> fio-2.13
> Starting 16 processes
> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/886.0MB/0KB /s] [0/886/0 iops] [eta 00m:00s]
> file: (groupid=0, jobs=16): err= 0: pid=5799: Tue Aug 6 18:48:26 2019
> write: io=26881MB, bw=916988KB/s, iops=895, runt= 30018msec
> 
> pwritev:
> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
> ...
> fio-2.13
> Starting 16 processes
> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/1793MB/0KB /s] [0/1793/0 iops] [eta 00m:00s]
> file: (groupid=0, jobs=16): err= 0: pid=6328: Tue Aug 6 18:22:17 2019
> write: io=52775MB, bw=1758.7MB/s, iops=1758, runt= 30009msec
> ---
> 
> This patch introduces writev and pwritev for lo_write_buf(). I tried my
> best not doing harm to the origin code construct, and there will be
> some *useless* branches in fuse_buf_copy_one() which are hard to judge
> if they will be useful in the future. So I just leave them alone
> safely. If the cleanup work is necessary, please let me know.
> 
> v2
>   - Split into two patches
>   - Add the lost flags support, such as FUSE_BUF_PHYS_ADDR
> 
> v3
>   - use git send-email to make the patch set in one thread
>   - move fuse_buf_writev() into fuse_buf_copy()
>   - use writev for the src buffers when they're alread already mapped by the daemon process
>   - use calloc to replace malloc
>   - set res 0 if writev() returns 0
> 
> v4
>   - iterate from in_buf->buf[0] rather than buf[1]
>   - optimize the code to make it more elegant
> 
> Jun Piao (2):
>   add definition of fuse_buf_writev().
>   use fuse_buf_writev to replace fuse_buf_write for better performance
> 
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  buffer.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fuse_common.h |   14 ++++++++++++++
>  seccomp.c     |    2 ++
>  3 files changed, 64 insertions(+)
> --
> 
> _______________________________________________
> Virtio-fs mailing list
> Virtio-fs@redhat.com
> https://www.redhat.com/mailman/listinfo/virtio-fs


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev
  2019-08-11  2:06 ` [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev Eric Ren
@ 2019-08-11  2:46   ` piaojun
  2019-08-11 11:15     ` Eric Ren
  0 siblings, 1 reply; 17+ messages in thread
From: piaojun @ 2019-08-11  2:46 UTC (permalink / raw)
  To: Eric Ren; +Cc: virtio-fs



On 2019/8/11 10:06, Eric Ren wrote:
> Hi jun,
> 
> On Fri, Aug 09, 2019 at 08:50:12AM +0800, piaojun wrote:
>> >From my test, write bandwidth will be improved greatly by replacing
>> pwrite with pwritev, and the test result as below:
> 
> Could you share more information about this testing?
> 
> - args for qemu: cache size?
> - args for virtiofsd: which cache mode?
> - DAX is used, right?

DAX is disabled, as this optimization only works on Host side.

> 
> - which kind of disk are you using, what's then IOPS/BW limit?
> 
> I tried this patch with HDD disk - IOPS:5000, BW: 140MB/s.

I used EXT4 and ramdisk as backend device.

> 
> - VM: 4 vcpus, 8G mem
> - cache=always, cache-size=8G, DAX
> - fio job
> 
> ```
> [global]
> fsync=0
> name=virtiofs-test
> filename=fio-test
> directory=$mntdir   # share dir for test sitting on the disk
> rw=randwrite
> bs=4K

*bs* should be 1M for bandwidth test which makes writev works well.

> direct=1
> numjobs=1
> time_based=0
> 
> [file1]
> size=2G
> io_size=40M
> ioengine=libaio
> iodepth=128
> ```
> - without this patch
> 
> ```
> file1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
> fio-3.1
> Starting 1 process
> 
> file1: (groupid=0, jobs=1): err= 0: pid=11: Sat Aug 10 11:33:29 2019
>   write: IOPS=985, BW=3942KiB/s (4037kB/s)(40.0MiB/10390msec)
> ```
> 
> - with this patch applied
> 
> ```
> file1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
> fio-3.1
> Starting 1 process
> 
> file1: (groupid=0, jobs=1): err= 0: pid=10: Sat Aug 10 15:31:57 2019
>   write: IOPS=1056, BW=4224KiB/s (4326kB/s)(40.0MiB/9696msec)
> 
> ```
> 
> the number is even worse than 9pfs.

Please modify your testcase as mentioned above, and share your result
again. I'm glad to find out where the problem is.

Jun

> 
> Thanks,
> Eric
> 
>>
>> ---
>> pwrite:
>> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
>> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
>> ...
>> fio-2.13
>> Starting 16 processes
>> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/886.0MB/0KB /s] [0/886/0 iops] [eta 00m:00s]
>> file: (groupid=0, jobs=16): err= 0: pid=5799: Tue Aug 6 18:48:26 2019
>> write: io=26881MB, bw=916988KB/s, iops=895, runt= 30018msec
>>
>> pwritev:
>> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
>> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
>> ...
>> fio-2.13
>> Starting 16 processes
>> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/1793MB/0KB /s] [0/1793/0 iops] [eta 00m:00s]
>> file: (groupid=0, jobs=16): err= 0: pid=6328: Tue Aug 6 18:22:17 2019
>> write: io=52775MB, bw=1758.7MB/s, iops=1758, runt= 30009msec
>> ---
>>
>> This patch introduces writev and pwritev for lo_write_buf(). I tried my
>> best not doing harm to the origin code construct, and there will be
>> some *useless* branches in fuse_buf_copy_one() which are hard to judge
>> if they will be useful in the future. So I just leave them alone
>> safely. If the cleanup work is necessary, please let me know.
>>
>> v2
>>   - Split into two patches
>>   - Add the lost flags support, such as FUSE_BUF_PHYS_ADDR
>>
>> v3
>>   - use git send-email to make the patch set in one thread
>>   - move fuse_buf_writev() into fuse_buf_copy()
>>   - use writev for the src buffers when they're alread already mapped by the daemon process
>>   - use calloc to replace malloc
>>   - set res 0 if writev() returns 0
>>
>> v4
>>   - iterate from in_buf->buf[0] rather than buf[1]
>>   - optimize the code to make it more elegant
>>
>> Jun Piao (2):
>>   add definition of fuse_buf_writev().
>>   use fuse_buf_writev to replace fuse_buf_write for better performance
>>
>> Signed-off-by: Jun Piao <piaojun@huawei.com>
>> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>>  buffer.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  fuse_common.h |   14 ++++++++++++++
>>  seccomp.c     |    2 ++
>>  3 files changed, 64 insertions(+)
>> --
>>
>> _______________________________________________
>> Virtio-fs mailing list
>> Virtio-fs@redhat.com
>> https://www.redhat.com/mailman/listinfo/virtio-fs
> .
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev
  2019-08-11  2:46   ` piaojun
@ 2019-08-11 11:15     ` Eric Ren
  2019-08-11 13:55       ` piaojun
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Ren @ 2019-08-11 11:15 UTC (permalink / raw)
  To: piaojun; +Cc: virtio-fs

Hi,

On Sun, Aug 11, 2019 at 10:46:02AM +0800, piaojun wrote:
> 
> 
> On 2019/8/11 10:06, Eric Ren wrote:
> > Hi jun,
> > 
> > On Fri, Aug 09, 2019 at 08:50:12AM +0800, piaojun wrote:
> >> >From my test, write bandwidth will be improved greatly by replacing
> >> pwrite with pwritev, and the test result as below:
> > 
> > Could you share more information about this testing?
> > 
> > - args for qemu: cache size?
> > - args for virtiofsd: which cache mode?
> > - DAX is used, right?
> 
> DAX is disabled, as this optimization only works on Host side.


OK, thanks for the info.

The test runs in KATA container, which use DAX as its default mount
option.

> 
> > 
> > - which kind of disk are you using, what's then IOPS/BW limit?
> > 
> > I tried this patch with HDD disk - IOPS:5000, BW: 140MB/s.
> 
> I used EXT4 and ramdisk as backend device.
> 
> > 
> > - VM: 4 vcpus, 8G mem
> > - cache=always, cache-size=8G, DAX
> > - fio job
> > 
> > ```
> > [global]
> > fsync=0
> > name=virtiofs-test
> > filename=fio-test
> > directory=$mntdir   # share dir for test sitting on the disk
> > rw=randwrite
> > bs=4K
> 
> *bs* should be 1M for bandwidth test which makes writev works well.

Yes, but seq write "rw=write" is also good fr BW test, right?

> 
> > direct=1
> > numjobs=1
> > time_based=0
> > 
> > [file1]
> > size=2G
> > io_size=40M
> > ioengine=libaio
> > iodepth=128
> > ```
> > - without this patch
> > 
> > ```
> > file1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
> > fio-3.1
> > Starting 1 process
> > 
> > file1: (groupid=0, jobs=1): err= 0: pid=11: Sat Aug 10 11:33:29 2019
> >   write: IOPS=985, BW=3942KiB/s (4037kB/s)(40.0MiB/10390msec)
> > ```
> > 
> > - with this patch applied
> > 
> > ```
> > file1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
> > fio-3.1
> > Starting 1 process
> > 
> > file1: (groupid=0, jobs=1): err= 0: pid=10: Sat Aug 10 15:31:57 2019
> >   write: IOPS=1056, BW=4224KiB/s (4326kB/s)(40.0MiB/9696msec)
> > 
> > ```
> > 
> > the number is even worse than 9pfs.
> 
> Please modify your testcase as mentioned above, and share your result
> again. I'm glad to find out where the problem is.

Will come back if I get the number.

Thanks,
Eric

> 
> Jun
> 
> > 
> > Thanks,
> > Eric
> > 
> >>
> >> ---
> >> pwrite:
> >> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
> >> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
> >> ...
> >> fio-2.13
> >> Starting 16 processes
> >> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/886.0MB/0KB /s] [0/886/0 iops] [eta 00m:00s]
> >> file: (groupid=0, jobs=16): err= 0: pid=5799: Tue Aug 6 18:48:26 2019
> >> write: io=26881MB, bw=916988KB/s, iops=895, runt= 30018msec
> >>
> >> pwritev:
> >> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
> >> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
> >> ...
> >> fio-2.13
> >> Starting 16 processes
> >> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/1793MB/0KB /s] [0/1793/0 iops] [eta 00m:00s]
> >> file: (groupid=0, jobs=16): err= 0: pid=6328: Tue Aug 6 18:22:17 2019
> >> write: io=52775MB, bw=1758.7MB/s, iops=1758, runt= 30009msec
> >> ---
> >>
> >> This patch introduces writev and pwritev for lo_write_buf(). I tried my
> >> best not doing harm to the origin code construct, and there will be
> >> some *useless* branches in fuse_buf_copy_one() which are hard to judge
> >> if they will be useful in the future. So I just leave them alone
> >> safely. If the cleanup work is necessary, please let me know.
> >>
> >> v2
> >>   - Split into two patches
> >>   - Add the lost flags support, such as FUSE_BUF_PHYS_ADDR
> >>
> >> v3
> >>   - use git send-email to make the patch set in one thread
> >>   - move fuse_buf_writev() into fuse_buf_copy()
> >>   - use writev for the src buffers when they're alread already mapped by the daemon process
> >>   - use calloc to replace malloc
> >>   - set res 0 if writev() returns 0
> >>
> >> v4
> >>   - iterate from in_buf->buf[0] rather than buf[1]
> >>   - optimize the code to make it more elegant
> >>
> >> Jun Piao (2):
> >>   add definition of fuse_buf_writev().
> >>   use fuse_buf_writev to replace fuse_buf_write for better performance
> >>
> >> Signed-off-by: Jun Piao <piaojun@huawei.com>
> >> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> >> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> >> ---
> >>  buffer.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
> >>  fuse_common.h |   14 ++++++++++++++
> >>  seccomp.c     |    2 ++
> >>  3 files changed, 64 insertions(+)
> >> --
> >>
> >> _______________________________________________
> >> Virtio-fs mailing list
> >> Virtio-fs@redhat.com
> >> https://www.redhat.com/mailman/listinfo/virtio-fs
> > .
> > 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev
  2019-08-11 11:15     ` Eric Ren
@ 2019-08-11 13:55       ` piaojun
  2019-08-14 12:41         ` Stefan Hajnoczi
  0 siblings, 1 reply; 17+ messages in thread
From: piaojun @ 2019-08-11 13:55 UTC (permalink / raw)
  To: Eric Ren; +Cc: virtio-fs



On 2019/8/11 19:15, Eric Ren wrote:
> Hi,
> 
> On Sun, Aug 11, 2019 at 10:46:02AM +0800, piaojun wrote:
>>
>>
>> On 2019/8/11 10:06, Eric Ren wrote:
>>> Hi jun,
>>>
>>> On Fri, Aug 09, 2019 at 08:50:12AM +0800, piaojun wrote:
>>>> >From my test, write bandwidth will be improved greatly by replacing
>>>> pwrite with pwritev, and the test result as below:
>>>
>>> Could you share more information about this testing?
>>>
>>> - args for qemu: cache size?
>>> - args for virtiofsd: which cache mode?
>>> - DAX is used, right?
>>
>> DAX is disabled, as this optimization only works on Host side.
> 
> 
> OK, thanks for the info.
> 
> The test runs in KATA container, which use DAX as its default mount
> option.
> 
>>
>>>
>>> - which kind of disk are you using, what's then IOPS/BW limit?
>>>
>>> I tried this patch with HDD disk - IOPS:5000, BW: 140MB/s.
>>
>> I used EXT4 and ramdisk as backend device.
>>
>>>
>>> - VM: 4 vcpus, 8G mem
>>> - cache=always, cache-size=8G, DAX
>>> - fio job
>>>
>>> ```
>>> [global]
>>> fsync=0
>>> name=virtiofs-test
>>> filename=fio-test
>>> directory=$mntdir   # share dir for test sitting on the disk
>>> rw=randwrite
>>> bs=4K
>>
>> *bs* should be 1M for bandwidth test which makes writev works well.
> 
> Yes, but seq write "rw=write" is also good fr BW test, right?

Right.

Jun

> 
>>
>>> direct=1
>>> numjobs=1
>>> time_based=0
>>>
>>> [file1]
>>> size=2G
>>> io_size=40M
>>> ioengine=libaio
>>> iodepth=128
>>> ```
>>> - without this patch
>>>
>>> ```
>>> file1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
>>> fio-3.1
>>> Starting 1 process
>>>
>>> file1: (groupid=0, jobs=1): err= 0: pid=11: Sat Aug 10 11:33:29 2019
>>>   write: IOPS=985, BW=3942KiB/s (4037kB/s)(40.0MiB/10390msec)
>>> ```
>>>
>>> - with this patch applied
>>>
>>> ```
>>> file1: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=128
>>> fio-3.1
>>> Starting 1 process
>>>
>>> file1: (groupid=0, jobs=1): err= 0: pid=10: Sat Aug 10 15:31:57 2019
>>>   write: IOPS=1056, BW=4224KiB/s (4326kB/s)(40.0MiB/9696msec)
>>>
>>> ```
>>>
>>> the number is even worse than 9pfs.
>>
>> Please modify your testcase as mentioned above, and share your result
>> again. I'm glad to find out where the problem is.
> 
> Will come back if I get the number.
> 
> Thanks,
> Eric
> 
>>
>> Jun
>>
>>>
>>> Thanks,
>>> Eric
>>>
>>>>
>>>> ---
>>>> pwrite:
>>>> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
>>>> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
>>>> ...
>>>> fio-2.13
>>>> Starting 16 processes
>>>> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/886.0MB/0KB /s] [0/886/0 iops] [eta 00m:00s]
>>>> file: (groupid=0, jobs=16): err= 0: pid=5799: Tue Aug 6 18:48:26 2019
>>>> write: io=26881MB, bw=916988KB/s, iops=895, runt= 30018msec
>>>>
>>>> pwritev:
>>>> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
>>>> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
>>>> ...
>>>> fio-2.13
>>>> Starting 16 processes
>>>> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/1793MB/0KB /s] [0/1793/0 iops] [eta 00m:00s]
>>>> file: (groupid=0, jobs=16): err= 0: pid=6328: Tue Aug 6 18:22:17 2019
>>>> write: io=52775MB, bw=1758.7MB/s, iops=1758, runt= 30009msec
>>>> ---
>>>>
>>>> This patch introduces writev and pwritev for lo_write_buf(). I tried my
>>>> best not doing harm to the origin code construct, and there will be
>>>> some *useless* branches in fuse_buf_copy_one() which are hard to judge
>>>> if they will be useful in the future. So I just leave them alone
>>>> safely. If the cleanup work is necessary, please let me know.
>>>>
>>>> v2
>>>>   - Split into two patches
>>>>   - Add the lost flags support, such as FUSE_BUF_PHYS_ADDR
>>>>
>>>> v3
>>>>   - use git send-email to make the patch set in one thread
>>>>   - move fuse_buf_writev() into fuse_buf_copy()
>>>>   - use writev for the src buffers when they're alread already mapped by the daemon process
>>>>   - use calloc to replace malloc
>>>>   - set res 0 if writev() returns 0
>>>>
>>>> v4
>>>>   - iterate from in_buf->buf[0] rather than buf[1]
>>>>   - optimize the code to make it more elegant
>>>>
>>>> Jun Piao (2):
>>>>   add definition of fuse_buf_writev().
>>>>   use fuse_buf_writev to replace fuse_buf_write for better performance
>>>>
>>>> Signed-off-by: Jun Piao <piaojun@huawei.com>
>>>> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>>>> ---
>>>>  buffer.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  fuse_common.h |   14 ++++++++++++++
>>>>  seccomp.c     |    2 ++
>>>>  3 files changed, 64 insertions(+)
>>>> --
>>>>
>>>> _______________________________________________
>>>> Virtio-fs mailing list
>>>> Virtio-fs@redhat.com
>>>> https://www.redhat.com/mailman/listinfo/virtio-fs
>>> .
>>>
> .
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev
  2019-08-09  0:50 [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev piaojun
                   ` (2 preceding siblings ...)
  2019-08-11  2:06 ` [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev Eric Ren
@ 2019-08-14  6:53 ` piaojun
  3 siblings, 0 replies; 17+ messages in thread
From: piaojun @ 2019-08-14  6:53 UTC (permalink / raw)
  To: virtio-fs

Ping, any comments on this patch?

On 2019/8/9 8:50, piaojun wrote:
>>From my test, write bandwidth will be improved greatly by replacing
> pwrite with pwritev, and the test result as below:
> 
> ---
> pwrite:
> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
> ...
> fio-2.13
> Starting 16 processes
> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/886.0MB/0KB /s] [0/886/0 iops] [eta 00m:00s]
> file: (groupid=0, jobs=16): err= 0: pid=5799: Tue Aug 6 18:48:26 2019
> write: io=26881MB, bw=916988KB/s, iops=895, runt= 30018msec
> 
> pwritev:
> # fio -direct=1 -time_based -iodepth=64 -rw=randwrite -ioengine=libaio -bs=1M -size=1G -numjob=16 -runtime=30 -group_reporting -name=file -filename=/mnt/virtiofs/file
> file: (g=0): rw=randwrite, bs=1M-1M/1M-1M/1M-1M, ioengine=libaio, iodepth=64
> ...
> fio-2.13
> Starting 16 processes
> Jobs: 16 (f=16): [w(16)] [100.0% done] [0KB/1793MB/0KB /s] [0/1793/0 iops] [eta 00m:00s]
> file: (groupid=0, jobs=16): err= 0: pid=6328: Tue Aug 6 18:22:17 2019
> write: io=52775MB, bw=1758.7MB/s, iops=1758, runt= 30009msec
> ---
> 
> This patch introduces writev and pwritev for lo_write_buf(). I tried my
> best not doing harm to the origin code construct, and there will be
> some *useless* branches in fuse_buf_copy_one() which are hard to judge
> if they will be useful in the future. So I just leave them alone
> safely. If the cleanup work is necessary, please let me know.
> 
> v2
>   - Split into two patches
>   - Add the lost flags support, such as FUSE_BUF_PHYS_ADDR
> 
> v3
>   - use git send-email to make the patch set in one thread
>   - move fuse_buf_writev() into fuse_buf_copy()
>   - use writev for the src buffers when they're alread already mapped by the daemon process
>   - use calloc to replace malloc
>   - set res 0 if writev() returns 0
> 
> v4
>   - iterate from in_buf->buf[0] rather than buf[1]
>   - optimize the code to make it more elegant
> 
> Jun Piao (2):
>   add definition of fuse_buf_writev().
>   use fuse_buf_writev to replace fuse_buf_write for better performance
> 
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  buffer.c      |   48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  fuse_common.h |   14 ++++++++++++++
>  seccomp.c     |    2 ++
>  3 files changed, 64 insertions(+)
> --
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev
  2019-08-11 13:55       ` piaojun
@ 2019-08-14 12:41         ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2019-08-14 12:41 UTC (permalink / raw)
  To: Eric Ren; +Cc: virtio-fs

[-- Attachment #1: Type: text/plain, Size: 1814 bytes --]

On Sun, Aug 11, 2019 at 09:55:17PM +0800, piaojun wrote:
> 
> 
> On 2019/8/11 19:15, Eric Ren wrote:
> > Hi,
> > 
> > On Sun, Aug 11, 2019 at 10:46:02AM +0800, piaojun wrote:
> >>
> >>
> >> On 2019/8/11 10:06, Eric Ren wrote:
> >>> Hi jun,
> >>>
> >>> On Fri, Aug 09, 2019 at 08:50:12AM +0800, piaojun wrote:
> >>>> >From my test, write bandwidth will be improved greatly by replacing
> >>>> pwrite with pwritev, and the test result as below:
> >>>
> >>> Could you share more information about this testing?
> >>>
> >>> - args for qemu: cache size?
> >>> - args for virtiofsd: which cache mode?
> >>> - DAX is used, right?
> >>
> >> DAX is disabled, as this optimization only works on Host side.
> > 
> > 
> > OK, thanks for the info.
> > 
> > The test runs in KATA container, which use DAX as its default mount
> > option.
> > 
> >>
> >>>
> >>> - which kind of disk are you using, what's then IOPS/BW limit?
> >>>
> >>> I tried this patch with HDD disk - IOPS:5000, BW: 140MB/s.
> >>
> >> I used EXT4 and ramdisk as backend device.
> >>
> >>>
> >>> - VM: 4 vcpus, 8G mem
> >>> - cache=always, cache-size=8G, DAX
> >>> - fio job
> >>>
> >>> ```
> >>> [global]
> >>> fsync=0
> >>> name=virtiofs-test
> >>> filename=fio-test
> >>> directory=$mntdir   # share dir for test sitting on the disk
> >>> rw=randwrite
> >>> bs=4K
> >>
> >> *bs* should be 1M for bandwidth test which makes writev works well.
> > 
> > Yes, but seq write "rw=write" is also good fr BW test, right?
> 
> Right.

The purpose of piaojun's patch is to accelerate cases where the guest
submits multiple pages.  If bs is 4KB then there is only one page and
writev will not speed things up.

You need a larger blocksize if you want to see the effects of this
optimization.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-08-09  0:53 ` [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance piaojun
@ 2019-08-14 12:52   ` Stefan Hajnoczi
  2019-08-14 13:28     ` piaojun
  2019-08-15  1:11     ` piaojun
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2019-08-14 12:52 UTC (permalink / raw)
  To: piaojun; +Cc: virtio-fs

[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]

On Fri, Aug 09, 2019 at 08:53:20AM +0800, piaojun wrote:
> diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
> index cec762f..d127c45 100644
> --- a/contrib/virtiofsd/buffer.c
> +++ b/contrib/virtiofsd/buffer.c
> @@ -318,11 +318,22 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
>  ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
>  		      enum fuse_buf_copy_flags flags)
>  {
> +	size_t i;
>  	size_t copied = 0;
> 
>  	if (dstv == srcv)
>  		return fuse_buf_size(dstv);
> 
> +	/* use writev to improve bandwidth when all the
> +	 * src buffers already mapped by the daemon
> +	 * process */

Some more checks to make this generic:

> +	for (i = 0; i < srcv->count; i++) {
> +		if (srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR)
> +			break;

                if (srcv->buf[i].flags & FUSE_BUF_IS_FD)
		        break;

> +	}
> +	if (i == srcv->count)

Please verify the preconditions for the destination buffer too:

  && dstv->count == 1 && dstv->idx == 0 && dstv->off == 0 &&
  (dstv->buf[0].flags & FUSE_BUF_IS_FD)

> +		return fuse_buf_writev(req, &dstv->buf[0], srcv);

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-08-14 12:52   ` Stefan Hajnoczi
@ 2019-08-14 13:28     ` piaojun
  2019-08-15  1:11     ` piaojun
  1 sibling, 0 replies; 17+ messages in thread
From: piaojun @ 2019-08-14 13:28 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs



On 2019/8/14 20:52, Stefan Hajnoczi wrote:
> On Fri, Aug 09, 2019 at 08:53:20AM +0800, piaojun wrote:
>> diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
>> index cec762f..d127c45 100644
>> --- a/contrib/virtiofsd/buffer.c
>> +++ b/contrib/virtiofsd/buffer.c
>> @@ -318,11 +318,22 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
>>  ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
>>  		      enum fuse_buf_copy_flags flags)
>>  {
>> +	size_t i;
>>  	size_t copied = 0;
>>
>>  	if (dstv == srcv)
>>  		return fuse_buf_size(dstv);
>>
>> +	/* use writev to improve bandwidth when all the
>> +	 * src buffers already mapped by the daemon
>> +	 * process */
> 
> Some more checks to make this generic:
> 
>> +	for (i = 0; i < srcv->count; i++) {
>> +		if (srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR)
>> +			break;
> 
>                 if (srcv->buf[i].flags & FUSE_BUF_IS_FD)
> 		        break;
> 
>> +	}
>> +	if (i == srcv->count)
> 
> Please verify the preconditions for the destination buffer too:
> 
>   && dstv->count == 1 && dstv->idx == 0 && dstv->off == 0 &&
>   (dstv->buf[0].flags & FUSE_BUF_IS_FD)
> My previous thought was letting lo_write_buf() check all the
preconditions which would make code simpler. But as you said, generic
code comes first, so I will fix them in PATCH v5 later.

Thanks,
Jun


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 1/2] virtiofsd: add definition of fuse_buf_writev()
  2019-08-09  0:51 ` [Virtio-fs] [PATCH v4 1/2] virtiofsd: add definition of fuse_buf_writev() piaojun
@ 2019-08-14 17:07   ` Dr. David Alan Gilbert
  2019-09-12  6:09     ` piaojun
  0 siblings, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2019-08-14 17:07 UTC (permalink / raw)
  To: piaojun; +Cc: virtio-fs

* piaojun (piaojun@huawei.com) wrote:
> Define fuse_buf_writev() which use pwritev and writev to improve io
> bandwidth.
> 
> Signed-off-by: Jun Piao <piaojun@huawei.com>
> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>

Yep, this part looks OK,


Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  contrib/virtiofsd/buffer.c  | 31 +++++++++++++++++++++++++++++++
>  contrib/virtiofsd/seccomp.c |  2 ++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
> index 655be137..cec762f 100644
> --- a/contrib/virtiofsd/buffer.c
> +++ b/contrib/virtiofsd/buffer.c
> @@ -15,6 +15,7 @@
>  #include <unistd.h>
>  #include <errno.h>
>  #include <assert.h>
> +#include <stdlib.h>
> 
>  size_t fuse_buf_size(const struct fuse_bufvec *bufv)
>  {
> @@ -71,6 +72,36 @@ static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
>  	return copied;
>  }
> 
> +static ssize_t fuse_buf_writev(fuse_req_t req,
> +			     struct fuse_buf *out_buf,
> +			     struct fuse_bufvec *in_buf)
> +{
> +	ssize_t res, i;
> +	size_t iovcnt = in_buf->count;
> +	struct iovec * iov;
> +	int fd = out_buf->fd;
> +
> +	iov = calloc(iovcnt, sizeof(struct iovec));
> +	if (!iov)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < iovcnt; i++) {
> +		iov[i].iov_base = in_buf->buf[i].mem;
> +		iov[i].iov_len = in_buf->buf[i].size;
> +	}
> +
> +	if (out_buf->flags & FUSE_BUF_FD_SEEK)
> +		res = pwritev(fd, iov, iovcnt, out_buf->pos);
> +	else
> +		res = writev(fd, iov, iovcnt);
> +
> +	if (res == -1)
> +		res = -errno;
> +
> +	free(iov);
> +	return res;
> +}
> +
>  static ssize_t fuse_buf_read(const struct fuse_buf *dst, size_t dst_off,
>  			     const struct fuse_buf *src, size_t src_off,
>  			     size_t len)
> diff --git a/contrib/virtiofsd/seccomp.c b/contrib/virtiofsd/seccomp.c
> index 5f1c873..4bba30b 100644
> --- a/contrib/virtiofsd/seccomp.c
> +++ b/contrib/virtiofsd/seccomp.c
> @@ -60,6 +60,7 @@ static const int syscall_whitelist[] = {
>  	SCMP_SYS(ppoll),
>  	SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
>  	SCMP_SYS(preadv),
> +	SCMP_SYS(pwritev),
>  	SCMP_SYS(pwrite64),
>  	SCMP_SYS(read),
>  	SCMP_SYS(readlinkat),
> @@ -78,6 +79,7 @@ static const int syscall_whitelist[] = {
>  	SCMP_SYS(unlinkat),
>  	SCMP_SYS(utimensat),
>  	SCMP_SYS(write),
> +	SCMP_SYS(writev),
>  };
> 
>  /* Syscalls used when --syslog is enabled */
> -- 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-08-14 12:52   ` Stefan Hajnoczi
  2019-08-14 13:28     ` piaojun
@ 2019-08-15  1:11     ` piaojun
  2019-08-15 14:50       ` Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: piaojun @ 2019-08-15  1:11 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs

Hi Stefan,

On 2019/8/14 20:52, Stefan Hajnoczi wrote:
> On Fri, Aug 09, 2019 at 08:53:20AM +0800, piaojun wrote:
>> diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
>> index cec762f..d127c45 100644
>> --- a/contrib/virtiofsd/buffer.c
>> +++ b/contrib/virtiofsd/buffer.c
>> @@ -318,11 +318,22 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
>>  ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
>>  		      enum fuse_buf_copy_flags flags)
>>  {
>> +	size_t i;
>>  	size_t copied = 0;
>>
>>  	if (dstv == srcv)
>>  		return fuse_buf_size(dstv);
>>
>> +	/* use writev to improve bandwidth when all the
>> +	 * src buffers already mapped by the daemon
>> +	 * process */
> 
> Some more checks to make this generic:
> 
>> +	for (i = 0; i < srcv->count; i++) {
>> +		if (srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR)
>> +			break;
> 
>                 if (srcv->buf[i].flags & FUSE_BUF_IS_FD)
> 		        break;
> 
>> +	}
>> +	if (i == srcv->count)
> 
> Please verify the preconditions for the destination buffer too:
> 
>   && dstv->count == 1 && dstv->idx == 0 && dstv->off == 0 &&
>   (dstv->buf[0].flags & FUSE_BUF_IS_FD)

Thanks for your comments, and I prefer adding checker just for
FUSE_BUF_IS_FD as fuse_buf_writev() does not care much about the other
parts. And I personally think so many preconditions will make
fuse_buf_writev() so specific.

Thanks,
Jun

> 
>> +		return fuse_buf_writev(req, &dstv->buf[0], srcv);


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-08-15  1:11     ` piaojun
@ 2019-08-15 14:50       ` Stefan Hajnoczi
  2019-08-16  0:48         ` piaojun
  0 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2019-08-15 14:50 UTC (permalink / raw)
  To: piaojun; +Cc: virtio-fs

[-- Attachment #1: Type: text/plain, Size: 1886 bytes --]

On Thu, Aug 15, 2019 at 09:11:49AM +0800, piaojun wrote:
> Hi Stefan,
> 
> On 2019/8/14 20:52, Stefan Hajnoczi wrote:
> > On Fri, Aug 09, 2019 at 08:53:20AM +0800, piaojun wrote:
> >> diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
> >> index cec762f..d127c45 100644
> >> --- a/contrib/virtiofsd/buffer.c
> >> +++ b/contrib/virtiofsd/buffer.c
> >> @@ -318,11 +318,22 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
> >>  ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
> >>  		      enum fuse_buf_copy_flags flags)
> >>  {
> >> +	size_t i;
> >>  	size_t copied = 0;
> >>
> >>  	if (dstv == srcv)
> >>  		return fuse_buf_size(dstv);
> >>
> >> +	/* use writev to improve bandwidth when all the
> >> +	 * src buffers already mapped by the daemon
> >> +	 * process */
> > 
> > Some more checks to make this generic:
> > 
> >> +	for (i = 0; i < srcv->count; i++) {
> >> +		if (srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR)
> >> +			break;
> > 
> >                 if (srcv->buf[i].flags & FUSE_BUF_IS_FD)
> > 		        break;
> > 
> >> +	}
> >> +	if (i == srcv->count)
> > 
> > Please verify the preconditions for the destination buffer too:
> > 
> >   && dstv->count == 1 && dstv->idx == 0 && dstv->off == 0 &&
> >   (dstv->buf[0].flags & FUSE_BUF_IS_FD)
> 
> Thanks for your comments, and I prefer adding checker just for
> FUSE_BUF_IS_FD as fuse_buf_writev() does not care much about the other
> parts. And I personally think so many preconditions will make
> fuse_buf_writev() so specific.

I think these preconditions are necessary.  This function won't do the
right thing if dstv has multiple elements or has been previously
modified (idx/off).

Can you explain why these cases will never happen or why the code is
correct under these cases?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-08-15 14:50       ` Stefan Hajnoczi
@ 2019-08-16  0:48         ` piaojun
  2019-08-19 10:08           ` Stefan Hajnoczi
  0 siblings, 1 reply; 17+ messages in thread
From: piaojun @ 2019-08-16  0:48 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: virtio-fs



On 2019/8/15 22:50, Stefan Hajnoczi wrote:
> On Thu, Aug 15, 2019 at 09:11:49AM +0800, piaojun wrote:
>> Hi Stefan,
>>
>> On 2019/8/14 20:52, Stefan Hajnoczi wrote:
>>> On Fri, Aug 09, 2019 at 08:53:20AM +0800, piaojun wrote:
>>>> diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
>>>> index cec762f..d127c45 100644
>>>> --- a/contrib/virtiofsd/buffer.c
>>>> +++ b/contrib/virtiofsd/buffer.c
>>>> @@ -318,11 +318,22 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
>>>>  ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
>>>>  		      enum fuse_buf_copy_flags flags)
>>>>  {
>>>> +	size_t i;
>>>>  	size_t copied = 0;
>>>>
>>>>  	if (dstv == srcv)
>>>>  		return fuse_buf_size(dstv);
>>>>
>>>> +	/* use writev to improve bandwidth when all the
>>>> +	 * src buffers already mapped by the daemon
>>>> +	 * process */
>>>
>>> Some more checks to make this generic:
>>>
>>>> +	for (i = 0; i < srcv->count; i++) {
>>>> +		if (srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR)
>>>> +			break;
>>>
>>>                 if (srcv->buf[i].flags & FUSE_BUF_IS_FD)
>>> 		        break;
>>>
>>>> +	}
>>>> +	if (i == srcv->count)
>>>
>>> Please verify the preconditions for the destination buffer too:
>>>
>>>   && dstv->count == 1 && dstv->idx == 0 && dstv->off == 0 &&
>>>   (dstv->buf[0].flags & FUSE_BUF_IS_FD)
>>
>> Thanks for your comments, and I prefer adding checker just for
>> FUSE_BUF_IS_FD as fuse_buf_writev() does not care much about the other
>> parts. And I personally think so many preconditions will make
>> fuse_buf_writev() so specific.
> 
> I think these preconditions are necessary.  This function won't do the
> right thing if dstv has multiple elements or has been previously
> modified (idx/off).
> 
> Can you explain why these cases will never happen or why the code is
> correct under these cases?
> 
> Stefan
> 

We have already reached an agreement with the src buf and need discuss
with the dst buf. So I spent some time looking into libfuse code, and
found your suggestion really make sense. My previous thought was that
idx/off only works when dst is buf. So I will make these preconditions
more generic like this:

In fuse_buf_copy():
if (i == srcv->count) {
dstv->buf[0].pos += dstv->off; // add dst offset even if it's always 0
fuse_buf_writev(req, &dstv->buf[0], srcv);
}

In fuse_buf_writev():

// modify the preconditions
if (i == srcv->count && dstv->count == 1 &&
dstv->idx == 0 && (dstv->buf[0].flags & FUSE_BUF_IS_FD))

Jun


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance
  2019-08-16  0:48         ` piaojun
@ 2019-08-19 10:08           ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2019-08-19 10:08 UTC (permalink / raw)
  To: piaojun; +Cc: virtio-fs

[-- Attachment #1: Type: text/plain, Size: 2847 bytes --]

On Fri, Aug 16, 2019 at 08:48:59AM +0800, piaojun wrote:
> On 2019/8/15 22:50, Stefan Hajnoczi wrote:
> > On Thu, Aug 15, 2019 at 09:11:49AM +0800, piaojun wrote:
> >> Hi Stefan,
> >>
> >> On 2019/8/14 20:52, Stefan Hajnoczi wrote:
> >>> On Fri, Aug 09, 2019 at 08:53:20AM +0800, piaojun wrote:
> >>>> diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
> >>>> index cec762f..d127c45 100644
> >>>> --- a/contrib/virtiofsd/buffer.c
> >>>> +++ b/contrib/virtiofsd/buffer.c
> >>>> @@ -318,11 +318,22 @@ static int fuse_bufvec_advance(struct fuse_bufvec *bufv, size_t len)
> >>>>  ssize_t fuse_buf_copy(fuse_req_t req, struct fuse_bufvec *dstv, struct fuse_bufvec *srcv,
> >>>>  		      enum fuse_buf_copy_flags flags)
> >>>>  {
> >>>> +	size_t i;
> >>>>  	size_t copied = 0;
> >>>>
> >>>>  	if (dstv == srcv)
> >>>>  		return fuse_buf_size(dstv);
> >>>>
> >>>> +	/* use writev to improve bandwidth when all the
> >>>> +	 * src buffers already mapped by the daemon
> >>>> +	 * process */
> >>>
> >>> Some more checks to make this generic:
> >>>
> >>>> +	for (i = 0; i < srcv->count; i++) {
> >>>> +		if (srcv->buf[i].flags & FUSE_BUF_PHYS_ADDR)
> >>>> +			break;
> >>>
> >>>                 if (srcv->buf[i].flags & FUSE_BUF_IS_FD)
> >>> 		        break;
> >>>
> >>>> +	}
> >>>> +	if (i == srcv->count)
> >>>
> >>> Please verify the preconditions for the destination buffer too:
> >>>
> >>>   && dstv->count == 1 && dstv->idx == 0 && dstv->off == 0 &&
> >>>   (dstv->buf[0].flags & FUSE_BUF_IS_FD)
> >>
> >> Thanks for your comments, and I prefer adding checker just for
> >> FUSE_BUF_IS_FD as fuse_buf_writev() does not care much about the other
> >> parts. And I personally think so many preconditions will make
> >> fuse_buf_writev() so specific.
> > 
> > I think these preconditions are necessary.  This function won't do the
> > right thing if dstv has multiple elements or has been previously
> > modified (idx/off).
> > 
> > Can you explain why these cases will never happen or why the code is
> > correct under these cases?
> > 
> > Stefan
> > 
> 
> We have already reached an agreement with the src buf and need discuss
> with the dst buf. So I spent some time looking into libfuse code, and
> found your suggestion really make sense. My previous thought was that
> idx/off only works when dst is buf. So I will make these preconditions
> more generic like this:
> 
> In fuse_buf_copy():
> if (i == srcv->count) {
> dstv->buf[0].pos += dstv->off; // add dst offset even if it's always 0
> fuse_buf_writev(req, &dstv->buf[0], srcv);
> }

The caller doesn't expect dstv->buf[0].pos to change, so it is safer to
use a local variable for the file position inside fuse_buf_writev()
instead of doing this in fuse_buf_copy().

Hope this makes sense,
Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Virtio-fs] [PATCH v4 1/2] virtiofsd: add definition of fuse_buf_writev()
  2019-08-14 17:07   ` Dr. David Alan Gilbert
@ 2019-09-12  6:09     ` piaojun
  0 siblings, 0 replies; 17+ messages in thread
From: piaojun @ 2019-09-12  6:09 UTC (permalink / raw)
  To: Dr. David Alan Gilbert; +Cc: virtio-fs

Hi Dave,

I just found a little mistake of this patch as below. Could you help
squash into this patch?

Fixes: fea3e6e5ec76 ("add definition of fuse_buf_writev()")

Thanks,
Jun

On 2019/8/15 1:07, Dr. David Alan Gilbert wrote:
> * piaojun (piaojun@huawei.com) wrote:
>> Define fuse_buf_writev() which use pwritev and writev to improve io
>> bandwidth.
>>
>> Signed-off-by: Jun Piao <piaojun@huawei.com>
>> Suggested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> 
> Yep, this part looks OK,
> 
> 
> Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> 
>> ---
>>  contrib/virtiofsd/buffer.c  | 31 +++++++++++++++++++++++++++++++
>>  contrib/virtiofsd/seccomp.c |  2 ++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/contrib/virtiofsd/buffer.c b/contrib/virtiofsd/buffer.c
>> index 655be137..cec762f 100644
>> --- a/contrib/virtiofsd/buffer.c
>> +++ b/contrib/virtiofsd/buffer.c
>> @@ -15,6 +15,7 @@
>>  #include <unistd.h>
>>  #include <errno.h>
>>  #include <assert.h>
>> +#include <stdlib.h>
>>
>>  size_t fuse_buf_size(const struct fuse_bufvec *bufv)
>>  {
>> @@ -71,6 +72,36 @@ static ssize_t fuse_buf_write(const struct fuse_buf *dst, size_t dst_off,
>>  	return copied;
>>  }
>>
>> +static ssize_t fuse_buf_writev(fuse_req_t req,
>> +			     struct fuse_buf *out_buf,
>> +			     struct fuse_bufvec *in_buf)

*req* could be cleaned up as it is unused.

>> +{
>> +	ssize_t res, i;
>> +	size_t iovcnt = in_buf->count;
>> +	struct iovec * iov;
>> +	int fd = out_buf->fd;
>> +
>> +	iov = calloc(iovcnt, sizeof(struct iovec));
>> +	if (!iov)
>> +		return -ENOMEM;
>> +
>> +	for (i = 0; i < iovcnt; i++) {
>> +		iov[i].iov_base = in_buf->buf[i].mem;
>> +		iov[i].iov_len = in_buf->buf[i].size;
>> +	}
>> +
>> +	if (out_buf->flags & FUSE_BUF_FD_SEEK)
>> +		res = pwritev(fd, iov, iovcnt, out_buf->pos);
>> +	else
>> +		res = writev(fd, iov, iovcnt);
>> +
>> +	if (res == -1)
>> +		res = -errno;
>> +
>> +	free(iov);
>> +	return res;
>> +}
>> +
>>  static ssize_t fuse_buf_read(const struct fuse_buf *dst, size_t dst_off,
>>  			     const struct fuse_buf *src, size_t src_off,
>>  			     size_t len)
>> diff --git a/contrib/virtiofsd/seccomp.c b/contrib/virtiofsd/seccomp.c
>> index 5f1c873..4bba30b 100644
>> --- a/contrib/virtiofsd/seccomp.c
>> +++ b/contrib/virtiofsd/seccomp.c
>> @@ -60,6 +60,7 @@ static const int syscall_whitelist[] = {
>>  	SCMP_SYS(ppoll),
>>  	SCMP_SYS(prctl), /* TODO restrict to just PR_SET_NAME? */
>>  	SCMP_SYS(preadv),
>> +	SCMP_SYS(pwritev),
>>  	SCMP_SYS(pwrite64),
>>  	SCMP_SYS(read),
>>  	SCMP_SYS(readlinkat),
>> @@ -78,6 +79,7 @@ static const int syscall_whitelist[] = {
>>  	SCMP_SYS(unlinkat),
>>  	SCMP_SYS(utimensat),
>>  	SCMP_SYS(write),
>> +	SCMP_SYS(writev),
>>  };
>>
>>  /* Syscalls used when --syslog is enabled */
>> -- 
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> .
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-09-12  6:09 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-09  0:50 [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev piaojun
2019-08-09  0:51 ` [Virtio-fs] [PATCH v4 1/2] virtiofsd: add definition of fuse_buf_writev() piaojun
2019-08-14 17:07   ` Dr. David Alan Gilbert
2019-09-12  6:09     ` piaojun
2019-08-09  0:53 ` [Virtio-fs] [PATCH v4 2/2] virtiofsd: use fuse_buf_writev to replace fuse_buf_write for better performance piaojun
2019-08-14 12:52   ` Stefan Hajnoczi
2019-08-14 13:28     ` piaojun
2019-08-15  1:11     ` piaojun
2019-08-15 14:50       ` Stefan Hajnoczi
2019-08-16  0:48         ` piaojun
2019-08-19 10:08           ` Stefan Hajnoczi
2019-08-11  2:06 ` [Virtio-fs] [PATCH v4 0/2] virtiofsd: Improve io bandwidth by replacing pwrite with pwritev Eric Ren
2019-08-11  2:46   ` piaojun
2019-08-11 11:15     ` Eric Ren
2019-08-11 13:55       ` piaojun
2019-08-14 12:41         ` Stefan Hajnoczi
2019-08-14  6:53 ` piaojun

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.