From: Vivek Goyal <vgoyal@redhat.com>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nvdimm@lists.01.org, virtio-fs@redhat.com,
miklos@szeredi.hu
Cc: stefanha@redhat.com, dgilbert@redhat.com, mst@redhat.com
Subject: [PATCH 00/20] virtiofs: Add DAX support
Date: Wed, 4 Mar 2020 11:58:25 -0500
Message-ID: <20200304165845.3081-1-vgoyal@redhat.com>
Hi,
This patch series adds DAX support to the virtiofs filesystem. This allows
bypassing the guest page cache and mapping the host page cache directly
into the guest address space.
When a page of a file is needed, the guest sends a request to map that
page (in the host page cache) into the qemu address space. Inside the
guest, this is a physical memory range controlled by the virtiofs device,
and the guest maps this physical address range directly using DAX, which
gives it access to the file data on the host.
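To make the flow above concrete, here is a minimal C sketch of the address
arithmetic involved. It assumes file data is mapped in fixed 2 MiB chunks,
and the names map_request, build_map_request() and window_addr() are
hypothetical illustrations, not the structures or helpers used by these
patches (the real request format is defined by the setupmapping and
removemapping commands in the series).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mapping granularity: 2 MiB chunks (illustrative value). */
#define DAX_CHUNK_SHIFT 21
#define DAX_CHUNK_SIZE  (UINT64_C(1) << DAX_CHUNK_SHIFT)

/* Hypothetical request: "map this chunk of the file at this window offset". */
struct map_request {
    uint64_t foffset; /* chunk-aligned offset in the file on the host */
    uint64_t moffset; /* offset inside the DAX window (guest physical range) */
    uint64_t len;     /* number of bytes to map */
};

/* Build a request for the chunk containing file position pos,
 * placed in window slot number slot. */
static struct map_request build_map_request(uint64_t pos, uint64_t slot)
{
    struct map_request req = {
        .foffset = pos & ~(DAX_CHUNK_SIZE - 1),
        .moffset = slot << DAX_CHUNK_SHIFT,
        .len = DAX_CHUNK_SIZE,
    };
    return req;
}

/* After the host has set up the mapping, byte pos of the file is
 * reachable at this guest physical address inside the window. */
static uint64_t window_addr(uint64_t window_base, const struct map_request *req,
                            uint64_t pos)
{
    assert(pos >= req->foffset && pos - req->foffset < req->len);
    return window_base + req->moffset + (pos - req->foffset);
}
```

Once a chunk is mapped, further accesses within it need no new request;
the guest simply loads and stores through the window address.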
This can speed things up considerably in many situations. It can also
result in substantial memory savings, as file data does not have to be
copied into the guest; it is accessed directly from the host page cache.
Most of the changes are limited to fuse/virtiofs. There are a couple of
changes needed in the generic dax infrastructure and a couple of changes
in virtio to be able to access the shared memory region.
These patches apply on top of 5.6-rc4 and are also available here.
https://github.com/rhvgoyal/linux/commits/vivek-04-march-2020
Any review or feedback is welcome.
Performance
===========
I have run a bunch of fio jobs to get a sense of the speed of various
operations. I wrote a simple wrapper script that runs each fio job
3 times, averages the results and reports the average. These scripts and
fio jobs are available here.
https://github.com/rhvgoyal/virtiofs-tests
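The averaging step of such a wrapper can be sketched in C as below. This is
only an illustration; the actual scripts in the repository above may be
written differently, and average_bw() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Mean bandwidth (MiB/s) over n repeated runs of the same fio job. */
static double average_bw(const double *runs, size_t n)
{
    assert(runs != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += runs[i];
    return sum / (double)n;
}
```

Repeating each job and reporting the mean smooths out run-to-run noise in
the reported bandwidth.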
I set up a directory on ramfs on the host, exported that directory into
the guest using virtio-fs and ran the tests inside the guest. Tests were
run with cache=none, both with dax enabled and disabled. The cache=none
option ensures that nothing is cached in the guest, neither data nor
metadata.
Test Setup
-----------
- A Fedora 29 host with 376GiB RAM, 2 sockets (20 cores per socket, 2
threads per core)
- Using ramfs on the host as backing store, with 4 fio files of 8G each.
- Created a VM with 64 VCPUs and 64GB memory, and a 64GB cache window
(for dax mmap).
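As a rough capacity check for this setup, the following C sketch counts how
many fixed-size ranges the 64GB cache window holds (treated as 64 GiB) and
how many the 32G of fio files (4 x 8G) would need. The 2 MiB range size and
the helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GIB (UINT64_C(1) << 30)
/* Assumed size of one mappable range in the DAX window (2 MiB). */
#define DAX_RANGE_SIZE (UINT64_C(1) << 21)

/* How many ranges fit in a window of window_bytes. */
static uint64_t window_ranges(uint64_t window_bytes)
{
    assert(window_bytes % DAX_RANGE_SIZE == 0); /* window is whole ranges */
    return window_bytes / DAX_RANGE_SIZE;
}

/* How many ranges are needed to map bytes of file data at once (rounded up). */
static uint64_t ranges_needed(uint64_t bytes)
{
    return (bytes + DAX_RANGE_SIZE - 1) / DAX_RANGE_SIZE;
}

/* True if the whole working set can stay mapped, so no reclaim is needed. */
static bool fits_without_reclaim(uint64_t window_bytes, uint64_t working_set)
{
    return ranges_needed(working_set) <= window_ranges(window_bytes);
}
```

Under these assumptions the window holds 32768 ranges and the four 8G files
need 16384, so the whole data set can stay mapped at once.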
Test Results
------------
- Results are reported for two configurations:
virtio-fs (cache=none) and virtio-fs (cache=none + dax).
There are other caching modes as well, but to me cache=none seemed the
most interesting for now because it does not cache anything in the guest
and provides strong coherence. Other modes, which provide weaker
coherence and hence are faster, are yet to be benchmarked.
- Three fio ioengines have been used: psync, libaio and mmap.
- I/O workloads of randread, randwrite, seqread and seqwrite have been run.
- Each file is 8G in size. Block size is 4K, iodepth=16.
- "multi" means the same operation was run with 4 jobs, each job
operating on a file of size 8G.
- Some results are "0 (KiB/s)". That means that particular operation is
not supported in that configuration.
NAME                     I/O Operation           BW(Read/Write)
virtiofs-cache-none      seqread-psync           35(MiB/s)
virtiofs-cache-none-dax  seqread-psync           643(MiB/s)
virtiofs-cache-none      seqread-psync-multi     219(MiB/s)
virtiofs-cache-none-dax  seqread-psync-multi     2132(MiB/s)
virtiofs-cache-none      seqread-mmap            0(KiB/s)
virtiofs-cache-none-dax  seqread-mmap            741(MiB/s)
virtiofs-cache-none      seqread-mmap-multi      0(KiB/s)
virtiofs-cache-none-dax  seqread-mmap-multi      2530(MiB/s)
virtiofs-cache-none      seqread-libaio          293(MiB/s)
virtiofs-cache-none-dax  seqread-libaio          425(MiB/s)
virtiofs-cache-none      seqread-libaio-multi    207(MiB/s)
virtiofs-cache-none-dax  seqread-libaio-multi    1543(MiB/s)
virtiofs-cache-none      randread-psync          36(MiB/s)
virtiofs-cache-none-dax  randread-psync          572(MiB/s)
virtiofs-cache-none      randread-psync-multi    211(MiB/s)
virtiofs-cache-none-dax  randread-psync-multi    1764(MiB/s)
virtiofs-cache-none      randread-mmap           0(KiB/s)
virtiofs-cache-none-dax  randread-mmap           719(MiB/s)
virtiofs-cache-none      randread-mmap-multi     0(KiB/s)
virtiofs-cache-none-dax  randread-mmap-multi     2005(MiB/s)
virtiofs-cache-none      randread-libaio         300(MiB/s)
virtiofs-cache-none-dax  randread-libaio         413(MiB/s)
virtiofs-cache-none      randread-libaio-multi   327(MiB/s)
virtiofs-cache-none-dax  randread-libaio-multi   1326(MiB/s)
virtiofs-cache-none      seqwrite-psync          34(MiB/s)
virtiofs-cache-none-dax  seqwrite-psync          494(MiB/s)
virtiofs-cache-none      seqwrite-psync-multi    223(MiB/s)
virtiofs-cache-none-dax  seqwrite-psync-multi    1680(MiB/s)
virtiofs-cache-none      seqwrite-mmap           0(KiB/s)
virtiofs-cache-none-dax  seqwrite-mmap           1217(MiB/s)
virtiofs-cache-none      seqwrite-mmap-multi     0(KiB/s)
virtiofs-cache-none-dax  seqwrite-mmap-multi     2359(MiB/s)
virtiofs-cache-none      seqwrite-libaio         282(MiB/s)
virtiofs-cache-none-dax  seqwrite-libaio         348(MiB/s)
virtiofs-cache-none      seqwrite-libaio-multi   320(MiB/s)
virtiofs-cache-none-dax  seqwrite-libaio-multi   1255(MiB/s)
virtiofs-cache-none      randwrite-psync         32(MiB/s)
virtiofs-cache-none-dax  randwrite-psync         458(MiB/s)
virtiofs-cache-none      randwrite-psync-multi   213(MiB/s)
virtiofs-cache-none-dax  randwrite-psync-multi   1343(MiB/s)
virtiofs-cache-none      randwrite-mmap          0(KiB/s)
virtiofs-cache-none-dax  randwrite-mmap          663(MiB/s)
virtiofs-cache-none      randwrite-mmap-multi    0(KiB/s)
virtiofs-cache-none-dax  randwrite-mmap-multi    1820(MiB/s)
virtiofs-cache-none      randwrite-libaio        292(MiB/s)
virtiofs-cache-none-dax  randwrite-libaio        341(MiB/s)
virtiofs-cache-none      randwrite-libaio-multi  322(MiB/s)
virtiofs-cache-none-dax  randwrite-libaio-multi  1094(MiB/s)
Conclusion
===========
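- virtio-fs with dax enabled is significantly faster and more memory
efficient compared to non-dax operation. In these runs it was faster in
every workload, from about 1.2x (randwrite-libaio) to about 18x
(seqread-psync), and the mmap workloads only run with dax enabled.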
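The ratios quoted above can be read straight off the results table. A small
C sketch that computes the dax/non-dax ratio for a few rows (bandwidth values
copied from the table; struct bw_pair and speedup() are hypothetical names):

```c
#include <assert.h>

/* One row pair from the results table: same workload, without and with dax. */
struct bw_pair {
    const char *op;
    double nodax_mibs; /* virtiofs-cache-none, MiB/s */
    double dax_mibs;   /* virtiofs-cache-none-dax, MiB/s */
};

/* Values copied from the results table. */
static const struct bw_pair results[] = {
    { "seqread-psync",           35.0,  643.0 },
    { "seqread-libaio",         293.0,  425.0 },
    { "randwrite-libaio",       292.0,  341.0 },
    { "randwrite-libaio-multi", 322.0, 1094.0 },
};

/* dax bandwidth divided by non-dax bandwidth. Only defined when the
 * non-dax run produced data (the mmap rows are 0 because unsupported). */
static double speedup(const struct bw_pair *p)
{
    assert(p->nodax_mibs > 0.0);
    return p->dax_mibs / p->nodax_mibs;
}
```

For these rows this gives roughly 18.4x, 1.45x, 1.17x and 3.4x. The mmap
rows are left out because their non-dax bandwidth is 0 (not supported), so
no ratio exists.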
Note:
Right now the dax window is 64G and the total fio data set is 32G (4
files of 8G each). That means everything fits into the dax window and no
reclaim is needed. The dax window reclaim logic is slower, and if the
total file size is bigger than the dax window size, performance slows
down.
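One simple way to picture the free and busy range lists from the patch list
below, and the reclaim path mentioned in this note, is the following C
sketch. It is a simplified model under stated assumptions: ranges are handed
out from a free list, mapped ranges are kept on a busy list in allocation
order, and when the free list is empty the oldest busy range is reused. This
FIFO policy and all names here are illustrative; the actual reclaim logic in
the series may differ (for example in locking and in when reclaim runs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified bookkeeping for ranges of the DAX window. */
struct dax_range {
    struct dax_range *next;
    uint64_t moffset; /* offset of this range inside the DAX window */
    int64_t owner;    /* file chunk currently mapped here, -1 if none */
};

struct dax_window {
    struct dax_range *free_list; /* ranges with no mapping */
    struct dax_range *busy_head; /* mapped ranges, oldest first */
    struct dax_range *busy_tail;
    unsigned reclaims;           /* times a busy range had to be reused */
};

static void window_init(struct dax_window *w, struct dax_range *ranges,
                        size_t n, uint64_t range_size)
{
    w->free_list = NULL;
    w->busy_head = w->busy_tail = NULL;
    w->reclaims = 0;
    /* Push in reverse so the free list starts with ranges[0]. */
    for (size_t i = n; i-- > 0;) {
        ranges[i].moffset = (uint64_t)i * range_size;
        ranges[i].owner = -1;
        ranges[i].next = w->free_list;
        w->free_list = &ranges[i];
    }
}

static void busy_append(struct dax_window *w, struct dax_range *r)
{
    r->next = NULL;
    if (w->busy_tail)
        w->busy_tail->next = r;
    else
        w->busy_head = r;
    w->busy_tail = r;
}

static struct dax_range *busy_take_oldest(struct dax_window *w)
{
    struct dax_range *r = w->busy_head;
    if (r) {
        w->busy_head = r->next;
        if (!w->busy_head)
            w->busy_tail = NULL;
        r->next = NULL;
    }
    return r;
}

/* Get a window range for file chunk `chunk`: prefer a free range,
 * otherwise reclaim the oldest busy one (FIFO, chosen for simplicity). */
static struct dax_range *alloc_range(struct dax_window *w, int64_t chunk)
{
    struct dax_range *r = w->free_list;
    if (r) {
        w->free_list = r->next;
    } else {
        r = busy_take_oldest(w);
        assert(r != NULL); /* window must have at least one range */
        /* A real implementation would tear down r's old mapping here
         * (e.g. with a removemapping request) before reusing it. */
        w->reclaims++;
    }
    r->owner = chunk;
    busy_append(w, r);
    return r;
}
```

As long as the working set fits in the window, alloc_range() never reaches
the reclaim branch, which corresponds to the benchmark setup above.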
Thanks
Vivek
Sebastien Boeuf (3):
virtio: Add get_shm_region method
virtio: Implement get_shm_region for PCI transport
virtio: Implement get_shm_region for MMIO transport
Stefan Hajnoczi (2):
virtio_fs, dax: Set up virtio_fs dax_device
fuse,dax: add DAX mmap support
Vivek Goyal (15):
dax: Modify bdev_dax_pgoff() to handle NULL bdev
dax: Create a range version of dax_layout_busy_page()
virtiofs: Provide a helper function for virtqueue initialization
fuse: Get rid of no_mount_options
fuse,virtiofs: Add a mount option to enable dax
fuse,virtiofs: Keep a list of free dax memory ranges
fuse: implement FUSE_INIT map_alignment field
fuse: Introduce setupmapping/removemapping commands
fuse, dax: Implement dax read/write operations
fuse, dax: Take ->i_mmap_sem lock during dax page fault
fuse,virtiofs: Define dax address space operations
fuse,virtiofs: Maintain a list of busy elements
fuse: Release file in process context
fuse: Take inode lock for dax inode truncation
fuse,virtiofs: Add logic to free up a memory range
drivers/dax/super.c | 3 +-
drivers/virtio/virtio_mmio.c | 32 +
drivers/virtio/virtio_pci_modern.c | 107 +++
fs/dax.c | 66 +-
fs/fuse/dir.c | 2 +
fs/fuse/file.c | 1162 +++++++++++++++++++++++++++-
fs/fuse/fuse_i.h | 109 ++-
fs/fuse/inode.c | 148 +++-
fs/fuse/virtio_fs.c | 250 +++++-
include/linux/dax.h | 6 +
include/linux/virtio_config.h | 17 +
include/uapi/linux/fuse.h | 42 +-
include/uapi/linux/virtio_fs.h | 3 +
include/uapi/linux/virtio_mmio.h | 11 +
include/uapi/linux/virtio_pci.h | 11 +-
15 files changed, 1888 insertions(+), 81 deletions(-)
--
2.20.1
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org