From: Pankaj Gupta <pagupta@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
qemu-devel@nongnu.org, linux-nvdimm@ml01.01.org,
linux-fsdevel@vger.kernel.org,
virtualization@lists.linux-foundation.org,
linux-acpi@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-xfs@vger.kernel.org
Cc: jack@suse.cz, stefanha@redhat.com, dan.j.williams@intel.com,
riel@surriel.com, nilal@redhat.com, kwolf@redhat.com,
pbonzini@redhat.com, zwisler@kernel.org,
vishal.l.verma@intel.com, dave.jiang@intel.com, david@redhat.com,
jmoyer@redhat.com, xiaoguangrong.eric@gmail.com,
hch@infradead.org, mst@redhat.com, jasowang@redhat.com,
lcapitulino@redhat.com, imammedo@redhat.com, eblake@redhat.com,
willy@infradead.org, tytso@mit.edu, adilger.kernel@dilger.ca,
darrick.wong@oracle.com, rjw@rjwysocki.net, pagupta@redhat.com
Subject: [PATCH v3 0/5] kvm "virtio pmem" device
Date: Wed, 9 Jan 2019 19:20:19 +0530
Message-ID: <20190109135024.14093-1-pagupta@redhat.com>
This patch series implements "virtio pmem".
"virtio pmem" is fake persistent memory (nvdimm) in the guest
which allows the guest page cache to be bypassed. The series
also implements a VIRTIO based asynchronous flush mechanism.
This patchset shares the guest kernel driver with the
changes suggested in v2. It was tested with Qemu side device
emulation for virtio-pmem [6].
Details of the project idea for the 'virtio pmem' flushing
interface are shared in [3] & [4].
The implementation is divided into two parts:
a new virtio pmem guest driver and qemu code changes for a new
virtio pmem paravirtualized device.
1. Guest virtio-pmem kernel driver
---------------------------------
- Reads the persistent memory range from the paravirt device
and registers it with 'nvdimm_bus'.
- The 'nvdimm/pmem' driver uses this information to allocate a
persistent memory region and set up filesystem operations
on the allocated memory.
- The virtio pmem driver implements an asynchronous flushing
interface to flush from guest to host.
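To make the flushing interface above concrete, here is a minimal
userspace model of a per-region flush hook. It is a sketch, not the
kernel code from this series: every name in it (region_model,
host_model, virtio_flush_model, region_flush_model) is hypothetical,
and the "host" is just a counter that can be told to fail once. It
only illustrates the idea of dispatching through a callback and
reporting 0 or -EIO to the caller.

```c
/* Userspace model of a per-region flush callback; not kernel code. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct region_model;

/* A flush hook returns 0 on success or a negative errno on failure. */
typedef int (*flush_fn)(struct region_model *region);

struct region_model {
    flush_fn flush;        /* NULL means "use the generic flush path" */
    void *provider_data;   /* driver-private state (here: the host model) */
    int generic_flushes;   /* counter so the generic path can be observed */
};

/* Hypothetical stand-in for the host side of the paravirt device. */
struct host_model {
    int requests;          /* number of flush requests received */
    int fail_next;         /* when non-zero, the next request fails */
};

/* Generic path: no host round trip, so this model never fails. */
int generic_flush_model(struct region_model *region)
{
    assert(region != NULL);
    region->generic_flushes++;
    return 0;
}

/* Paravirtual path: hand the flush to the host and report its status. */
int virtio_flush_model(struct region_model *region)
{
    struct host_model *host;

    assert(region != NULL && region->provider_data != NULL);
    host = region->provider_data;
    host->requests++;
    if (host->fail_next) {
        host->fail_next = 0;
        return -EIO;       /* host-side flush failed */
    }
    return 0;
}

/* Single entry point for callers: indirect call if a hook is set. */
int region_flush_model(struct region_model *region)
{
    assert(region != NULL);
    if (region->flush)
        return region->flush(region);
    return generic_flush_model(region);
}
```

A region created with flush set to virtio_flush_model and
provider_data pointing at a host_model takes the paravirtual path;
a region with flush left NULL falls back to the generic one. Callers
see only the return value, so a failed host flush can be propagated
up as an I/O error.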
2. Qemu virtio-pmem device
---------------------------------
- Creates the virtio pmem device and exposes a memory range to
the KVM guest.
- On the host side this is file backed memory which acts as
persistent memory.
- The Qemu side flush uses the aio thread pool APIs and virtio
for asynchronous handling of multiple guest requests.
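As a rough illustration of the host side, the sketch below shows what
one flush worker could do, assuming the host flush amounts to an
fsync() on the file that backs the guest-visible range. This is an
assumption made for the example, not a description of the Qemu code
in [6]; flush_req_model, host_handle_flush, the status values and the
scratch file are all hypothetical, and the work is done synchronously
here rather than in a thread pool.

```c
/* Userspace sketch of a host-side flush worker; not Qemu code. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

enum { FLUSH_OK = 0, FLUSH_FAILED = 1 };   /* hypothetical status values */

struct flush_req_model {
    int backing_fd;   /* fd of the file that backs the guest-visible range */
    int status;       /* result reported to the guest after the flush */
};

/* Body of one worker: make the backing file durable, record the result. */
void host_handle_flush(struct flush_req_model *req)
{
    assert(req != NULL);
    req->status = (fsync(req->backing_fd) == 0) ? FLUSH_OK : FLUSH_FAILED;
}

/* Usage demo with a scratch file standing in for the memory backend. */
int demo_flush_scratch_file(void)
{
    char path[] = "/tmp/vpmem-demo-XXXXXX";
    struct flush_req_model req = { mkstemp(path), FLUSH_FAILED };

    if (req.backing_fd < 0)
        return FLUSH_FAILED;
    unlink(path);                 /* the file lives until the fd is closed */
    if (write(req.backing_fd, "dirty", 5) == 5)
        host_handle_flush(&req);
    close(req.backing_fd);
    return req.status;
}
```

In a real device model each request would be queued to a worker so
that several guest flushes can be in flight at once; the status field
is what would be written back to the guest on completion.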
David Hildenbrand (CCed) has also posted a modified version [6] of
the qemu virtio-pmem code based on the updated Qemu memory device API.
Virtio-pmem error handling:
----------------------------------------
Checked the behaviour of virtio-pmem for the types of errors below.
Suggestions are needed on the expected behaviour for handling them.
- Hardware errors: uncorrectable recoverable errors:
a] virtio-pmem:
- As per the current logic, if the error page belongs to the Qemu
process, the host MCE handler isolates (hwpoison) that page and
sends SIGBUS. The Qemu SIGBUS handler injects an exception into
the KVM guest.
- The KVM guest then isolates the page and sends SIGBUS to the
guest userspace process which has mapped the page.
b] Existing implementation for the ACPI pmem driver:
- Handles such errors with an MCE notifier and creates a list
of bad blocks. Read/direct access DAX operations return EIO
if the accessed memory page falls in the bad block list.
- It also starts background scrubbing.
- Similar functionality can be reused in virtio-pmem with an MCE
notifier but without scrubbing (no ACPI/ARS)? Need inputs to
confirm whether this behaviour is ok or needs any change.
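The bad block behaviour described in b] can be modelled in a few
lines. The sketch below is a userspace illustration only: the types
and functions (badblocks_model, bb_model_add, bb_model_check) are
hypothetical and are not the kernel's badblocks API. It records
poisoned byte ranges, as an MCE notifier would, and returns -EIO for
any access that overlaps one of them. Offset overflow is ignored for
brevity.

```c
/* Userspace model of a bad block list; not the kernel badblocks API. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define BB_MODEL_MAX 16

struct bad_range_model {
    uint64_t start;   /* first poisoned byte offset within the region */
    uint64_t len;     /* number of poisoned bytes */
};

struct badblocks_model {
    struct bad_range_model ranges[BB_MODEL_MAX];
    size_t count;
};

/* Record a poisoned range, as the modelled MCE notifier would. */
int bb_model_add(struct badblocks_model *bb, uint64_t start, uint64_t len)
{
    assert(bb != NULL);
    if (len == 0)
        return -EINVAL;
    if (bb->count == BB_MODEL_MAX)
        return -ENOSPC;
    bb->ranges[bb->count].start = start;
    bb->ranges[bb->count].len = len;
    bb->count++;
    return 0;
}

/* Return 0 if [off, off + len) is clean, -EIO if it touches a bad range. */
int bb_model_check(const struct badblocks_model *bb, uint64_t off, uint64_t len)
{
    assert(bb != NULL);
    if (len == 0)
        return 0;
    for (size_t i = 0; i < bb->count; i++) {
        const struct bad_range_model *r = &bb->ranges[i];

        /* Two half-open intervals overlap when each starts before the other ends. */
        if (off < r->start + r->len && r->start < off + len)
            return -EIO;
    }
    return 0;
}
```

A read or direct access path would call bb_model_check() before
touching the memory and fail the operation with EIO on a hit, which
is the behaviour being asked about for virtio-pmem.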
Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
- Use name 'virtio pmem' in place of 'fake dax'
Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm
Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don't move declarations to a common global header, e.g. nd.h
- nvdimm_flush() returns 0, or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Replace BSD license text with SPDX license text
- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow
Changes from RFC v3:
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush - Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c
Changes from RFC v2:
- Add flush function in the nd_region in place of switching
on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan
Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent
memory and other operations instead of creating an entirely
new block driver.
- Use VIRTIO driver to register memory information with
nvdimm_bus and create region_type accordingly.
- Call VIRTIO flush from existing pmem driver.
Pankaj Gupta (5):
libnvdimm: nd_region flush callback support
virtio-pmem: Add virtio-pmem guest driver
libnvdimm: add nd_region buffered dax_dev flag
ext4: disable map_sync for virtio pmem
xfs: disable map_sync for virtio pmem
[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
drivers/acpi/nfit/core.c | 4 -
drivers/dax/super.c | 17 +++++
drivers/nvdimm/claim.c | 6 +
drivers/nvdimm/nd.h | 1
drivers/nvdimm/pmem.c | 15 +++-
drivers/nvdimm/region_devs.c | 45 +++++++++++++-
drivers/nvdimm/virtio_pmem.c | 84 ++++++++++++++++++++++++++
drivers/virtio/Kconfig | 10 +++
drivers/virtio/Makefile | 1
drivers/virtio/pmem.c | 125 +++++++++++++++++++++++++++++++++++++++
fs/ext4/file.c | 11 +++
fs/xfs/xfs_file.c | 8 ++
include/linux/dax.h | 9 ++
include/linux/libnvdimm.h | 11 +++
include/linux/virtio_pmem.h | 60 ++++++++++++++++++
include/uapi/linux/virtio_ids.h | 1
include/uapi/linux/virtio_pmem.h | 10 +++
17 files changed, 406 insertions(+), 12 deletions(-)