Linux-Fsdevel Archive on lore.kernel.org
* [PATCH v3 0/5] kvm "virtio pmem" device
@ 2019-01-09 14:47 Pankaj Gupta
From: Pankaj Gupta @ 2019-01-09 14:47 UTC
  To: linux-kernel, kvm, qemu-devel, linux-nvdimm, linux-fsdevel,
	virtualization, linux-acpi, linux-ext4, linux-xfs
  Cc: jack, stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	zwisler, vishal.l.verma, dave.jiang, david, jmoyer,
	xiaoguangrong.eric, hch, mst, jasowang, lcapitulino, imammedo,
	eblake, willy, tytso, adilger.kernel, darrick.wong, rjw, pagupta

 This patch series implements "virtio pmem".
 "virtio pmem" is fake persistent memory (nvdimm) in the guest
 which allows the guest page cache to be bypassed. It also
 implements a VIRTIO-based asynchronous flush mechanism.
 
 This patchset shares the guest kernel driver with the
 changes suggested in v2. It was tested with Qemu-side device
 emulation for virtio-pmem [6].
 
 Details of the project idea for the 'virtio pmem' flushing
 interface are shared in [3] & [4].

 The implementation is divided into two parts:
 a new virtio pmem guest driver, and qemu code changes for the new
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
---------------------------------
   - Reads the persistent memory range from the paravirt device and
     registers it with 'nvdimm_bus'.
   - The 'nvdimm/pmem' driver uses this information to allocate a
     persistent memory region and set up filesystem operations
     on the allocated memory.
   - The virtio pmem driver implements an asynchronous flushing
     interface to flush from guest to host.
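
   The flush path described above can be sketched in userspace C. This
   is only an illustration, not the driver code: 'struct region',
   'struct fake_host', 'generic_flush', 'virtio_like_flush' and
   'region_flush' are hypothetical stand-ins for the kernel's nd_region
   and its flush callback. It assumes the convention listed in the
   changelog that a flush returns 0 on success and -EIO on failure.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's nd_region. */
struct region {
	/* Optional provider-specific flush; NULL selects the generic one. */
	int (*flush)(struct region *r);
	/* Opaque state owned by the provider (assumption for this sketch). */
	void *provider_data;
};

/* Hypothetical host state: 'fail' simulates a failed host-side flush. */
struct fake_host {
	int fail;
	int requests;
};

/* Stand-in for a generic flush: nothing to do in this mock. */
static int generic_flush(struct region *r)
{
	(void)r;
	return 0;
}

/* Stand-in for a virtio-style flush: ask the "host", map failure to -EIO. */
static int virtio_like_flush(struct region *r)
{
	struct fake_host *host = r->provider_data;

	host->requests++;
	return host->fail ? -EIO : 0;
}

/* Dispatch through the region's callback if one is registered. */
static int region_flush(struct region *r)
{
	assert(r != NULL);
	if (r->flush)
		return r->flush(r);
	return generic_flush(r);
}
```

   A caller would set 'flush' to 'virtio_like_flush' for a paravirt
   region and leave it NULL otherwise, then propagate the return value
   of 'region_flush()' to whoever requested the flush.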

2. Qemu virtio-pmem device
---------------------------------
   - Creates the virtio pmem device and exposes a memory range to
     the KVM guest.
   - On the host side this is file-backed memory which acts as
     persistent memory.
   - The Qemu-side flush uses the aio thread pool APIs and virtio
     to handle multiple guest requests asynchronously.

   David Hildenbrand (CCed) also posted a modified version [6] of the
   qemu virtio-pmem code based on the updated Qemu memory device API.
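
   To make the host side concrete, here is a minimal userspace sketch
   of file-backed memory with a flush. It is not the Qemu code: it
   assumes that a flush request boils down to an fsync() of the backing
   file, it runs synchronously instead of using a thread pool, and the
   names 'create_backing_file', 'map_backing_file' and
   'handle_flush_request' are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Create an unlinked temporary file of 'len' bytes; returns fd or -1. */
static int create_backing_file(size_t len)
{
	char path[] = "/tmp/pmem-sketch-XXXXXX";
	int fd = mkstemp(path);

	if (fd < 0)
		return -1;
	unlink(path);	/* the file lives on until the fd is closed */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/* Map the file as shared memory, as a guest-visible range would be. */
static void *map_backing_file(int fd, size_t len)
{
	void *p;

	assert(len > 0);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Handle one flush request: 0 on success, -EIO if fsync() fails. */
static int handle_flush_request(int fd)
{
	return fsync(fd) == 0 ? 0 : -EIO;
}
```

   Stores into the mapping only reach stable storage once
   'handle_flush_request()' has returned 0, which is why the guest
   needs an explicit flush interface in the first place.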

 Virtio-pmem error handling:
 ----------------------------------------
  Checked the behaviour of virtio-pmem for the types of errors below.
  Suggestions are needed on the expected behaviour for handling them.

  - Hardware errors: uncorrectable recoverable errors:
  a] virtio-pmem:
    - As per the current logic, if the error page belongs to the Qemu
      process, the host MCE handler isolates (hwpoisons) that page and
      sends SIGBUS. The Qemu SIGBUS handler injects an exception into
      the KVM guest.
    - The KVM guest then isolates the page and sends SIGBUS to the
      guest userspace process which has mapped the page.
  
  b] Existing implementation for the ACPI pmem driver:
    - Handles such errors with an MCE notifier and creates a list
      of bad blocks. Read/direct access DAX operations return EIO
      if the accessed memory page falls in the bad block list.
    - It also starts background scrubbing.
    - Similar functionality can be reused in virtio-pmem with an MCE
      notifier but without scrubbing (no ACPI/ARS)? Need inputs to
      confirm whether this behaviour is ok or needs any change.
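
  The bad block check in b] can be illustrated with a small userspace
  sketch. It is not the kernel's badblocks implementation: the fixed
  size table and the names 'struct badblocks_sketch', 'mark_bad' and
  'check_access' are assumptions made for this example, and offset
  overflow is ignored for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BAD_RANGES 16

/* One poisoned byte range, [start, start + len). */
struct bad_range {
	uint64_t start;
	uint64_t len;
};

/* Hypothetical fixed-size bad block list. */
struct badblocks_sketch {
	struct bad_range ranges[MAX_BAD_RANGES];
	size_t count;
};

/* Record a range, as an error notifier might do after a reported error. */
static int mark_bad(struct badblocks_sketch *bb, uint64_t start, uint64_t len)
{
	assert(bb != NULL);
	if (len == 0)
		return -EINVAL;
	if (bb->count == MAX_BAD_RANGES)
		return -ENOSPC;
	bb->ranges[bb->count].start = start;
	bb->ranges[bb->count].len = len;
	bb->count++;
	return 0;
}

/* Return -EIO if [off, off + len) overlaps any recorded bad range. */
static int check_access(const struct badblocks_sketch *bb,
			uint64_t off, uint64_t len)
{
	size_t i;

	for (i = 0; i < bb->count; i++) {
		const struct bad_range *r = &bb->ranges[i];

		if (off < r->start + r->len && r->start < off + len)
			return -EIO;
	}
	return 0;
}
```

  A read or direct access path would call 'check_access()' before
  touching the memory and fail the operation with the returned error
  instead of consuming poisoned data.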

Changes from PATCH v2: [1]
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: [2]
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to a common global header, e.g. nd.h
- nvdimm_flush() returns 0, or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Replace BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3:
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush - Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- Fix request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (5):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[2] https://lkml.org/lkml/2018/8/31/407
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2

 drivers/acpi/nfit/core.c         |    4 -
 drivers/dax/super.c              |   17 +++++
 drivers/nvdimm/claim.c           |    6 +
 drivers/nvdimm/nd.h              |    1 
 drivers/nvdimm/pmem.c            |   15 +++-
 drivers/nvdimm/region_devs.c     |   45 +++++++++++++-
 drivers/nvdimm/virtio_pmem.c     |   84 ++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |   10 +++
 drivers/virtio/Makefile          |    1 
 drivers/virtio/pmem.c            |  125 +++++++++++++++++++++++++++++++++++++++
 fs/ext4/file.c                   |   11 +++
 fs/xfs/xfs_file.c                |    8 ++
 include/linux/dax.h              |    9 ++
 include/linux/libnvdimm.h        |   11 +++
 include/linux/virtio_pmem.h      |   60 ++++++++++++++++++
 include/uapi/linux/virtio_ids.h  |    1 
 include/uapi/linux/virtio_pmem.h |   10 +++
 17 files changed, 406 insertions(+), 12 deletions(-)

Thread overview: 61+ messages
2019-01-09 14:47 [PATCH v3 0/5] kvm "virtio pmem" device Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 1/5] libnvdimm: nd_region flush callback support Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 2/5] virtio-pmem: Add virtio pmem driver Pankaj Gupta
2019-01-14 15:54   ` Michael S. Tsirkin
2019-01-14 15:54     ` Michael S. Tsirkin
     [not found]     ` <20190114105314-mutt-send-email-mst-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2019-01-15  6:33       ` Pankaj Gupta
2019-01-15  6:33         ` Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 3/5] libnvdimm: add nd_region buffered dax_dev flag Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 4/5] ext4: disable map_sync for virtio pmem Pankaj Gupta
2019-01-09 14:47 ` [PATCH v3 5/5] xfs: " Pankaj Gupta
2019-01-09 14:47   ` Pankaj Gupta
2019-01-09 16:26   ` Darrick J. Wong
2019-01-09 18:08     ` Pankaj Gupta
     [not found] ` <20190109144736.17452-1-pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-10  1:26   ` [PATCH v3 0/5] kvm "virtio pmem" device Dave Chinner
2019-01-10  1:26     ` Dave Chinner
2019-01-10  2:40     ` Rik van Riel
2019-01-10  2:40       ` Rik van Riel
2019-01-10 10:17     ` Jan Kara
2019-01-10 10:17       ` Jan Kara
     [not found]       ` <20190110101757.GC15790-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2019-01-13  1:38         ` Pankaj Gupta
2019-01-13  1:38           ` Pankaj Gupta
     [not found]           ` <1354249849.63357171.1547343519970.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-13  1:43             ` Dan Williams
2019-01-13  1:43               ` Dan Williams
     [not found]               ` <CAPcyv4hwcgTUpgNCefCGu4DvgkYBp5b=f+hJ+FC=s5APYKoycg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-01-13  2:17                 ` [Qemu-devel] " Pankaj Gupta
2019-01-13  2:17                   ` Pankaj Gupta
     [not found]                   ` <540171952.63371441.1547345866585.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-14  9:55                     ` Jan Kara
2019-01-14  9:55                       ` Jan Kara
     [not found]                       ` <20190114095520.GC13316-4I4JzKEfoa/jFM9bn6wA6Q@public.gmane.org>
2019-01-14 10:16                         ` Pankaj Gupta
2019-01-14 10:16                           ` Pankaj Gupta
2019-01-11  7:45     ` Pankaj Gupta
2019-01-11  7:45       ` Pankaj Gupta
     [not found]       ` <1326478078.61913951.1547192704870.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-13 23:29         ` Dave Chinner
2019-01-13 23:29           ` Dave Chinner
2019-01-13 23:38           ` Matthew Wilcox
2019-01-13 23:38             ` Matthew Wilcox
     [not found]             ` <20190113233820.GX6310-PfSpb0PWhxZc2C7mugBRk2EX/6BAtgUQ@public.gmane.org>
2019-01-14  2:50               ` Dave Chinner
2019-01-14  2:50                 ` Dave Chinner
2019-01-14  7:15               ` Pankaj Gupta
2019-01-14  7:15                 ` Pankaj Gupta
     [not found]                 ` <942065073.64011540.1547450140670.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-14 21:25                   ` Dave Chinner
2019-01-14 21:25                     ` Dave Chinner
2019-01-14 21:35                     ` Dan Williams
2019-01-14 21:35                       ` Dan Williams
     [not found]                       ` <CAPcyv4jtPcLV-s0sKNHwwk0ug7GLBV6699dpm1h3r2xSo879dg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-01-14 22:21                         ` Dave Chinner
2019-01-14 22:21                           ` Dave Chinner
2019-01-15  2:19                           ` Michael S. Tsirkin
2019-01-15  2:19                             ` Michael S. Tsirkin
     [not found]                             ` <20190114205031-mutt-send-email-mst-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
2019-01-15  5:37                               ` [Qemu-devel] " Pankaj Gupta
2019-01-15  5:37                                 ` Pankaj Gupta
2019-01-15  5:35                           ` Pankaj Gupta
2019-01-15  5:35                             ` Pankaj Gupta
     [not found]                             ` <1684638419.64320214.1547530506805.JavaMail.zimbra-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-01-15 20:42                               ` Dave Chinner
2019-01-15 20:42                                 ` Dave Chinner
2019-02-04 22:56 ` security implications of caching with virtio pmem (was Re: [PATCH v3 0/5] kvm "virtio pmem" device) Michael S. Tsirkin
2019-02-05  7:29   ` [Qemu-devel] " Pankaj Gupta
2019-02-06 14:00   ` David Hildenbrand
2019-02-06 18:01     ` Michael S. Tsirkin
2019-02-11  7:29   ` [Qemu-devel] " Pankaj Gupta
2019-02-11 22:29     ` Dave Chinner
2019-02-11 22:58       ` David Hildenbrand
2019-02-11 23:07         ` Michael S. Tsirkin
