Linux-Fsdevel Archive on lore.kernel.org
From: Pankaj Gupta <pagupta@redhat.com>
To: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
	qemu-devel@nongnu.org, linux-nvdimm@ml01.01.org,
	linux-fsdevel@vger.kernel.org,
	virtualization@lists.linux-foundation.org,
	linux-acpi@vger.kernel.org, linux-ext4@vger.kernel.org,
	linux-xfs@vger.kernel.org
Cc: jack@suse.cz, david@redhat.com, jasowang@redhat.com,
	lcapitulino@redhat.com, adilger kernel <adilger.kernel@dilger.ca>,
	zwisler@kernel.org, dave jiang <dave.jiang@intel.com>,
	darrick wong <darrick.wong@oracle.com>,
	vishal l verma <vishal.l.verma@intel.com>,
	mst@redhat.com, willy@infradead.org, hch@infradead.org,
	jmoyer@redhat.com, nilal@redhat.com, riel@surriel.com,
	stefanha@redhat.com, imammedo@redhat.com,
	dan j williams <dan.j.williams@intel.com>,
	kwolf@redhat.com, tytso@mit.edu,
	xiaoguangrong eric <xiaoguangrong.eric@gmail.com>,
	rjw@rjwysocki.net, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 0/5] kvm "virtio pmem" device
Date: Wed, 9 Jan 2019 09:46:08 -0500 (EST)
Message-ID: <1814830087.61221572.1547045168645.JavaMail.zimbra@redhat.com> (raw)
In-Reply-To: <20190109135024.14093-1-pagupta@redhat.com>



Please ignore this series as my network went down while 
sending this. I will send this series again.

Thanks,
Pankaj  

> 
> This patch series implements "virtio pmem".
>  "virtio pmem" is fake persistent memory (nvdimm) in the guest
>  that allows bypassing the guest page cache. The series also
>  implements a VIRTIO-based asynchronous flush mechanism.
>  
>  This patchset shares the guest kernel driver with the
>  changes suggested in v2. Tested with Qemu side device
>  emulation for virtio-pmem [6].
>  
>  Details of the project idea for the 'virtio pmem' flushing
>  interface are shared in [3] & [4].
> 
>  Implementation is divided into two parts:
>  New virtio pmem guest driver and qemu code changes for new
>  virtio pmem paravirtualized device.
> 
> 1. Guest virtio-pmem kernel driver
> ---------------------------------
>    - Reads persistent memory range from paravirt device and
>      registers with 'nvdimm_bus'.
>    - 'nvdimm/pmem' driver uses this information to allocate
>      persistent memory region and set up filesystem operations
>      on the allocated memory.
>    - virtio pmem driver implements asynchronous flushing
>      interface to flush from guest to host.
> 
> 2. Qemu virtio-pmem device
> ---------------------------------
>    - Creates virtio pmem device and exposes a memory range to
>      KVM guest.
>    - At host side this is file backed memory which acts as
>      persistent memory.
>    - Qemu side flush uses aio thread pool APIs and virtio
>      for asynchronous guest multi request handling.
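
A minimal userspace sketch of the host-side idea described above: memory backed by a file, made durable by an explicit flush. It assumes the flush boils down to fsync() on the backing file; the function name write_and_flush and its parameters are hypothetical, and the Qemu thread pool and virtio request handling are not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Map `len` bytes of a backing file as shared memory, store `msg` into it,
 * then make the store durable with fsync(). Returns 0 or a negative errno.
 * Hypothetical helper for illustration; not taken from the patch series.
 */
static int write_and_flush(const char *path, size_t len, const char *msg)
{
	int fd, ret = 0;
	char *mem;

	assert(path != NULL && msg != NULL);
	if (strlen(msg) >= len)
		return -EINVAL;

	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -errno;

	if (ftruncate(fd, (off_t)len) < 0) {
		ret = -errno;
		goto out_close;
	}

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED) {
		ret = -errno;
		goto out_close;
	}

	strcpy(mem, msg);	/* plays the role of a guest store */
	if (fsync(fd) < 0)	/* plays the role of the host-side flush */
		ret = -errno;

	munmap(mem, len);
out_close:
	close(fd);
	return ret;
}
```

In the design described above the flush would run asynchronously in a worker thread so that the guest request can complete later; the sketch performs it synchronously to stay short.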
> 
>    David Hildenbrand (CCed) also posted a modified version [6] of
>    the qemu virtio-pmem code based on the updated Qemu memory device API.
> 
>  Virtio-pmem error handling:
>  ----------------------------------------
>   Checked the behaviour of virtio-pmem for the error types below.
>   Suggestions are needed on the expected behaviour for handling them.
> 
>   - Hardware Errors: Uncorrectable recoverable Errors:
>   a] virtio-pmem:
>     - As per current logic, if the error page belongs to the Qemu
>       process, the host MCE handler isolates (hwpoisons) that page
>       and sends SIGBUS. The Qemu SIGBUS handler injects an exception
>       into the KVM guest.
>     - The KVM guest then isolates the page and sends SIGBUS to the
>       guest userspace process which has mapped the page.
>   
>   b] Existing implementation for ACPI pmem driver:
>     - Handles such errors with an MCE notifier and creates a list
>       of bad blocks. Read/direct access DAX operations return EIO
>       if the accessed memory page falls in the bad block list.
>     - It also starts background scrubbing.
>     - Similar functionality can be reused in virtio-pmem with an MCE
>       notifier but without scrubbing (no ACPI/ARS)? Need inputs to
>       confirm if this behaviour is ok or needs any change.
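
To illustrate the bad block behaviour described in b] above, here is a small userspace model of a read path that returns -EIO when the accessed range overlaps a recorded bad range. The struct bad_range type and the helpers is_bad() and checked_read() are hypothetical names for this sketch, not the kernel's badblocks API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bad-range record; offsets are bytes from the region start. */
struct bad_range {
	unsigned long long start;
	unsigned long long len;
};

/* Return 1 if [off, off + len) overlaps any recorded bad range. */
static int is_bad(const struct bad_range *list, size_t n,
		  unsigned long long off, unsigned long long len)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (off < list[i].start + list[i].len &&
		    list[i].start < off + len)
			return 1;
	return 0;
}

/* Model of a read path that refuses to touch known-bad memory. */
static int checked_read(const struct bad_range *list, size_t n,
			unsigned long long off, unsigned long long len)
{
	assert(list != NULL || n == 0);
	return is_bad(list, n, off, len) ? -EIO : 0;
}
```

A notifier-driven implementation would append to the list when an error is reported; that part is left out here.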
> 
> Changes from PATCH v2: [1]
> - Disable MAP_SYNC for ext4 & XFS filesystems - [Dan]
> - Use name 'virtio pmem' in place of 'fake dax'
> 
> Changes from PATCH v1: [2]
> - 0-day build test for build dependency on libnvdimm
> 
>  Changes suggested by - [Dan Williams]
> - Split the driver into two parts virtio & pmem
> - Move queuing of async block request to block layer
> - Add "sync" parameter in nvdimm_flush function
> - Use indirect call for nvdimm_flush
> - Don’t move declarations to common global header e.g. nd.h
> - nvdimm_flush() returns 0 or -EIO if it fails
> - Teach nsio_rw_bytes() that the flush can fail
> - Rename nvdimm_flush() to generic_nvdimm_flush()
> - Use 'nd_region->provider_data' for long dereferencing
> - Remove virtio_pmem_freeze/restore functions
> - Replace BSD license text with SPDX license text
> 
> - Add might_sleep() in virtio_pmem_flush - [Luiz]
> - Make spin_lock_irqsave() narrow
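
The flush-related items in the list above (a per-region flush callback, an indirect call, and a 0 or -EIO return value) can be modelled in userspace roughly as follows. This is only a sketch under assumptions: struct region, generic_flush(), virtio_style_flush() and region_flush() are hypothetical stand-ins, not the libnvdimm definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a region with an optional flush callback. */
struct region {
	int (*flush)(struct region *r);	/* provider-specific flush, may be NULL */
	void *provider_data;		/* opaque per-provider state */
};

/* Default path: stands in for the flush used by real pmem. */
static int generic_flush(struct region *r)
{
	(void)r;
	return 0;
}

/* Stand-in for a paravirtual flush: a host request that can fail. */
static int virtio_style_flush(struct region *r)
{
	int *host_ok = r->provider_data;

	return *host_ok ? 0 : -EIO;
}

/* Indirect call: use the provider callback if set, else the generic one. */
static int region_flush(struct region *r)
{
	assert(r != NULL);
	return r->flush ? r->flush(r) : generic_flush(r);
}
```

Callers such as a byte-level read/write helper would then need to check the return value, since a flush through the callback can fail.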
> 
> Changes from RFC v3
> - Rebase to latest upstream - Luiz
> - Call ndregion->flush in place of nvdimm_flush - Luiz
> - kmalloc return check - Luiz
> - virtqueue full handling - Stefan
> - Don't map entire virtio_pmem_req to device - Stefan
> - request leak, correct sizeof req - Stefan
> - Move declaration to virtio_pmem.c
> 
> Changes from RFC v2:
> - Add flush function in the nd_region in place of switching
>   on a flag - Dan & Stefan
> - Add flush completion function with proper locking and wait
>   for host side flush completion - Stefan & Dan
> - Keep userspace API in uapi header file - Stefan, MST
> - Use LE fields & New device id - MST
> - Indentation & spacing suggestions - MST & Eric
> - Remove extra header files & add licensing - Stefan
> 
> Changes from RFC v1:
> - Reuse existing 'pmem' code for registering persistent
>   memory and other operations instead of creating an entirely
>   new block driver.
> - Use VIRTIO driver to register memory information with
>   nvdimm_bus and create region_type accordingly.
> - Call VIRTIO flush from existing pmem driver.
> 
> Pankaj Gupta (5):
>    libnvdimm: nd_region flush callback support
>    virtio-pmem: Add virtio-pmem guest driver
>    libnvdimm: add nd_region buffered dax_dev flag
>    ext4: disable map_sync for virtio pmem
>    xfs: disable map_sync for virtio pmem
> 
> [2] https://lkml.org/lkml/2018/8/31/407
> [3] https://www.spinics.net/lists/kvm/msg149761.html
> [4] https://www.spinics.net/lists/kvm/msg153095.html
> [5] https://lkml.org/lkml/2018/8/31/413
> [6] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
> 
>  drivers/acpi/nfit/core.c         |    4 -
>  drivers/dax/super.c              |   17 +++++
>  drivers/nvdimm/claim.c           |    6 +
>  drivers/nvdimm/nd.h              |    1
>  drivers/nvdimm/pmem.c            |   15 +++-
>  drivers/nvdimm/region_devs.c     |   45 +++++++++++++-
>  drivers/nvdimm/virtio_pmem.c     |   84 ++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |   10 +++
>  drivers/virtio/Makefile          |    1
>  drivers/virtio/pmem.c            |  125 +++++++++++++++++++++++++++++++++++++++
>  fs/ext4/file.c                   |   11 +++
>  fs/xfs/xfs_file.c                |    8 ++
>  include/linux/dax.h              |    9 ++
>  include/linux/libnvdimm.h        |   11 +++
>  include/linux/virtio_pmem.h      |   60 ++++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |    1
>  include/uapi/linux/virtio_pmem.h |   10 +++
>  17 files changed, 406 insertions(+), 12 deletions(-)
> 
> 
> 
