* [PATCH 0/3] kvm "fake DAX" device
@ 2018-08-31 13:30 Pankaj Gupta
  2018-08-31 13:30 ` [PATCH 1/3] nd: move nd_region to common header Pankaj Gupta
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-08-31 13:30 UTC (permalink / raw)
  To: linux-kernel, kvm, qemu-devel, linux-nvdimm
  Cc: jack, stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, david, xiaoguangrong.eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake, pagupta

 This patch series implements "fake DAX". "Fake DAX" is fake
 persistent memory (nvdimm) in the guest which allows the guest
 page cache to be bypassed. The series also implements a VIRTIO
 based asynchronous flush mechanism.

 The guest driver and Qemu device changes are shared as separate
 patch sets for easier review; they have been tested together.

 Details of the project idea for the 'fake DAX' flushing interface
 are shared in [2] & [3].

 The implementation is divided into two parts:
 a new virtio pmem guest driver, and Qemu code changes for a new
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
---------------------------------
   - Reads the persistent memory range from the paravirt device and
     registers it with 'nvdimm_bus'.
   - The 'nvdimm/pmem' driver uses this information to allocate a
     persistent memory region and set up filesystem operations
     on the allocated memory.
   - The virtio pmem driver implements an asynchronous flushing
     interface to flush from guest to host.
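
   The "virtqueue full" handling behind the asynchronous flushing
   interface can be pictured with a small single-threaded model. This is
   only a sketch under stated assumptions: `struct flush_queue`,
   `fq_submit()`, `fq_ack()`, `VQ_SLOTS` and `MAX_REQS` are hypothetical
   names that do not appear in the driver, and the real code sleeps on
   wait queues instead of returning status codes.

```c
#include <stddef.h>

#define VQ_SLOTS 2	/* hypothetical ring capacity */
#define MAX_REQS 8	/* hypothetical bound on deferred requests */

struct flush_queue {
	int in_flight;		/* requests currently owned by the host */
	int waiting[MAX_REQS];	/* FIFO of deferred request ids */
	size_t head, count;
};

/* Returns 1 if queued to the host, 0 if deferred, -1 if nothing fits. */
int fq_submit(struct flush_queue *q, int id)
{
	if (q->in_flight < VQ_SLOTS) {
		q->in_flight++;
		return 1;
	}
	if (q->count == MAX_REQS)
		return -1;
	q->waiting[(q->head + q->count) % MAX_REQS] = id;
	q->count++;
	return 0;
}

/*
 * The host acknowledged one request: release its slot and, if any
 * request was deferred, give the slot to the oldest waiter.
 * Returns the promoted id, or -1 if nothing was promoted.
 */
int fq_ack(struct flush_queue *q)
{
	int id;

	if (q->in_flight == 0)
		return -1;
	q->in_flight--;
	if (q->count == 0)
		return -1;
	id = q->waiting[q->head];
	q->head = (q->head + 1) % MAX_REQS;
	q->count--;
	q->in_flight++;
	return id;
}
```

   The model keeps deferred requests in FIFO order so that the oldest
   waiter gets the first free slot, which is the ordering the deferred
   list in the driver is meant to provide.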

2. Qemu virtio-pmem device
---------------------------------
   - Creates a virtio pmem device and exposes a memory range to
     the KVM guest.
   - On the host side this is file backed memory which acts as
     persistent memory.
   - The Qemu side flush uses the aio thread pool APIs and virtio
     to handle multiple guest requests asynchronously.

   David Hildenbrand (CCed) has also posted a modified version [4] of
   the Qemu virtio-pmem code based on the updated Qemu memory device API.
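
   On the host side, a flush of the file backed range comes down to
   syncing the backing file. A minimal sketch, assuming a hypothetical
   helper `host_flush()` that is not taken from the Qemu patch; it maps
   any fsync failure to EIO, following the Qemu changelog entry
   "Return EIO for host fsync failure instead of errno":

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the Qemu patch: flush the backing
 * file of the "fake DAX" range to stable storage.
 * Returns 0 on success and -EIO on any fsync() failure, so the guest
 * never sees a raw host errno.
 */
int host_flush(int backing_fd)
{
	if (fsync(backing_fd) < 0)
		return -EIO;
	return 0;
}
```

   In Qemu this call would run in a worker thread rather than in the
   main loop, since fsync() can block.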

 Virtio-pmem error handling:
 ----------------------------------------
  I checked the behaviour of virtio-pmem for the types of errors below.
  Suggestions are needed on the expected behaviour for handling them.

  - Hardware errors: uncorrectable recoverable errors:
  a] virtio-pmem:
    - With the current logic, if the error page belongs to the Qemu
      process, the host MCE handler isolates (hwpoison) that page and
      sends SIGBUS. The Qemu SIGBUS handler injects an exception into
      the KVM guest.
    - The KVM guest then isolates the page and sends SIGBUS to the
      guest userspace process which has mapped the page.

  b] Existing implementation for the ACPI pmem driver:
    - Handles such errors with an MCE notifier and creates a list
      of bad blocks. Read/direct access DAX operations return EIO
      if the accessed memory page falls in the bad block list.
    - It also starts background scrubbing.
    - Similar functionality can be reused in virtio-pmem with an MCE
      notifier but without scrubbing (no ACPI/ARS)? Inputs are needed
      to confirm whether this behaviour is ok or needs any change.
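
  The bad block behaviour in b] can be illustrated with a simplified
  range check. This is a sketch only: `struct bad_range` and
  `check_access()` are hypothetical names, and the code does not use
  the kernel's badblocks API.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one bad block list entry. */
struct bad_range {
	uint64_t start;	/* first bad byte offset */
	uint64_t len;	/* number of bad bytes */
};

/*
 * Return -EIO if [off, off + len) overlaps any recorded bad range,
 * 0 otherwise. This models "return EIO if the accessed page falls
 * in the bad block list".
 */
int check_access(const struct bad_range *bb, size_t n,
		 uint64_t off, uint64_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t bb_end = bb[i].start + bb[i].len;

		if (off < bb_end && bb[i].start < off + len)
			return -EIO;
	}
	return 0;
}
```

  With one bad page recorded at offset 4096, an access to the first
  page succeeds while any access touching bytes 4096..8191 fails
  with -EIO.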

Changes from RFC v3: [1]
- Rebase to latest upstream - Luiz
- Call nd_region->flush in place of nvdimm_flush - Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- Fix request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (3):
   nd: move nd_region to common header
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver

[1] https://lkml.org/lkml/2018/7/13/102
[2] https://www.spinics.net/lists/kvm/msg149761.html
[3] https://www.spinics.net/lists/kvm/msg153095.html  
[4] https://marc.info/?l=qemu-devel&m=153555721901824&w=2

 drivers/acpi/nfit/core.c         |    7 -
 drivers/nvdimm/claim.c           |    3 
 drivers/nvdimm/nd.h              |   39 -----
 drivers/nvdimm/pmem.c            |   12 +
 drivers/nvdimm/region_devs.c     |   12 +
 drivers/virtio/Kconfig           |    9 +
 drivers/virtio/Makefile          |    1 
 drivers/virtio/virtio_pmem.c     |  255 +++++++++++++++++++++++++++++++++++++++
 include/linux/libnvdimm.h        |    4 
 include/linux/nd.h               |   40 ++++++
 include/uapi/linux/virtio_ids.h  |    1 
 include/uapi/linux/virtio_pmem.h |   40 ++++++
 12 files changed, 374 insertions(+), 49 deletions(-)


* [PATCH 1/3] nd: move nd_region to common header 
  2018-08-31 13:30 [PATCH 0/3] kvm "fake DAX" device Pankaj Gupta
@ 2018-08-31 13:30 ` Pankaj Gupta
  2018-09-22  0:47   ` Dan Williams
  2018-08-31 13:30 ` [PATCH 2/3] libnvdimm: nd_region flush callback support Pankaj Gupta
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Pankaj Gupta @ 2018-08-31 13:30 UTC (permalink / raw)
  To: linux-kernel, kvm, qemu-devel, linux-nvdimm
  Cc: jack, stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, david, xiaoguangrong.eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake, pagupta

This patch moves the nd_region definition (together with nd_mapping)
to the common header include/linux/nd.h and adds a flush callback
pointer to nd_region. This is required to support the flush callback
in both the virtio-pmem and pmem drivers.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/nvdimm/nd.h | 39 ---------------------------------------
 include/linux/nd.h  | 40 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+), 39 deletions(-)

diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index 98317e7..d079a2b 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -123,45 +123,6 @@ enum nd_mapping_lock_class {
 	ND_MAPPING_UUID_SCAN,
 };
 
-struct nd_mapping {
-	struct nvdimm *nvdimm;
-	u64 start;
-	u64 size;
-	int position;
-	struct list_head labels;
-	struct mutex lock;
-	/*
-	 * @ndd is for private use at region enable / disable time for
-	 * get_ndd() + put_ndd(), all other nd_mapping to ndd
-	 * conversions use to_ndd() which respects enabled state of the
-	 * nvdimm.
-	 */
-	struct nvdimm_drvdata *ndd;
-};
-
-struct nd_region {
-	struct device dev;
-	struct ida ns_ida;
-	struct ida btt_ida;
-	struct ida pfn_ida;
-	struct ida dax_ida;
-	unsigned long flags;
-	struct device *ns_seed;
-	struct device *btt_seed;
-	struct device *pfn_seed;
-	struct device *dax_seed;
-	u16 ndr_mappings;
-	u64 ndr_size;
-	u64 ndr_start;
-	int id, num_lanes, ro, numa_node;
-	void *provider_data;
-	struct kernfs_node *bb_state;
-	struct badblocks bb;
-	struct nd_interleave_set *nd_set;
-	struct nd_percpu_lane __percpu *lane;
-	struct nd_mapping mapping[0];
-};
-
 struct nd_blk_region {
 	int (*enable)(struct nvdimm_bus *nvdimm_bus, struct device *dev);
 	int (*do_io)(struct nd_blk_region *ndbr, resource_size_t dpa,
diff --git a/include/linux/nd.h b/include/linux/nd.h
index 43c181a..b9da9f7 100644
--- a/include/linux/nd.h
+++ b/include/linux/nd.h
@@ -120,6 +120,46 @@ struct nd_namespace_blk {
 	struct resource **res;
 };
 
+struct nd_mapping {
+	struct nvdimm *nvdimm;
+	u64 start;
+	u64 size;
+	int position;
+	struct list_head labels;
+	struct mutex lock;
+	/*
+	 * @ndd is for private use at region enable / disable time for
+	 * get_ndd() + put_ndd(), all other nd_mapping to ndd
+	 * conversions use to_ndd() which respects enabled state of the
+	 * nvdimm.
+	 */
+	struct nvdimm_drvdata *ndd;
+};
+
+struct nd_region {
+	struct device dev;
+	struct ida ns_ida;
+	struct ida btt_ida;
+	struct ida pfn_ida;
+	struct ida dax_ida;
+	unsigned long flags;
+	struct device *ns_seed;
+	struct device *btt_seed;
+	struct device *pfn_seed;
+	struct device *dax_seed;
+	u16 ndr_mappings;
+	u64 ndr_size;
+	u64 ndr_start;
+	int id, num_lanes, ro, numa_node;
+	void *provider_data;
+	struct kernfs_node *bb_state;
+	struct badblocks bb;
+	struct nd_interleave_set *nd_set;
+	struct nd_percpu_lane __percpu *lane;
+	int (*flush)(struct nd_region *nd_region);
+	struct nd_mapping mapping[0];
+};
+
 static inline struct nd_namespace_io *to_nd_namespace_io(const struct device *dev)
 {
 	return container_of(dev, struct nd_namespace_io, common.dev);
-- 
2.9.3



* [PATCH 2/3] libnvdimm: nd_region flush callback support
  2018-08-31 13:30 [PATCH 0/3] kvm "fake DAX" device Pankaj Gupta
  2018-08-31 13:30 ` [PATCH 1/3] nd: move nd_region to common header Pankaj Gupta
@ 2018-08-31 13:30 ` Pankaj Gupta
  2018-09-04 15:29   ` kbuild test robot
  2018-09-22  0:43   ` Dan Williams
  2018-08-31 13:30 ` [PATCH 3/3] virtio-pmem: Add virtio pmem driver Pankaj Gupta
  2018-08-31 13:30 ` [PATCH] qemu: Add virtio pmem device Pankaj Gupta
  3 siblings, 2 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-08-31 13:30 UTC (permalink / raw)
  To: linux-kernel, kvm, qemu-devel, linux-nvdimm
  Cc: jack, stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, david, xiaoguangrong.eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake, pagupta

This patch adds functionality to perform a flush from guest
to host over VIRTIO. A flush callback is registered based on
the 'nd_region' type: the virtio_pmem driver requires its own
flush function, while all other region types register the
existing flush function (nvdimm_flush). An error returned by a
failed host fsync is reported to userspace.
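
The callback selection can be modelled outside the kernel as follows.
This is a minimal sketch: `struct region`, `struct region_desc`,
`region_init()`, `generic_flush()` and `failing_flush()` are simplified,
hypothetical stand-ins for nd_region, nd_region_desc, the assignment in
nd_region_create(), nvdimm_flush() and a provider callback.

```c
#include <errno.h>
#include <stddef.h>

struct region;

/* Simplified stand-in for struct nd_region_desc. */
struct region_desc {
	int (*flush)(struct region *r);	/* optional provider callback */
};

/* Simplified stand-in for struct nd_region. */
struct region {
	int (*flush)(struct region *r);
	int flush_count;		/* illustration only */
};

/* Default flush, standing in for nvdimm_flush(); always succeeds. */
int generic_flush(struct region *r)
{
	r->flush_count++;
	return 0;
}

/* Same choice as in nd_region_create(): provider callback or default. */
void region_init(struct region *r, const struct region_desc *desc)
{
	r->flush_count = 0;
	r->flush = desc->flush ? desc->flush : generic_flush;
}

/* Hypothetical provider callback that reports a host side failure. */
int failing_flush(struct region *r)
{
	r->flush_count++;
	return -EIO;
}
```

Callers then invoke r->flush(r) without knowing which provider backs
the region, and a non-zero return value can be propagated upwards.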

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/acpi/nfit/core.c     |  7 +++++--
 drivers/nvdimm/claim.c       |  3 ++-
 drivers/nvdimm/pmem.c        | 12 ++++++++----
 drivers/nvdimm/region_devs.c | 12 ++++++++++--
 include/linux/libnvdimm.h    |  4 +++-
 5 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index b072cfc..cd63b69 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2216,6 +2216,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 {
 	u64 cmd, offset;
 	struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR];
+	struct nd_region *nd_region = nfit_blk->nd_region;
 
 	enum {
 		BCW_OFFSET_MASK = (1ULL << 48)-1,
@@ -2234,7 +2235,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	nvdimm_flush(nfit_blk->nd_region);
+	nd_region->flush(nd_region);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -2245,6 +2246,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 		unsigned int lane)
 {
 	struct nfit_blk_mmio *mmio = &nfit_blk->mmio[BDW];
+	struct nd_region *nd_region = nfit_blk->nd_region;
 	unsigned int copied = 0;
 	u64 base_offset;
 	int rc;
@@ -2283,7 +2285,8 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		nvdimm_flush(nfit_blk->nd_region);
+		nd_region->flush(nd_region);
+
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf..49dce9c 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -262,6 +262,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 {
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
+	struct nd_region *nd_region = to_nd_region(ndns->dev.parent);
 	sector_t sector = offset >> 9;
 	int rc = 0;
 
@@ -301,7 +302,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	memcpy_flushcache(nsio->addr + offset, buf, size);
-	nvdimm_flush(to_nd_region(ndns->dev.parent));
+	nd_region->flush(nd_region);
 
 	return rc;
 }
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 6071e29..ba57cfa 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -201,7 +201,8 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct nd_region *nd_region = to_region(pmem);
 
 	if (bio->bi_opf & REQ_PREFLUSH)
-		nvdimm_flush(nd_region);
+		bio->bi_status = nd_region->flush(nd_region);
+
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio->bi_opf & REQ_FUA)
-		nvdimm_flush(nd_region);
+		bio->bi_status = nd_region->flush(nd_region);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -517,6 +518,7 @@ static int nd_pmem_probe(struct device *dev)
 static int nd_pmem_remove(struct device *dev)
 {
 	struct pmem_device *pmem = dev_get_drvdata(dev);
+	struct nd_region *nd_region = to_region(pmem);
 
 	if (is_nd_btt(dev))
 		nvdimm_namespace_detach_btt(to_nd_btt(dev));
@@ -528,14 +530,16 @@ static int nd_pmem_remove(struct device *dev)
 		sysfs_put(pmem->bb_state);
 		pmem->bb_state = NULL;
 	}
-	nvdimm_flush(to_nd_region(dev->parent));
+	nd_region->flush(nd_region);
 
 	return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-	nvdimm_flush(to_nd_region(dev->parent));
+	struct nd_region *nd_region = to_nd_region(dev->parent);
+
+	nd_region->flush(nd_region);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index fa37afc..a170a6b 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -290,7 +290,7 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att
 		return rc;
 	if (!flush)
 		return -EINVAL;
-	nvdimm_flush(nd_region);
+	nd_region->flush(nd_region);
 
 	return len;
 }
@@ -1065,6 +1065,11 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	dev->of_node = ndr_desc->of_node;
 	nd_region->ndr_size = resource_size(ndr_desc->res);
 	nd_region->ndr_start = ndr_desc->res->start;
+	if (ndr_desc->flush)
+		nd_region->flush = ndr_desc->flush;
+	else
+		nd_region->flush = nvdimm_flush;
+
 	nd_device_register(dev);
 
 	return nd_region;
@@ -1109,7 +1114,7 @@ EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
  * nvdimm_flush - flush any posted write queues between the cpu and pmem media
  * @nd_region: blk or interleaved pmem region
  */
-void nvdimm_flush(struct nd_region *nd_region)
+int nvdimm_flush(struct nd_region *nd_region)
 {
 	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
 	int i, idx;
@@ -1133,7 +1138,10 @@ void nvdimm_flush(struct nd_region *nd_region)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
 	wmb();
+
+	return 0;
 }
+
 EXPORT_SYMBOL_GPL(nvdimm_flush);
 
 /**
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index 097072c..3af7177 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -115,6 +115,7 @@ struct nd_mapping_desc {
 	int position;
 };
 
+struct nd_region;
 struct nd_region_desc {
 	struct resource *res;
 	struct nd_mapping_desc *mapping;
@@ -126,6 +127,7 @@ struct nd_region_desc {
 	int numa_node;
 	unsigned long flags;
 	struct device_node *of_node;
+	int (*flush)(struct nd_region *nd_region);
 };
 
 struct device;
@@ -201,7 +203,7 @@ unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
-void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 
-- 
2.9.3



* [PATCH 3/3]  virtio-pmem: Add virtio pmem driver
  2018-08-31 13:30 [PATCH 0/3] kvm "fake DAX" device Pankaj Gupta
  2018-08-31 13:30 ` [PATCH 1/3] nd: move nd_region to common header Pankaj Gupta
  2018-08-31 13:30 ` [PATCH 2/3] libnvdimm: nd_region flush callback support Pankaj Gupta
@ 2018-08-31 13:30 ` Pankaj Gupta
  2018-09-04 15:17   ` kbuild test robot
                     ` (3 more replies)
  2018-08-31 13:30 ` [PATCH] qemu: Add virtio pmem device Pankaj Gupta
  3 siblings, 4 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-08-31 13:30 UTC (permalink / raw)
  To: linux-kernel, kvm, qemu-devel, linux-nvdimm
  Cc: jack, stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, david, xiaoguangrong.eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake, pagupta

This patch adds the virtio-pmem driver for KVM guests.

The guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates an nd_region object with the persistent memory
range information so that the existing 'nvdimm/pmem' driver
can reserve this range in the system memory map. This way the
'virtio-pmem' driver reuses the existing functionality of the
pmem driver to register persistent memory that is compatible
with DAX capable filesystems.

It also provides a function to perform a guest flush over
VIRTIO from the 'pmem' driver when userspace performs a flush
on a DAX memory range.
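
How the memory range is derived from the device config can be sketched
in plain C. The driver itself uses virtio_cread(), which takes care of
endianness; the helpers below (`le64_decode()`,
`pmem_range_from_config()`, `struct pmem_range`) are hypothetical and
only illustrate the layout of struct virtio_pmem_config and the
inclusive end address computed for the resource.

```c
#include <stdint.h>

/* Host-endian view of the range described by the config space. */
struct pmem_range {
	uint64_t start;
	uint64_t end;	/* inclusive, like struct resource */
};

/* Decode a little-endian 64-bit value on any host endianness. */
uint64_t le64_decode(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/*
 * cfg points at 16 bytes laid out like struct virtio_pmem_config:
 * __le64 start followed by __le64 size.
 */
struct pmem_range pmem_range_from_config(const unsigned char *cfg)
{
	struct pmem_range r;
	uint64_t size = le64_decode(cfg + 8);

	r.start = le64_decode(cfg);
	r.end = r.start + size - 1;
	return r;
}
```

For a range that starts at 4 GiB with a size of 4 KiB, this yields
start 0x100000000 and end 0x100000fff.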

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/virtio/Kconfig           |   9 ++
 drivers/virtio/Makefile          |   1 +
 drivers/virtio/virtio_pmem.c     | 255 +++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  40 ++++++
 5 files changed, 306 insertions(+)
 create mode 100644 drivers/virtio/virtio_pmem.c
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 3589764..a331e23 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,15 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_PMEM
+	tristate "Support for virtio pmem driver"
+	depends on VIRTIO
+	help
+	This driver provides support for virtio based flushing interface
+	for persistent memory range.
+
+	If unsure, say M.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5..cbe91c6 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
new file mode 100644
index 0000000..c22cc87
--- /dev/null
+++ b/drivers/virtio/virtio_pmem.c
@@ -0,0 +1,255 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include <linux/virtio.h>
+#include <linux/module.h>
+#include <linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <uapi/linux/virtio_pmem.h>
+#include <linux/spinlock.h>
+#include <linux/libnvdimm.h>
+#include <linux/nd.h>
+
+struct virtio_pmem_request {
+	/* Host return status corresponding to flush request */
+	int ret;
+
+	/* command name*/
+	char name[16];
+
+	/* Wait queue to process deferred work after ack from host */
+	wait_queue_head_t host_acked;
+	bool done;
+
+	/* Wait queue to process deferred work after virt queue buffer avail */
+	wait_queue_head_t wq_buf;
+	bool wq_buf_avail;
+	struct list_head list;
+};
+
+struct virtio_pmem {
+	struct virtio_device *vdev;
+
+	/* Virtio pmem request queue */
+	struct virtqueue *req_vq;
+
+	/* nvdimm bus registers virtio pmem device */
+	struct nvdimm_bus *nvdimm_bus;
+	struct nvdimm_bus_descriptor nd_desc;
+
+	/* List to store deferred work if virtqueue is full */
+	struct list_head req_list;
+
+	/* Synchronize virtqueue data */
+	spinlock_t pmem_lock;
+
+	/* Memory region information */
+	uint64_t start;
+	uint64_t size;
+};
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+ /* The interrupt handler */
+static void host_ack(struct virtqueue *vq)
+{
+	unsigned int len;
+	unsigned long flags;
+	struct virtio_pmem_request *req, *req_buf;
+	struct virtio_pmem *vpmem = vq->vdev->priv;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+		req->done = true;
+		wake_up(&req->host_acked);
+
+		if (!list_empty(&vpmem->req_list)) {
+			req_buf = list_first_entry(&vpmem->req_list,
+					struct virtio_pmem_request, list);
+			list_del(&req_buf->list);
+			req_buf->wq_buf_avail = true;
+			wake_up(&req_buf->wq_buf);
+		}
+	}
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+ /* Initialize virt queue */
+static int init_vq(struct virtio_pmem *vpmem)
+{
+	struct virtqueue *vq;
+
+	/* single vq */
+	vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
+				host_ack, "flush_queue");
+	if (IS_ERR(vq))
+		return PTR_ERR(vq);
+
+	spin_lock_init(&vpmem->pmem_lock);
+	INIT_LIST_HEAD(&vpmem->req_list);
+
+	return 0;
+};
+
+ /* The request submission function */
+static int virtio_pmem_flush(struct nd_region *nd_region)
+{
+	int err;
+	unsigned long flags;
+	struct scatterlist *sgs[2], sg, ret;
+	struct virtio_device *vdev =
+		dev_to_virtio(nd_region->dev.parent->parent);
+	struct virtio_pmem *vpmem = vdev->priv;
+	struct virtio_pmem_request *req = kmalloc(sizeof(*req), GFP_KERNEL);
+
+	if (!req)
+		return -ENOMEM;
+
+	req->done = req->wq_buf_avail = false;
+	strcpy(req->name, "FLUSH");
+	init_waitqueue_head(&req->host_acked);
+	init_waitqueue_head(&req->wq_buf);
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	sg_init_one(&sg, req->name, strlen(req->name));
+	sgs[0] = &sg;
+	sg_init_one(&ret, &req->ret, sizeof(req->ret));
+	sgs[1] = &ret;
+	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+	if (err) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+		list_add_tail(&req->list, &vpmem->req_list);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+		/* When host has read buffer, this completes via host_ack */
+		wait_event(req->wq_buf, req->wq_buf_avail);
+		spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	}
+	virtqueue_kick(vpmem->req_vq);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	/* When host has read buffer, this completes via host_ack */
+	wait_event(req->host_acked, req->done);
+	err = req->ret;
+	kfree(req);
+
+	return err;
+};
+EXPORT_SYMBOL_GPL(virtio_pmem_flush);
+
+static int virtio_pmem_probe(struct virtio_device *vdev)
+{
+	int err = 0;
+	struct resource res;
+	struct virtio_pmem *vpmem;
+	struct nvdimm_bus *nvdimm_bus;
+	struct nd_region_desc ndr_desc;
+	int nid = dev_to_node(&vdev->dev);
+	struct nd_region *nd_region;
+
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	vdev->priv = vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem),
+			GFP_KERNEL);
+	if (!vpmem) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	vpmem->vdev = vdev;
+	err = init_vq(vpmem);
+	if (err)
+		goto out_err;
+
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			start, &vpmem->start);
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			size, &vpmem->size);
+
+	res.start = vpmem->start;
+	res.end   = vpmem->start + vpmem->size-1;
+	vpmem->nd_desc.provider_name = "virtio-pmem";
+	vpmem->nd_desc.module = THIS_MODULE;
+
+	vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(&vdev->dev,
+						&vpmem->nd_desc);
+	if (!nvdimm_bus)
+		goto out_vq;
+
+	dev_set_drvdata(&vdev->dev, nvdimm_bus);
+	memset(&ndr_desc, 0, sizeof(ndr_desc));
+
+	ndr_desc.res = &res;
+	ndr_desc.numa_node = nid;
+	ndr_desc.flush = virtio_pmem_flush;
+	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+	nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
+
+	if (!nd_region)
+		goto out_nd;
+
+	//virtio_device_ready(vdev);
+	return 0;
+out_nd:
+	err = -ENXIO;
+	nvdimm_bus_unregister(nvdimm_bus);
+out_vq:
+	vdev->config->del_vqs(vdev);
+out_err:
+	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
+	return err;
+}
+
+static void virtio_pmem_remove(struct virtio_device *vdev)
+{
+	struct virtio_pmem *vpmem = vdev->priv;
+	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
+
+	nvdimm_bus_unregister(nvdimm_bus);
+	vdev->config->del_vqs(vdev);
+	kfree(vpmem);
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int virtio_pmem_freeze(struct virtio_device *vdev)
+{
+	/* todo: handle freeze function */
+	return -EPERM;
+}
+
+static int virtio_pmem_restore(struct virtio_device *vdev)
+{
+	/* todo: handle restore function */
+	return -EPERM;
+}
+#endif
+
+
+static struct virtio_driver virtio_pmem_driver = {
+	.driver.name		= KBUILD_MODNAME,
+	.driver.owner		= THIS_MODULE,
+	.id_table		= id_table,
+	.probe			= virtio_pmem_probe,
+	.remove			= virtio_pmem_remove,
+#ifdef CONFIG_PM_SLEEP
+	.freeze                 = virtio_pmem_freeze,
+	.restore                = virtio_pmem_restore,
+#endif
+};
+
+module_virtio_driver(virtio_pmem_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio pmem driver");
+MODULE_LICENSE("GPL");
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 6d5c3b2..3463895 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_PMEM         25 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
new file mode 100644
index 0000000..c7c22a5
--- /dev/null
+++ b/include/uapi/linux/virtio_pmem.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
+ * anyone can use the definitions to implement compatible drivers/servers:
+ *
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the name of IBM nor the names of its contributors
+ *    may be used to endorse or promote products derived from this software
+ *    without specific prior written permission.
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
+ * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+ * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
+ * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
+ * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+ * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
+ * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
+ * SUCH DAMAGE.
+ *
+ * Copyright (C) Red Hat, Inc., 2018-2019
+ * Copyright (C) Pankaj Gupta <pagupta@redhat.com>, 2018
+ */
+#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
+#define _UAPI_LINUX_VIRTIO_PMEM_H
+
+struct virtio_pmem_config {
+	__le64 start;
+	__le64 size;
+};
+#endif
-- 
2.9.3



* [PATCH] qemu: Add virtio pmem device
  2018-08-31 13:30 [PATCH 0/3] kvm "fake DAX" device Pankaj Gupta
                   ` (2 preceding siblings ...)
  2018-08-31 13:30 ` [PATCH 3/3] virtio-pmem: Add virtio pmem driver Pankaj Gupta
@ 2018-08-31 13:30 ` Pankaj Gupta
  2018-09-12 16:57   ` Luiz Capitulino
  2018-09-20 11:21   ` David Hildenbrand
  3 siblings, 2 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-08-31 13:30 UTC (permalink / raw)
  To: linux-kernel, kvm, qemu-devel, linux-nvdimm
  Cc: jack, stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, david, xiaoguangrong.eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake, pagupta

 This patch adds the virtio-pmem Qemu device.

 This device presents memory address range information to the
 guest; the range is backed by a file backend. It acts like a
 persistent memory device for the KVM guest. The guest can perform
 read and persistent write operations on this memory range with
 the help of a DAX capable filesystem.

 Persistent guest writes are assured with the help of the virtio
 based flushing interface. When guest userspace performs fsync
 on a file fd on the pmem device, a flush command is sent to
 Qemu over VIRTIO and a host side flush/sync is done on the
 backing image file.
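
 From the point of view of guest userspace, a persistent write looks
 like an ordinary store through a shared mapping followed by a sync.
 The sketch below assumes a hypothetical helper `write_persistent()`
 and an existing file on a DAX capable filesystem; it also runs on a
 regular filesystem, where the sync simply goes through the page cache.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Store len bytes of msg at the start of the file at path through a
 * shared mapping, then make them durable. The file must already be
 * at least len bytes long and len must be non-zero.
 * Returns 0 on success, -1 on error.
 */
int write_persistent(const char *path, const char *msg, size_t len)
{
	int fd, rc = -1;
	void *map;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map != MAP_FAILED) {
		memcpy(map, msg, len);
		/*
		 * On a virtio-pmem backed file, this sync is the point
		 * where the flush command described above is issued.
		 */
		if (msync(map, len, MS_SYNC) == 0 && fsync(fd) == 0)
			rc = 0;
		munmap(map, len);
	}
	close(fd);
	return rc;
}
```

 A failed host side sync would surface here as an error return from
 msync() or fsync(), which is why the flush path reports a status.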

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
Changes from RFC v3:
- Return EIO for host fsync failure instead of errno - Luiz, Stefan
- Change version for inclusion to Qemu 3.1 - Eric

Changes from RFC v2:
- Use aio_worker() to keep Qemu from hanging on a blocking fsync
  call - Stefan
- Use virtio_st*_p() for endianness - Stefan
- Correct indentation in qapi/misc.json - Eric

 hw/virtio/Makefile.objs                     |   3 +
 hw/virtio/virtio-pci.c                      |  44 +++++
 hw/virtio/virtio-pci.h                      |  14 ++
 hw/virtio/virtio-pmem.c                     | 241 ++++++++++++++++++++++++++++
 include/hw/pci/pci.h                        |   1 +
 include/hw/virtio/virtio-pmem.h             |  42 +++++
 include/standard-headers/linux/virtio_ids.h |   1 +
 qapi/misc.json                              |  26 ++-
 8 files changed, 371 insertions(+), 1 deletion(-)
 create mode 100644 hw/virtio/virtio-pmem.c
 create mode 100644 include/hw/virtio/virtio-pmem.h

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index 1b2799cfd8..7f914d45d0 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -10,6 +10,9 @@ obj-$(CONFIG_VIRTIO_CRYPTO) += virtio-crypto.o
 obj-$(call land,$(CONFIG_VIRTIO_CRYPTO),$(CONFIG_VIRTIO_PCI)) += virtio-crypto-pci.o
 
 obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
+ifeq ($(CONFIG_MEM_HOTPLUG),y)
+obj-$(CONFIG_LINUX) += virtio-pmem.o
+endif
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
 endif
 
diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 3a01fe90f0..93d3fc05c7 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -2521,6 +2521,49 @@ static const TypeInfo virtio_rng_pci_info = {
     .class_init    = virtio_rng_pci_class_init,
 };
 
+/* virtio-pmem-pci */
+
+static void virtio_pmem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+    VirtIOPMEMPCI *vpmem = VIRTIO_PMEM_PCI(vpci_dev);
+    DeviceState *vdev = DEVICE(&vpmem->vdev);
+
+    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
+    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
+}
+
+static void virtio_pmem_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+    k->realize = virtio_pmem_pci_realize;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_PMEM;
+    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+    pcidev_k->class_id = PCI_CLASS_OTHERS;
+}
+
+static void virtio_pmem_pci_instance_init(Object *obj)
+{
+    VirtIOPMEMPCI *dev = VIRTIO_PMEM_PCI(obj);
+
+    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+                                TYPE_VIRTIO_PMEM);
+    object_property_add_alias(obj, "memdev", OBJECT(&dev->vdev), "memdev",
+                              &error_abort);
+}
+
+static const TypeInfo virtio_pmem_pci_info = {
+    .name          = TYPE_VIRTIO_PMEM_PCI,
+    .parent        = TYPE_VIRTIO_PCI,
+    .instance_size = sizeof(VirtIOPMEMPCI),
+    .instance_init = virtio_pmem_pci_instance_init,
+    .class_init    = virtio_pmem_pci_class_init,
+};
+
+
 /* virtio-input-pci */
 
 static Property virtio_input_pci_properties[] = {
@@ -2714,6 +2757,7 @@ static void virtio_pci_register_types(void)
     type_register_static(&virtio_balloon_pci_info);
     type_register_static(&virtio_serial_pci_info);
     type_register_static(&virtio_net_pci_info);
+    type_register_static(&virtio_pmem_pci_info);
 #ifdef CONFIG_VHOST_SCSI
     type_register_static(&vhost_scsi_pci_info);
 #endif
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 813082b0d7..fe74fcad3f 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -19,6 +19,7 @@
 #include "hw/virtio/virtio-blk.h"
 #include "hw/virtio/virtio-net.h"
 #include "hw/virtio/virtio-rng.h"
+#include "hw/virtio/virtio-pmem.h"
 #include "hw/virtio/virtio-serial.h"
 #include "hw/virtio/virtio-scsi.h"
 #include "hw/virtio/virtio-balloon.h"
@@ -57,6 +58,7 @@ typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
 typedef struct VirtIOGPUPCI VirtIOGPUPCI;
 typedef struct VHostVSockPCI VHostVSockPCI;
 typedef struct VirtIOCryptoPCI VirtIOCryptoPCI;
+typedef struct VirtIOPMEMPCI VirtIOPMEMPCI;
 
 /* virtio-pci-bus */
 
@@ -274,6 +276,18 @@ struct VirtIOBlkPCI {
     VirtIOBlock vdev;
 };
 
+/*
+ * virtio-pmem-pci: This extends VirtioPCIProxy.
+ */
+#define TYPE_VIRTIO_PMEM_PCI "virtio-pmem-pci"
+#define VIRTIO_PMEM_PCI(obj) \
+        OBJECT_CHECK(VirtIOPMEMPCI, (obj), TYPE_VIRTIO_PMEM_PCI)
+
+struct VirtIOPMEMPCI {
+    VirtIOPCIProxy parent_obj;
+    VirtIOPMEM vdev;
+};
+
 /*
  * virtio-balloon-pci: This extends VirtioPCIProxy.
  */
diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
new file mode 100644
index 0000000000..69ae4c0a50
--- /dev/null
+++ b/hw/virtio/virtio-pmem.c
@@ -0,0 +1,241 @@
+/*
+ * Virtio pmem device
+ *
+ * Copyright (C) 2018 Red Hat, Inc.
+ * Copyright (C) 2018 Pankaj Gupta <pagupta@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu-common.h"
+#include "qemu/error-report.h"
+#include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-pmem.h"
+#include "hw/mem/memory-device.h"
+#include "block/aio.h"
+#include "block/thread-pool.h"
+
+typedef struct VirtIOPMEMResp {
+    int ret;
+} VirtIOPMEMResp;
+
+typedef struct VirtIODeviceRequest {
+    VirtQueueElement elem;
+    int fd;
+    VirtIOPMEM *pmem;
+    VirtIOPMEMResp resp;
+} VirtIODeviceRequest;
+
+static int worker_cb(void *opaque)
+{
+    VirtIODeviceRequest *req = opaque;
+    int err = 0;
+
+    /* flush raw backing image */
+    err = fsync(req->fd);
+    if (err != 0) {
+        err = EIO;
+    }
+    req->resp.ret = err;
+
+    return 0;
+}
+
+static void done_cb(void *opaque, int ret)
+{
+    VirtIODeviceRequest *req = opaque;
+    int len = iov_from_buf(req->elem.in_sg, req->elem.in_num, 0,
+                              &req->resp, sizeof(VirtIOPMEMResp));
+
+    /* Callbacks are serialized, so no need to use atomic ops.  */
+    virtqueue_push(req->pmem->rq_vq, &req->elem, len);
+    virtio_notify((VirtIODevice *)req->pmem, req->pmem->rq_vq);
+    g_free(req);
+}
+
+static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIODeviceRequest *req;
+    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
+    HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
+    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
+
+    req = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
+    if (!req) {
+        virtio_error(vdev, "virtio-pmem missing request data");
+        return;
+    }
+
+    if (req->elem.out_num < 1 || req->elem.in_num < 1) {
+        virtio_error(vdev, "virtio-pmem request not proper");
+        g_free(req);
+        return;
+    }
+    req->fd = memory_region_get_fd(&backend->mr);
+    req->pmem = pmem;
+    thread_pool_submit_aio(pool, worker_cb, req, done_cb, req);
+}
+
+static void virtio_pmem_get_config(VirtIODevice *vdev, uint8_t *config)
+{
+    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
+    struct virtio_pmem_config *pmemcfg = (struct virtio_pmem_config *) config;
+
+    virtio_stq_p(vdev, &pmemcfg->start, pmem->start);
+    virtio_stq_p(vdev, &pmemcfg->size, pmem->size);
+}
+
+static uint64_t virtio_pmem_get_features(VirtIODevice *vdev, uint64_t features,
+                                        Error **errp)
+{
+    return features;
+}
+
+static void virtio_pmem_realize(DeviceState *dev, Error **errp)
+{
+    VirtIODevice   *vdev   = VIRTIO_DEVICE(dev);
+    VirtIOPMEM     *pmem   = VIRTIO_PMEM(dev);
+    MachineState   *ms     = MACHINE(qdev_get_machine());
+    uint64_t align;
+    Error *local_err = NULL;
+    MemoryRegion *mr;
+
+    if (!pmem->memdev) {
+        error_setg(errp, "virtio-pmem memdev not set");
+        return;
+    }
+
+    mr  = host_memory_backend_get_memory(pmem->memdev);
+    align = memory_region_get_alignment(mr);
+    pmem->size = QEMU_ALIGN_DOWN(memory_region_size(mr), align);
+    pmem->start = memory_device_get_free_addr(ms, NULL, align, pmem->size,
+                                                               &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    memory_region_init_alias(&pmem->mr, OBJECT(pmem),
+                             "virtio_pmem-memory", mr, 0, pmem->size);
+    memory_device_plug_region(ms, &pmem->mr, pmem->start);
+
+    host_memory_backend_set_mapped(pmem->memdev, true);
+    virtio_init(vdev, TYPE_VIRTIO_PMEM, VIRTIO_ID_PMEM,
+                                          sizeof(struct virtio_pmem_config));
+    pmem->rq_vq = virtio_add_queue(vdev, 128, virtio_pmem_flush);
+}
+
+static void virtio_mem_check_memdev(Object *obj, const char *name, Object *val,
+                                    Error **errp)
+{
+    if (host_memory_backend_is_mapped(MEMORY_BACKEND(val))) {
+        char *path = object_get_canonical_path_component(val);
+        error_setg(errp, "Can't use already busy memdev: %s", path);
+        g_free(path);
+        return;
+    }
+
+    qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
+}
+
+static const char *virtio_pmem_get_device_id(VirtIOPMEM *vm)
+{
+    Object *obj = OBJECT(vm);
+    DeviceState *parent_dev;
+
+    /* always use the ID of the proxy device */
+    if (obj->parent && object_dynamic_cast(obj->parent, TYPE_DEVICE)) {
+        parent_dev = DEVICE(obj->parent);
+        return parent_dev->id;
+    }
+    return NULL;
+}
+
+static void virtio_pmem_md_fill_device_info(const MemoryDeviceState *md,
+                                           MemoryDeviceInfo *info)
+{
+    VirtioPMemDeviceInfo *vi = g_new0(VirtioPMemDeviceInfo, 1);
+    VirtIOPMEM *vm = VIRTIO_PMEM(md);
+    const char *id = virtio_pmem_get_device_id(vm);
+
+    if (id) {
+        vi->has_id = true;
+        vi->id = g_strdup(id);
+    }
+
+    vi->start = vm->start;
+    vi->size = vm->size;
+    vi->memdev = object_get_canonical_path(OBJECT(vm->memdev));
+
+    info->u.virtio_pmem.data = vi;
+    info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM;
+}
+
+static uint64_t virtio_pmem_md_get_addr(const MemoryDeviceState *md)
+{
+    VirtIOPMEM *vm = VIRTIO_PMEM(md);
+
+    return vm->start;
+}
+
+static uint64_t virtio_pmem_md_get_plugged_size(const MemoryDeviceState *md)
+{
+    VirtIOPMEM *vm = VIRTIO_PMEM(md);
+
+    return vm->size;
+}
+
+static uint64_t virtio_pmem_md_get_region_size(const MemoryDeviceState *md)
+{
+    VirtIOPMEM *vm = VIRTIO_PMEM(md);
+
+    return vm->size;
+}
+
+static void virtio_pmem_instance_init(Object *obj)
+{
+    VirtIOPMEM *vm = VIRTIO_PMEM(obj);
+    object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
+                                (Object **)&vm->memdev,
+                                (void *) virtio_mem_check_memdev,
+                                OBJ_PROP_LINK_STRONG,
+                                &error_abort);
+}
+
+
+static void virtio_pmem_class_init(ObjectClass *klass, void *data)
+{
+    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
+
+    vdc->realize      =  virtio_pmem_realize;
+    vdc->get_config   =  virtio_pmem_get_config;
+    vdc->get_features =  virtio_pmem_get_features;
+
+    mdc->get_addr         = virtio_pmem_md_get_addr;
+    mdc->get_plugged_size = virtio_pmem_md_get_plugged_size;
+    mdc->get_region_size  = virtio_pmem_md_get_region_size;
+    mdc->fill_device_info = virtio_pmem_md_fill_device_info;
+}
+
+static TypeInfo virtio_pmem_info = {
+    .name          = TYPE_VIRTIO_PMEM,
+    .parent        = TYPE_VIRTIO_DEVICE,
+    .class_init    = virtio_pmem_class_init,
+    .instance_size = sizeof(VirtIOPMEM),
+    .instance_init = virtio_pmem_instance_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_MEMORY_DEVICE },
+        { }
+    },
+};
+
+static void virtio_register_types(void)
+{
+    type_register_static(&virtio_pmem_info);
+}
+
+type_init(virtio_register_types)
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 990d6fcbde..28829b6437 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -85,6 +85,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_VIRTIO_RNG         0x1005
 #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
 #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
+#define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
 
 #define PCI_VENDOR_ID_REDHAT             0x1b36
 #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
diff --git a/include/hw/virtio/virtio-pmem.h b/include/hw/virtio/virtio-pmem.h
new file mode 100644
index 0000000000..fda3ee691c
--- /dev/null
+++ b/include/hw/virtio/virtio-pmem.h
@@ -0,0 +1,42 @@
+/*
+ * Virtio pmem Device
+ *
+ * Copyright Red Hat, Inc. 2018
+ * Copyright Pankaj Gupta <pagupta@redhat.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or
+ * (at your option) any later version.  See the COPYING file in the
+ * top-level directory.
+ */
+
+#ifndef QEMU_VIRTIO_PMEM_H
+#define QEMU_VIRTIO_PMEM_H
+
+#include "hw/virtio/virtio.h"
+#include "exec/memory.h"
+#include "sysemu/hostmem.h"
+#include "standard-headers/linux/virtio_ids.h"
+#include "hw/boards.h"
+#include "hw/i386/pc.h"
+
+#define TYPE_VIRTIO_PMEM "virtio-pmem"
+
+#define VIRTIO_PMEM(obj) \
+        OBJECT_CHECK(VirtIOPMEM, (obj), TYPE_VIRTIO_PMEM)
+
+/* VirtIOPMEM device structure */
+typedef struct VirtIOPMEM {
+    VirtIODevice parent_obj;
+
+    VirtQueue *rq_vq;
+    uint64_t start;
+    uint64_t size;
+    MemoryRegion mr;
+    HostMemoryBackend *memdev;
+} VirtIOPMEM;
+
+struct virtio_pmem_config {
+    uint64_t start;
+    uint64_t size;
+};
+#endif
diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
index 6d5c3b2d4f..346389565a 100644
--- a/include/standard-headers/linux/virtio_ids.h
+++ b/include/standard-headers/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_PMEM         25 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/qapi/misc.json b/qapi/misc.json
index d450cfef21..517376b866 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -2907,6 +2907,29 @@
           }
 }
 
+##
+# @VirtioPMemDeviceInfo:
+#
+# VirtioPMem state information
+#
+# @id: device's ID
+#
+# @start: physical address, where device is mapped
+#
+# @size: size of memory that the device provides
+#
+# @memdev: memory backend linked with device
+#
+# Since: 3.1
+##
+{ 'struct': 'VirtioPMemDeviceInfo',
+  'data': { '*id': 'str',
+            'start': 'size',
+            'size': 'size',
+            'memdev': 'str'
+          }
+}
+
 ##
 # @MemoryDeviceInfo:
 #
@@ -2916,7 +2939,8 @@
 ##
 { 'union': 'MemoryDeviceInfo',
   'data': { 'dimm': 'PCDIMMDeviceInfo',
-            'nvdimm': 'PCDIMMDeviceInfo'
+            'nvdimm': 'PCDIMMDeviceInfo',
+            'virtio-pmem': 'VirtioPMemDeviceInfo'
           }
 }
 
-- 
2.14.3
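
For readers following the flush path in hw/virtio/virtio-pmem.c above: worker_cb() runs fsync() on the backing fd and reports either 0 or EIO in the response. A minimal userspace sketch of that mapping, assuming nothing from QEMU; flush_backing_fd() is a hypothetical helper and not part of the patch:

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of the patch: flush a backing file and
 * collapse any failure into EIO, mirroring what worker_cb() stores in
 * the guest-visible response.
 */
static int flush_backing_fd(int fd)
{
    /* fsync() returns 0 on success and -1 with errno set on failure. */
    if (fsync(fd) != 0) {
        return EIO;
    }
    return 0;
}
```

Collapsing every failure into EIO keeps the response format simple, at the cost of hiding the original errno from the guest.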



* Re: [PATCH 3/3]  virtio-pmem: Add virtio pmem driver
  2018-08-31 13:30 ` [PATCH 3/3] virtio-pmem: Add virtio pmem driver Pankaj Gupta
@ 2018-09-04 15:17   ` kbuild test robot
  2018-09-05  8:34     ` Pankaj Gupta
  2018-09-05 12:02   ` kbuild test robot
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: kbuild test robot @ 2018-09-04 15:17 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: kbuild-all, linux-kernel, kvm, qemu-devel, linux-nvdimm, jack,
	stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, david, xiaoguangrong.eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake, pagupta

[-- Attachment #1: Type: text/plain, Size: 6507 bytes --]

Hi Pankaj,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
[also build test ERROR on v4.19-rc2 next-20180903]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pankaj-Gupta/kvm-fake-DAX-device/20180903-160032
base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
config: i386-randconfig-a3-201835 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 
:::::: branch date: 21 hours ago
:::::: commit date: 21 hours ago

All errors (new ones prefixed by >>):

   drivers/virtio/virtio_pmem.o: In function `virtio_pmem_remove':
>> drivers/virtio/virtio_pmem.c:220: undefined reference to `nvdimm_bus_unregister'
   drivers/virtio/virtio_pmem.o: In function `virtio_pmem_probe':
>> drivers/virtio/virtio_pmem.c:186: undefined reference to `nvdimm_bus_register'
>> drivers/virtio/virtio_pmem.c:198: undefined reference to `nvdimm_pmem_region_create'
   drivers/virtio/virtio_pmem.c:207: undefined reference to `nvdimm_bus_unregister'

# https://github.com/0day-ci/linux/commit/acce2633da18b0ad58d0cc9243a85b03020ca099
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout acce2633da18b0ad58d0cc9243a85b03020ca099
vim +220 drivers/virtio/virtio_pmem.c

acce2633 Pankaj Gupta 2018-08-31  147  
acce2633 Pankaj Gupta 2018-08-31  148  static int virtio_pmem_probe(struct virtio_device *vdev)
acce2633 Pankaj Gupta 2018-08-31  149  {
acce2633 Pankaj Gupta 2018-08-31  150  	int err = 0;
acce2633 Pankaj Gupta 2018-08-31  151  	struct resource res;
acce2633 Pankaj Gupta 2018-08-31  152  	struct virtio_pmem *vpmem;
acce2633 Pankaj Gupta 2018-08-31  153  	struct nvdimm_bus *nvdimm_bus;
acce2633 Pankaj Gupta 2018-08-31  154  	struct nd_region_desc ndr_desc;
acce2633 Pankaj Gupta 2018-08-31  155  	int nid = dev_to_node(&vdev->dev);
acce2633 Pankaj Gupta 2018-08-31  156  	struct nd_region *nd_region;
acce2633 Pankaj Gupta 2018-08-31  157  
acce2633 Pankaj Gupta 2018-08-31  158  	if (!vdev->config->get) {
acce2633 Pankaj Gupta 2018-08-31  159  		dev_err(&vdev->dev, "%s failure: config disabled\n",
acce2633 Pankaj Gupta 2018-08-31  160  			__func__);
acce2633 Pankaj Gupta 2018-08-31  161  		return -EINVAL;
acce2633 Pankaj Gupta 2018-08-31  162  	}
acce2633 Pankaj Gupta 2018-08-31  163  
acce2633 Pankaj Gupta 2018-08-31  164  	vdev->priv = vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem),
acce2633 Pankaj Gupta 2018-08-31  165  			GFP_KERNEL);
acce2633 Pankaj Gupta 2018-08-31  166  	if (!vpmem) {
acce2633 Pankaj Gupta 2018-08-31  167  		err = -ENOMEM;
acce2633 Pankaj Gupta 2018-08-31  168  		goto out_err;
acce2633 Pankaj Gupta 2018-08-31  169  	}
acce2633 Pankaj Gupta 2018-08-31  170  
acce2633 Pankaj Gupta 2018-08-31  171  	vpmem->vdev = vdev;
acce2633 Pankaj Gupta 2018-08-31  172  	err = init_vq(vpmem);
acce2633 Pankaj Gupta 2018-08-31  173  	if (err)
acce2633 Pankaj Gupta 2018-08-31  174  		goto out_err;
acce2633 Pankaj Gupta 2018-08-31  175  
acce2633 Pankaj Gupta 2018-08-31  176  	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
acce2633 Pankaj Gupta 2018-08-31  177  			start, &vpmem->start);
acce2633 Pankaj Gupta 2018-08-31  178  	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
acce2633 Pankaj Gupta 2018-08-31  179  			size, &vpmem->size);
acce2633 Pankaj Gupta 2018-08-31  180  
acce2633 Pankaj Gupta 2018-08-31  181  	res.start = vpmem->start;
acce2633 Pankaj Gupta 2018-08-31  182  	res.end   = vpmem->start + vpmem->size-1;
acce2633 Pankaj Gupta 2018-08-31  183  	vpmem->nd_desc.provider_name = "virtio-pmem";
acce2633 Pankaj Gupta 2018-08-31  184  	vpmem->nd_desc.module = THIS_MODULE;
acce2633 Pankaj Gupta 2018-08-31  185  
acce2633 Pankaj Gupta 2018-08-31 @186  	vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(&vdev->dev,
acce2633 Pankaj Gupta 2018-08-31  187  						&vpmem->nd_desc);
acce2633 Pankaj Gupta 2018-08-31  188  	if (!nvdimm_bus)
acce2633 Pankaj Gupta 2018-08-31  189  		goto out_vq;
acce2633 Pankaj Gupta 2018-08-31  190  
acce2633 Pankaj Gupta 2018-08-31  191  	dev_set_drvdata(&vdev->dev, nvdimm_bus);
acce2633 Pankaj Gupta 2018-08-31  192  	memset(&ndr_desc, 0, sizeof(ndr_desc));
acce2633 Pankaj Gupta 2018-08-31  193  
acce2633 Pankaj Gupta 2018-08-31  194  	ndr_desc.res = &res;
acce2633 Pankaj Gupta 2018-08-31  195  	ndr_desc.numa_node = nid;
acce2633 Pankaj Gupta 2018-08-31  196  	ndr_desc.flush = virtio_pmem_flush;
acce2633 Pankaj Gupta 2018-08-31  197  	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
acce2633 Pankaj Gupta 2018-08-31 @198  	nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
acce2633 Pankaj Gupta 2018-08-31  199  
acce2633 Pankaj Gupta 2018-08-31  200  	if (!nd_region)
acce2633 Pankaj Gupta 2018-08-31  201  		goto out_nd;
acce2633 Pankaj Gupta 2018-08-31  202  
acce2633 Pankaj Gupta 2018-08-31  203  	//virtio_device_ready(vdev);
acce2633 Pankaj Gupta 2018-08-31  204  	return 0;
acce2633 Pankaj Gupta 2018-08-31  205  out_nd:
acce2633 Pankaj Gupta 2018-08-31  206  	err = -ENXIO;
acce2633 Pankaj Gupta 2018-08-31  207  	nvdimm_bus_unregister(nvdimm_bus);
acce2633 Pankaj Gupta 2018-08-31  208  out_vq:
acce2633 Pankaj Gupta 2018-08-31  209  	vdev->config->del_vqs(vdev);
acce2633 Pankaj Gupta 2018-08-31  210  out_err:
acce2633 Pankaj Gupta 2018-08-31  211  	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
acce2633 Pankaj Gupta 2018-08-31  212  	return err;
acce2633 Pankaj Gupta 2018-08-31  213  }
acce2633 Pankaj Gupta 2018-08-31  214  
acce2633 Pankaj Gupta 2018-08-31  215  static void virtio_pmem_remove(struct virtio_device *vdev)
acce2633 Pankaj Gupta 2018-08-31  216  {
acce2633 Pankaj Gupta 2018-08-31  217  	struct virtio_pmem *vpmem = vdev->priv;
acce2633 Pankaj Gupta 2018-08-31  218  	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
acce2633 Pankaj Gupta 2018-08-31  219  
acce2633 Pankaj Gupta 2018-08-31 @220  	nvdimm_bus_unregister(nvdimm_bus);
acce2633 Pankaj Gupta 2018-08-31  221  	vdev->config->del_vqs(vdev);
acce2633 Pankaj Gupta 2018-08-31  222  	kfree(vpmem);
acce2633 Pankaj Gupta 2018-08-31  223  }
acce2633 Pankaj Gupta 2018-08-31  224  
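
Separately from the link errors, the listing above reaches out_vq with err still 0 when nvdimm_bus_register() returns NULL (line 188), so probe would report success after deleting the virtqueues; only out_nd sets an error code. A minimal userspace sketch of the same unwind ladder with the error code set at each failure site follows. All names here (probe_sketch, struct probe_steps, the fail_* flags) are hypothetical and only model the control flow, not the driver:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the probe steps; each flag forces one step to fail. */
struct probe_steps {
    bool fail_alloc;
    bool fail_bus;
    bool fail_region;
    int vqs_deleted;        /* counts runs of the "delete virtqueues" cleanup */
    int bus_unregistered;   /* counts runs of the "unregister bus" cleanup */
};

static int probe_sketch(struct probe_steps *s)
{
    int err;

    if (s->fail_alloc) {
        err = -ENOMEM;
        goto out_err;
    }
    /* The real driver sets up its virtqueue at this point. */

    if (s->fail_bus) {
        err = -ENXIO;       /* set before the jump so out_vq never returns 0 */
        goto out_vq;
    }
    if (s->fail_region) {
        err = -ENXIO;
        goto out_nd;
    }
    return 0;

out_nd:
    s->bus_unregistered++;  /* undo bus registration */
out_vq:
    s->vqs_deleted++;       /* undo virtqueue setup */
out_err:
    return err;
}
```

Setting err next to each goto, rather than at the label, keeps every exit path from depending on what an earlier step happened to leave in the variable.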

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 28663 bytes --]


* Re: [PATCH 2/3] libnvdimm: nd_region flush callback support
  2018-08-31 13:30 ` [PATCH 2/3] libnvdimm: nd_region flush callback support Pankaj Gupta
@ 2018-09-04 15:29   ` kbuild test robot
  2018-09-05  8:40     ` Pankaj Gupta
  2018-09-22  0:43   ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: kbuild test robot @ 2018-09-04 15:29 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: kbuild-all, linux-kernel, kvm, qemu-devel, linux-nvdimm, jack,
	stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, david, xiaoguangrong.eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake, pagupta

Hi Pankaj,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux-nvdimm/libnvdimm-for-next]
[also build test WARNING on v4.19-rc2 next-20180831]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pankaj-Gupta/kvm-fake-DAX-device/20180903-160032
base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__
:::::: branch date: 7 hours ago
:::::: commit date: 7 hours ago

   drivers/nvdimm/pmem.c:116:25: sparse: expression using sizeof(void)
   drivers/nvdimm/pmem.c:135:25: sparse: expression using sizeof(void)
>> drivers/nvdimm/pmem.c:204:32: sparse: incorrect type in assignment (different base types) @@    expected restricted blk_status_t [usertype] bi_status @@    got int @@
   drivers/nvdimm/pmem.c:204:32:    expected restricted blk_status_t [usertype] bi_status
   drivers/nvdimm/pmem.c:204:32:    got int
   drivers/nvdimm/pmem.c:208:9: sparse: expression using sizeof(void)
   drivers/nvdimm/pmem.c:208:9: sparse: expression using sizeof(void)
   include/linux/bvec.h:82:37: sparse: expression using sizeof(void)
   include/linux/bvec.h:82:37: sparse: expression using sizeof(void)
   include/linux/bvec.h:83:32: sparse: expression using sizeof(void)
   include/linux/bvec.h:83:32: sparse: expression using sizeof(void)
   drivers/nvdimm/pmem.c:220:32: sparse: incorrect type in assignment (different base types) @@    expected restricted blk_status_t [usertype] bi_status @@    got int @@
   drivers/nvdimm/pmem.c:220:32:    expected restricted blk_status_t [usertype] bi_status
   drivers/nvdimm/pmem.c:220:32:    got int
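
The two "incorrect type in assignment" warnings come from storing the int returned by the flush callback directly into bio->bi_status, which is a blk_status_t. In the kernel, errno_to_blk_status() is the usual way to convert. The same idea can be sketched in userspace as below; the sketch_ type, constants, and functions are stand-ins with arbitrary values, not the kernel's definitions. The second helper also keeps an error already recorded by the segment loop, which the unconditional assignment at line 220 would overwrite; that policy is an assumption of this sketch, not something the patch states:

```c
#include <errno.h>

/* Stand-ins for blk_status_t and BLK_STS_*; the values are arbitrary. */
typedef unsigned char sketch_blk_status_t;
#define SKETCH_BLK_STS_OK     ((sketch_blk_status_t)0)
#define SKETCH_BLK_STS_NOSPC  ((sketch_blk_status_t)1)
#define SKETCH_BLK_STS_IOERR  ((sketch_blk_status_t)2)

/* Map a negative errno (or 0) onto the status type instead of assigning it raw. */
static sketch_blk_status_t sketch_errno_to_status(int err)
{
    switch (err) {
    case 0:
        return SKETCH_BLK_STS_OK;
    case -ENOSPC:
        return SKETCH_BLK_STS_NOSPC;
    default:
        return SKETCH_BLK_STS_IOERR;
    }
}

/* Keep the first error: record the flush result only if nothing failed before. */
static sketch_blk_status_t sketch_merge_flush(sketch_blk_status_t current,
                                              int flush_err)
{
    if (current != SKETCH_BLK_STS_OK) {
        return current;
    }
    return sketch_errno_to_status(flush_err);
}
```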

# https://github.com/0day-ci/linux/commit/69b95edd2a1f4676361988fa36866b59427e2cfa
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 69b95edd2a1f4676361988fa36866b59427e2cfa
vim +204 drivers/nvdimm/pmem.c

59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  107  
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  108  static void write_pmem(void *pmem_addr, struct page *page,
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  109  		unsigned int off, unsigned int len)
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  110  {
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  111  	unsigned int chunk;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  112  	void *mem;
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  113  
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  114  	while (len) {
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  115  		mem = kmap_atomic(page);
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06 @116  		chunk = min_t(unsigned int, len, PAGE_SIZE);
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  117  		memcpy_flushcache(pmem_addr, mem + off, chunk);
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  118  		kunmap_atomic(mem);
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  119  		len -= chunk;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  120  		off = 0;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  121  		page++;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  122  		pmem_addr += PAGE_SIZE;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  123  	}
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  124  }
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  125  
4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  126  static blk_status_t read_pmem(struct page *page, unsigned int off,
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  127  		void *pmem_addr, unsigned int len)
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  128  {
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  129  	unsigned int chunk;
60622d682 drivers/nvdimm/pmem.c Dan Williams      2018-05-03  130  	unsigned long rem;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  131  	void *mem;
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  132  
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  133  	while (len) {
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  134  		mem = kmap_atomic(page);
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  135  		chunk = min_t(unsigned int, len, PAGE_SIZE);
60622d682 drivers/nvdimm/pmem.c Dan Williams      2018-05-03  136  		rem = memcpy_mcsafe(mem + off, pmem_addr, chunk);
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  137  		kunmap_atomic(mem);
60622d682 drivers/nvdimm/pmem.c Dan Williams      2018-05-03  138  		if (rem)
4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  139  			return BLK_STS_IOERR;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  140  		len -= chunk;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  141  		off = 0;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  142  		page++;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  143  		pmem_addr += PAGE_SIZE;
98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  144  	}
4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  145  	return BLK_STS_OK;
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  146  }
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  147  
4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  148  static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
3f289dcb4 drivers/nvdimm/pmem.c Tejun Heo         2018-07-18  149  			unsigned int len, unsigned int off, unsigned int op,
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  150  			sector_t sector)
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  151  {
4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  152  	blk_status_t rc = BLK_STS_OK;
59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  153  	bool bad_pmem = false;
32ab0a3f5 drivers/nvdimm/pmem.c Dan Williams      2015-08-01  154  	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
7a9eb2066 drivers/nvdimm/pmem.c Dan Williams      2016-06-03  155  	void *pmem_addr = pmem->virt_addr + pmem_off;
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  156  
e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  157  	if (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  158  		bad_pmem = true;
59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  159  
3f289dcb4 drivers/nvdimm/pmem.c Tejun Heo         2018-07-18  160  	if (!op_is_write(op)) {
59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  161  		if (unlikely(bad_pmem))
4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  162  			rc = BLK_STS_IOERR;
b5ebc8ec6 drivers/nvdimm/pmem.c Dan Williams      2016-03-06  163  		else {
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  164  			rc = read_pmem(page, off, pmem_addr, len);
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  165  			flush_dcache_page(page);
b5ebc8ec6 drivers/nvdimm/pmem.c Dan Williams      2016-03-06  166  		}
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  167  	} else {
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  168  		/*
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  169  		 * Note that we write the data both before and after
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  170  		 * clearing poison.  The write before clear poison
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  171  		 * handles situations where the latest written data is
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  172  		 * preserved and the clear poison operation simply marks
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  173  		 * the address range as valid without changing the data.
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  174  		 * In this case application software can assume that an
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  175  		 * interrupted write will either return the new good
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  176  		 * data or an error.
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  177  		 *
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  178  		 * However, if pmem_clear_poison() leaves the data in an
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  179  		 * indeterminate state we need to perform the write
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  180  		 * after clear poison.
0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  181  		 */
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  182  		flush_dcache_page(page);
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  183  		write_pmem(pmem_addr, page, off, len);
59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  184  		if (unlikely(bad_pmem)) {
3115bb02b drivers/nvdimm/pmem.c Toshi Kani        2016-10-13  185  			rc = pmem_clear_poison(pmem, pmem_off, len);
bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  186  			write_pmem(pmem_addr, page, off, len);
59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  187  		}
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  188  	}
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  189  
b5ebc8ec6 drivers/nvdimm/pmem.c Dan Williams      2016-03-06  190  	return rc;
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  191  }
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  192  
dece16353 drivers/nvdimm/pmem.c Jens Axboe        2015-11-05  193  static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  194  {
4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  195  	blk_status_t rc = 0;
f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  196  	bool do_acct;
f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  197  	unsigned long start;
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  198  	struct bio_vec bvec;
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  199  	struct bvec_iter iter;
bd842b8ca drivers/nvdimm/pmem.c Dan Williams      2016-03-18  200  	struct pmem_device *pmem = q->queuedata;
7e267a8c7 drivers/nvdimm/pmem.c Dan Williams      2016-06-01  201  	struct nd_region *nd_region = to_region(pmem);
7e267a8c7 drivers/nvdimm/pmem.c Dan Williams      2016-06-01  202  
d2d6364dc drivers/nvdimm/pmem.c Ross Zwisler      2018-06-06  203  	if (bio->bi_opf & REQ_PREFLUSH)
69b95edd2 drivers/nvdimm/pmem.c Pankaj Gupta      2018-08-31 @204  		bio->bi_status = nd_region->flush(nd_region);
69b95edd2 drivers/nvdimm/pmem.c Pankaj Gupta      2018-08-31  205  
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  206  
f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  207  	do_acct = nd_iostat_start(bio, &start);
e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  208  	bio_for_each_segment(bvec, bio, iter) {
e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  209  		rc = pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len,
3f289dcb4 drivers/nvdimm/pmem.c Tejun Heo         2018-07-18  210  				bvec.bv_offset, bio_op(bio), iter.bi_sector);
e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  211  		if (rc) {
4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  212  			bio->bi_status = rc;
e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  213  			break;
e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  214  		}
e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  215  	}
f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  216  	if (do_acct)
f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  217  		nd_iostat_end(bio, start);
61031952f drivers/nvdimm/pmem.c Ross Zwisler      2015-06-25  218  
1eff9d322 drivers/nvdimm/pmem.c Jens Axboe        2016-08-05  219  	if (bio->bi_opf & REQ_FUA)
69b95edd2 drivers/nvdimm/pmem.c Pankaj Gupta      2018-08-31  220  		bio->bi_status = nd_region->flush(nd_region);
61031952f drivers/nvdimm/pmem.c Ross Zwisler      2015-06-25  221  
4246a0b63 drivers/nvdimm/pmem.c Christoph Hellwig 2015-07-20  222  	bio_endio(bio);
dece16353 drivers/nvdimm/pmem.c Jens Axboe        2015-11-05  223  	return BLK_QC_T_NONE;
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  224  }
9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  225  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation


* Re: [PATCH 3/3]  virtio-pmem: Add virtio pmem driver
  2018-09-04 15:17   ` kbuild test robot
@ 2018-09-05  8:34     ` Pankaj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-09-05  8:34 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linux-kernel, kvm, qemu-devel, linux-nvdimm, jack,
	stefanha, dan j williams, riel, nilal, kwolf, pbonzini,
	ross zwisler, david, xiaoguangrong eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake


Hello,

Thanks for the report.

> Hi Pankaj,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
> [also build test ERROR on v4.19-rc2 next-20180903]
> [if your patch is applied to the wrong git tree, please drop us a note to
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Pankaj-Gupta/kvm-fake-DAX-device/20180903-160032
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git
> libnvdimm-for-next
> config: i386-randconfig-a3-201835 (attached as .config)
> compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=i386
> :::::: branch date: 21 hours ago
> :::::: commit date: 21 hours ago
> 
> All errors (new ones prefixed by >>):
> 
>    drivers/virtio/virtio_pmem.o: In function `virtio_pmem_remove':
> >> drivers/virtio/virtio_pmem.c:220: undefined reference to
> >> `nvdimm_bus_unregister'
>    drivers/virtio/virtio_pmem.o: In function `virtio_pmem_probe':
> >> drivers/virtio/virtio_pmem.c:186: undefined reference to
> >> `nvdimm_bus_register'
> >> drivers/virtio/virtio_pmem.c:198: undefined reference to
> >> `nvdimm_pmem_region_create'
>    drivers/virtio/virtio_pmem.c:207: undefined reference to
>    `nvdimm_bus_unregister'

It looks like the dependent configuration option 'LIBNVDIMM' is not enabled.
I will add the dependency to the Kconfig entry for virtio_pmem in v2.
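
For illustration only, a minimal sketch of what that Kconfig change might
look like. The actual v2 entry is not part of this thread, so the second
"depends on" line is an assumption; the rest is copied from the posted patch.

```kconfig
config VIRTIO_PMEM
	tristate "Support for virtio pmem driver"
	depends on VIRTIO
	# Assumed fix: the driver calls nvdimm_bus_register() and friends,
	# so it can only be linked when libnvdimm is built as well.
	depends on LIBNVDIMM
	help
	This driver provides support for virtio based flushing interface
	for persistent memory range.

	If unsure, say M.
```

Because both symbols are tristate, a dependency written this way would also
keep VIRTIO_PMEM from being built in (=y) while LIBNVDIMM is only a module
(=m), which is another way to hit the same undefined references.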

Thanks,
Pankaj

> 
> #
> https://github.com/0day-ci/linux/commit/acce2633da18b0ad58d0cc9243a85b03020ca099
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout acce2633da18b0ad58d0cc9243a85b03020ca099
> vim +220 drivers/virtio/virtio_pmem.c
> 
> acce2633 Pankaj Gupta 2018-08-31  147
> acce2633 Pankaj Gupta 2018-08-31  148  static int virtio_pmem_probe(struct
> virtio_device *vdev)
> acce2633 Pankaj Gupta 2018-08-31  149  {
> acce2633 Pankaj Gupta 2018-08-31  150  	int err = 0;
> acce2633 Pankaj Gupta 2018-08-31  151  	struct resource res;
> acce2633 Pankaj Gupta 2018-08-31  152  	struct virtio_pmem *vpmem;
> acce2633 Pankaj Gupta 2018-08-31  153  	struct nvdimm_bus *nvdimm_bus;
> acce2633 Pankaj Gupta 2018-08-31  154  	struct nd_region_desc ndr_desc;
> acce2633 Pankaj Gupta 2018-08-31  155  	int nid = dev_to_node(&vdev->dev);
> acce2633 Pankaj Gupta 2018-08-31  156  	struct nd_region *nd_region;
> acce2633 Pankaj Gupta 2018-08-31  157
> acce2633 Pankaj Gupta 2018-08-31  158  	if (!vdev->config->get) {
> acce2633 Pankaj Gupta 2018-08-31  159  		dev_err(&vdev->dev, "%s failure:
> config disabled\n",
> acce2633 Pankaj Gupta 2018-08-31  160  			__func__);
> acce2633 Pankaj Gupta 2018-08-31  161  		return -EINVAL;
> acce2633 Pankaj Gupta 2018-08-31  162  	}
> acce2633 Pankaj Gupta 2018-08-31  163
> acce2633 Pankaj Gupta 2018-08-31  164  	vdev->priv = vpmem =
> devm_kzalloc(&vdev->dev, sizeof(*vpmem),
> acce2633 Pankaj Gupta 2018-08-31  165  			GFP_KERNEL);
> acce2633 Pankaj Gupta 2018-08-31  166  	if (!vpmem) {
> acce2633 Pankaj Gupta 2018-08-31  167  		err = -ENOMEM;
> acce2633 Pankaj Gupta 2018-08-31  168  		goto out_err;
> acce2633 Pankaj Gupta 2018-08-31  169  	}
> acce2633 Pankaj Gupta 2018-08-31  170
> acce2633 Pankaj Gupta 2018-08-31  171  	vpmem->vdev = vdev;
> acce2633 Pankaj Gupta 2018-08-31  172  	err = init_vq(vpmem);
> acce2633 Pankaj Gupta 2018-08-31  173  	if (err)
> acce2633 Pankaj Gupta 2018-08-31  174  		goto out_err;
> acce2633 Pankaj Gupta 2018-08-31  175
> acce2633 Pankaj Gupta 2018-08-31  176  	virtio_cread(vpmem->vdev, struct
> virtio_pmem_config,
> acce2633 Pankaj Gupta 2018-08-31  177  			start, &vpmem->start);
> acce2633 Pankaj Gupta 2018-08-31  178  	virtio_cread(vpmem->vdev, struct
> virtio_pmem_config,
> acce2633 Pankaj Gupta 2018-08-31  179  			size, &vpmem->size);
> acce2633 Pankaj Gupta 2018-08-31  180
> acce2633 Pankaj Gupta 2018-08-31  181  	res.start = vpmem->start;
> acce2633 Pankaj Gupta 2018-08-31  182  	res.end   = vpmem->start +
> vpmem->size-1;
> acce2633 Pankaj Gupta 2018-08-31  183  	vpmem->nd_desc.provider_name =
> "virtio-pmem";
> acce2633 Pankaj Gupta 2018-08-31  184  	vpmem->nd_desc.module = THIS_MODULE;
> acce2633 Pankaj Gupta 2018-08-31  185
> acce2633 Pankaj Gupta 2018-08-31 @186  	vpmem->nvdimm_bus = nvdimm_bus =
> nvdimm_bus_register(&vdev->dev,
> acce2633 Pankaj Gupta 2018-08-31  187  						&vpmem->nd_desc);
> acce2633 Pankaj Gupta 2018-08-31  188  	if (!nvdimm_bus)
> acce2633 Pankaj Gupta 2018-08-31  189  		goto out_vq;
> acce2633 Pankaj Gupta 2018-08-31  190
> acce2633 Pankaj Gupta 2018-08-31  191  	dev_set_drvdata(&vdev->dev,
> nvdimm_bus);
> acce2633 Pankaj Gupta 2018-08-31  192  	memset(&ndr_desc, 0,
> sizeof(ndr_desc));
> acce2633 Pankaj Gupta 2018-08-31  193
> acce2633 Pankaj Gupta 2018-08-31  194  	ndr_desc.res = &res;
> acce2633 Pankaj Gupta 2018-08-31  195  	ndr_desc.numa_node = nid;
> acce2633 Pankaj Gupta 2018-08-31  196  	ndr_desc.flush = virtio_pmem_flush;
> acce2633 Pankaj Gupta 2018-08-31  197  	set_bit(ND_REGION_PAGEMAP,
> &ndr_desc.flags);
> acce2633 Pankaj Gupta 2018-08-31 @198  	nd_region =
> nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
> acce2633 Pankaj Gupta 2018-08-31  199
> acce2633 Pankaj Gupta 2018-08-31  200  	if (!nd_region)
> acce2633 Pankaj Gupta 2018-08-31  201  		goto out_nd;
> acce2633 Pankaj Gupta 2018-08-31  202
> acce2633 Pankaj Gupta 2018-08-31  203  	//virtio_device_ready(vdev);
> acce2633 Pankaj Gupta 2018-08-31  204  	return 0;
> acce2633 Pankaj Gupta 2018-08-31  205  out_nd:
> acce2633 Pankaj Gupta 2018-08-31  206  	err = -ENXIO;
> acce2633 Pankaj Gupta 2018-08-31  207  	nvdimm_bus_unregister(nvdimm_bus);
> acce2633 Pankaj Gupta 2018-08-31  208  out_vq:
> acce2633 Pankaj Gupta 2018-08-31  209  	vdev->config->del_vqs(vdev);
> acce2633 Pankaj Gupta 2018-08-31  210  out_err:
> acce2633 Pankaj Gupta 2018-08-31  211  	dev_err(&vdev->dev, "failed to
> register virtio pmem memory\n");
> acce2633 Pankaj Gupta 2018-08-31  212  	return err;
> acce2633 Pankaj Gupta 2018-08-31  213  }
> acce2633 Pankaj Gupta 2018-08-31  214
> acce2633 Pankaj Gupta 2018-08-31  215  static void virtio_pmem_remove(struct
> virtio_device *vdev)
> acce2633 Pankaj Gupta 2018-08-31  216  {
> acce2633 Pankaj Gupta 2018-08-31  217  	struct virtio_pmem *vpmem =
> vdev->priv;
> acce2633 Pankaj Gupta 2018-08-31  218  	struct nvdimm_bus *nvdimm_bus =
> dev_get_drvdata(&vdev->dev);
> acce2633 Pankaj Gupta 2018-08-31  219
> acce2633 Pankaj Gupta 2018-08-31 @220  	nvdimm_bus_unregister(nvdimm_bus);
> acce2633 Pankaj Gupta 2018-08-31  221  	vdev->config->del_vqs(vdev);
> acce2633 Pankaj Gupta 2018-08-31  222  	kfree(vpmem);
> acce2633 Pankaj Gupta 2018-08-31  223  }
> acce2633 Pankaj Gupta 2018-08-31  224
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
> 


* Re: [PATCH 2/3] libnvdimm: nd_region flush callback support
  2018-09-04 15:29   ` kbuild test robot
@ 2018-09-05  8:40     ` Pankaj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-09-05  8:40 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, linux-kernel, kvm, qemu-devel, linux-nvdimm, jack,
	stefanha, dan j williams, riel, nilal, kwolf, pbonzini,
	ross zwisler, david, xiaoguangrong eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake


Hello,

Thanks for the report.

> 
> Hi Pankaj,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on linux-nvdimm/libnvdimm-for-next]
> [also build test WARNING on v4.19-rc2 next-20180831]
> [if your patch is applied to the wrong git tree, please drop us a note to
> help improve the system]
> 
> url:
> https://github.com/0day-ci/linux/commits/Pankaj-Gupta/kvm-fake-DAX-device/20180903-160032
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git
> libnvdimm-for-next
> reproduce:
>         # apt-get install sparse
>         make ARCH=x86_64 allmodconfig
>         make C=1 CF=-D__CHECK_ENDIAN__
> :::::: branch date: 7 hours ago
> :::::: commit date: 7 hours ago
> 
>    drivers/nvdimm/pmem.c:116:25: sparse: expression using sizeof(void)
>    drivers/nvdimm/pmem.c:135:25: sparse: expression using sizeof(void)
> >> drivers/nvdimm/pmem.c:204:32: sparse: incorrect type in assignment
> >> (different base types) @@    expected restricted blk_status_t [usertype]
> >> bi_status @@    got e] bi_status @@

I will fix this in v2. I will wait for further review comments and address
them in v2 as well.
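
One possible shape for that fix, written as a userspace sketch rather than
the actual v2 change: the flush callback returns an int errno, so the value
has to be translated before it is stored in bi_status. The kernel provides
errno_to_blk_status() for this. In the model below, flush_errno_to_status()
and record_flush_result() are hypothetical stand-ins, and the typedef and
constants are simplified placeholders for the kernel definitions.

```c
#include <assert.h>
#include <errno.h>

/*
 * Userspace model, not kernel code. In the kernel, blk_status_t is a
 * sparse-checked (__bitwise) u8; here it is a plain typedef, and the
 * numeric values are placeholders for this sketch.
 */
typedef unsigned char blk_status_t;
#define BLK_STS_OK       ((blk_status_t)0)
#define BLK_STS_RESOURCE ((blk_status_t)9)
#define BLK_STS_IOERR    ((blk_status_t)10)

/* Hypothetical helper playing the role of errno_to_blk_status(). */
static blk_status_t flush_errno_to_status(int err)
{
	assert(err <= 0);	/* kernel convention: 0 or a negative errno */

	switch (err) {
	case 0:
		return BLK_STS_OK;
	case -ENOMEM:
		return BLK_STS_RESOURCE;
	default:
		return BLK_STS_IOERR;
	}
}

/* Only overwrite the bio status when the flush actually failed. */
static void record_flush_result(blk_status_t *bi_status, int flush_ret)
{
	if (flush_ret)
		*bi_status = flush_errno_to_status(flush_ret);
}
```

Storing the status only on failure also avoids clobbering an error that was
already recorded earlier in the request path with a later successful flush.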

Thanks,
Pankaj

>    drivers/nvdimm/pmem.c:204:32:    expected restricted blk_status_t
>    [usertype] bi_status
>    drivers/nvdimm/pmem.c:204:32:    got int
>    drivers/nvdimm/pmem.c:208:9: sparse: expression using sizeof(void)
>    drivers/nvdimm/pmem.c:208:9: sparse: expression using sizeof(void)
>    include/linux/bvec.h:82:37: sparse: expression using sizeof(void)
>    include/linux/bvec.h:82:37: sparse: expression using sizeof(void)
>    include/linux/bvec.h:83:32: sparse: expression using sizeof(void)
>    include/linux/bvec.h:83:32: sparse: expression using sizeof(void)
>    drivers/nvdimm/pmem.c:220:32: sparse: incorrect type in assignment
>    (different base types) @@    expected restricted blk_status_t [usertype]
>    bi_status @@    got e] bi_status @@
>    drivers/nvdimm/pmem.c:220:32:    expected restricted blk_status_t
>    [usertype] bi_status
>    drivers/nvdimm/pmem.c:220:32:    got int
> 
> #
> https://github.com/0day-ci/linux/commit/69b95edd2a1f4676361988fa36866b59427e2cfa
> git remote add linux-review https://github.com/0day-ci/linux
> git remote update linux-review
> git checkout 69b95edd2a1f4676361988fa36866b59427e2cfa
> vim +204 drivers/nvdimm/pmem.c
> 
> 59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  107
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  108  static
> void write_pmem(void *pmem_addr, struct page *page,
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  109  		unsigned
> int off, unsigned int len)
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  110  {
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  111  	unsigned
> int chunk;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  112  	void
> *mem;
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  113
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  114  	while
> (len) {
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  115  		mem =
> kmap_atomic(page);
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06 @116  		chunk =
> min_t(unsigned int, len, PAGE_SIZE);
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  117
> 		memcpy_flushcache(pmem_addr, mem + off, chunk);
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  118
> 		kunmap_atomic(mem);
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  119  		len -=
> chunk;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  120  		off = 0;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  121  		page++;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  122
> 		pmem_addr += PAGE_SIZE;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  123  	}
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  124  }
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  125
> 4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  126  static
> blk_status_t read_pmem(struct page *page, unsigned int off,
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  127  		void
> *pmem_addr, unsigned int len)
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  128  {
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  129  	unsigned
> int chunk;
> 60622d682 drivers/nvdimm/pmem.c Dan Williams      2018-05-03  130  	unsigned
> long rem;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  131  	void
> *mem;
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  132
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  133  	while
> (len) {
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  134  		mem =
> kmap_atomic(page);
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  135  		chunk =
> min_t(unsigned int, len, PAGE_SIZE);
> 60622d682 drivers/nvdimm/pmem.c Dan Williams      2018-05-03  136  		rem =
> memcpy_mcsafe(mem + off, pmem_addr, chunk);
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  137
> 		kunmap_atomic(mem);
> 60622d682 drivers/nvdimm/pmem.c Dan Williams      2018-05-03  138  		if (rem)
> 4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  139  			return
> BLK_STS_IOERR;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  140  		len -=
> chunk;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  141  		off = 0;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  142  		page++;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  143
> 		pmem_addr += PAGE_SIZE;
> 98cc093cb drivers/nvdimm/pmem.c Huang Ying        2017-09-06  144  	}
> 4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  145  	return
> BLK_STS_OK;
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  146  }
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  147
> 4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  148  static
> blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
> 3f289dcb4 drivers/nvdimm/pmem.c Tejun Heo         2018-07-18  149
> 			unsigned int len, unsigned int off, unsigned int op,
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  150
> 			sector_t sector)
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  151  {
> 4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  152
> 	blk_status_t rc = BLK_STS_OK;
> 59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  153  	bool
> bad_pmem = false;
> 32ab0a3f5 drivers/nvdimm/pmem.c Dan Williams      2015-08-01  154
> 	phys_addr_t pmem_off = sector * 512 + pmem->data_offset;
> 7a9eb2066 drivers/nvdimm/pmem.c Dan Williams      2016-06-03  155  	void
> *pmem_addr = pmem->virt_addr + pmem_off;
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  156
> e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  157  	if
> (unlikely(is_bad_pmem(&pmem->bb, sector, len)))
> 59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  158  		bad_pmem
> = true;
> 59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  159
> 3f289dcb4 drivers/nvdimm/pmem.c Tejun Heo         2018-07-18  160  	if
> (!op_is_write(op)) {
> 59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  161  		if
> (unlikely(bad_pmem))
> 4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  162  			rc =
> BLK_STS_IOERR;
> b5ebc8ec6 drivers/nvdimm/pmem.c Dan Williams      2016-03-06  163  		else {
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  164  			rc =
> read_pmem(page, off, pmem_addr, len);
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  165
> 			flush_dcache_page(page);
> b5ebc8ec6 drivers/nvdimm/pmem.c Dan Williams      2016-03-06  166  		}
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  167  	} else {
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  168  		/*
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  169  		 * Note
> that we write the data both before and after
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  170  		 *
> clearing poison.  The write before clear poison
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  171  		 *
> handles situations where the latest written data is
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  172  		 *
> preserved and the clear poison operation simply marks
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  173  		 * the
> address range as valid without changing the data.
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  174  		 * In
> this case application software can assume that an
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  175  		 *
> interrupted write will either return the new good
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  176  		 * data
> or an error.
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  177  		 *
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  178  		 *
> However, if pmem_clear_poison() leaves the data in an
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  179  		 *
> indeterminate state we need to perform the write
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  180  		 * after
> clear poison.
> 0a370d261 drivers/nvdimm/pmem.c Dan Williams      2016-04-14  181  		 */
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  182
> 		flush_dcache_page(page);
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  183
> 		write_pmem(pmem_addr, page, off, len);
> 59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  184  		if
> (unlikely(bad_pmem)) {
> 3115bb02b drivers/nvdimm/pmem.c Toshi Kani        2016-10-13  185  			rc =
> pmem_clear_poison(pmem, pmem_off, len);
> bd697a80c drivers/nvdimm/pmem.c Vishal Verma      2016-09-30  186
> 			write_pmem(pmem_addr, page, off, len);
> 59e647398 drivers/nvdimm/pmem.c Dan Williams      2016-03-08  187  		}
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  188  	}
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  189
> b5ebc8ec6 drivers/nvdimm/pmem.c Dan Williams      2016-03-06  190  	return
> rc;
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  191  }
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  192
> dece16353 drivers/nvdimm/pmem.c Jens Axboe        2015-11-05  193  static
> blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  194  {
> 4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  195
> 	blk_status_t rc = 0;
> f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  196  	bool
> do_acct;
> f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  197  	unsigned
> long start;
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  198  	struct
> bio_vec bvec;
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  199  	struct
> bvec_iter iter;
> bd842b8ca drivers/nvdimm/pmem.c Dan Williams      2016-03-18  200  	struct
> pmem_device *pmem = q->queuedata;
> 7e267a8c7 drivers/nvdimm/pmem.c Dan Williams      2016-06-01  201  	struct
> nd_region *nd_region = to_region(pmem);
> 7e267a8c7 drivers/nvdimm/pmem.c Dan Williams      2016-06-01  202
> d2d6364dc drivers/nvdimm/pmem.c Ross Zwisler      2018-06-06  203  	if
> (bio->bi_opf & REQ_PREFLUSH)
> 69b95edd2 drivers/nvdimm/pmem.c Pankaj Gupta      2018-08-31 @204
> 		bio->bi_status = nd_region->flush(nd_region);
> 69b95edd2 drivers/nvdimm/pmem.c Pankaj Gupta      2018-08-31  205
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  206
> f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  207  	do_acct =
> nd_iostat_start(bio, &start);
> e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  208
> 	bio_for_each_segment(bvec, bio, iter) {
> e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  209  		rc =
> pmem_do_bvec(pmem, bvec.bv_page, bvec.bv_len,
> 3f289dcb4 drivers/nvdimm/pmem.c Tejun Heo         2018-07-18  210
> 				bvec.bv_offset, bio_op(bio), iter.bi_sector);
> e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  211  		if (rc)
> {
> 4e4cbee93 drivers/nvdimm/pmem.c Christoph Hellwig 2017-06-03  212
> 			bio->bi_status = rc;
> e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  213  			break;
> e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  214  		}
> e10624f8c drivers/nvdimm/pmem.c Dan Williams      2016-01-06  215  	}
> f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  216  	if
> (do_acct)
> f0dc089ce drivers/nvdimm/pmem.c Dan Williams      2015-05-16  217
> 		nd_iostat_end(bio, start);
> 61031952f drivers/nvdimm/pmem.c Ross Zwisler      2015-06-25  218
> 1eff9d322 drivers/nvdimm/pmem.c Jens Axboe        2016-08-05  219  	if
> (bio->bi_opf & REQ_FUA)
> 69b95edd2 drivers/nvdimm/pmem.c Pankaj Gupta      2018-08-31  220
> 		bio->bi_status = nd_region->flush(nd_region);
> 61031952f drivers/nvdimm/pmem.c Ross Zwisler      2015-06-25  221
> 4246a0b63 drivers/nvdimm/pmem.c Christoph Hellwig 2015-07-20  222
> 	bio_endio(bio);
> dece16353 drivers/nvdimm/pmem.c Jens Axboe        2015-11-05  223  	return
> BLK_QC_T_NONE;
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  224  }
> 9e853f231 drivers/block/pmem.c  Ross Zwisler      2015-04-01  225
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
> 


* Re: [PATCH 3/3]  virtio-pmem: Add virtio pmem driver
  2018-08-31 13:30 ` [PATCH 3/3] virtio-pmem: Add virtio pmem driver Pankaj Gupta
  2018-09-04 15:17   ` kbuild test robot
@ 2018-09-05 12:02   ` kbuild test robot
  2018-09-12 16:54   ` Luiz Capitulino
  2018-09-22  1:08   ` Dan Williams
  3 siblings, 0 replies; 22+ messages in thread
From: kbuild test robot @ 2018-09-05 12:02 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: kbuild-all, linux-kernel, kvm, qemu-devel, linux-nvdimm, jack,
	stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, david, xiaoguangrong.eric, hch, mst,
	niteshnarayanlal, lcapitulino, imammedo, eblake, pagupta

[-- Attachment #1: Type: text/plain, Size: 1295 bytes --]

Hi Pankaj,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux-nvdimm/libnvdimm-for-next]
[also build test ERROR on v4.19-rc2 next-20180905]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Pankaj-Gupta/kvm-fake-DAX-device/20180903-160032
base:   https://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm.git libnvdimm-for-next
config: i386-allyesconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/virtio/virtio_pmem.o: In function `virtio_pmem_remove':
>> virtio_pmem.c:(.text+0x299): undefined reference to `nvdimm_bus_unregister'
   drivers/virtio/virtio_pmem.o: In function `virtio_pmem_probe':
>> virtio_pmem.c:(.text+0x5e3): undefined reference to `nvdimm_bus_register'
>> virtio_pmem.c:(.text+0x62a): undefined reference to `nvdimm_pmem_region_create'
   virtio_pmem.c:(.text+0x63b): undefined reference to `nvdimm_bus_unregister'

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 64478 bytes --]


* Re: [PATCH 3/3]  virtio-pmem: Add virtio pmem driver
  2018-08-31 13:30 ` [PATCH 3/3] virtio-pmem: Add virtio pmem driver Pankaj Gupta
  2018-09-04 15:17   ` kbuild test robot
  2018-09-05 12:02   ` kbuild test robot
@ 2018-09-12 16:54   ` Luiz Capitulino
  2018-09-13  6:58     ` [Qemu-devel] " Pankaj Gupta
  2018-09-22  1:08   ` Dan Williams
  3 siblings, 1 reply; 22+ messages in thread
From: Luiz Capitulino @ 2018-09-12 16:54 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, kvm, qemu-devel, linux-nvdimm, jack, stefanha,
	dan.j.williams, riel, nilal, kwolf, pbonzini, ross.zwisler,
	david, xiaoguangrong.eric, hch, mst, niteshnarayanlal, imammedo,
	eblake

On Fri, 31 Aug 2018 19:00:18 +0530
Pankaj Gupta <pagupta@redhat.com> wrote:

> This patch adds virtio-pmem driver for KVM guest.
> 
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
> 
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/virtio/Kconfig           |   9 ++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/virtio_pmem.c     | 255 +++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  40 ++++++
>  5 files changed, 306 insertions(+)
>  create mode 100644 drivers/virtio/virtio_pmem.c
>  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 3589764..a331e23 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,15 @@ config VIRTIO_PCI_LEGACY
>  
>  	  If unsure, say Y.
>  
> +config VIRTIO_PMEM
> +	tristate "Support for virtio pmem driver"
> +	depends on VIRTIO
> +	help
> +	This driver provides support for virtio based flushing interface
> +	for persistent memory range.
> +
> +	If unsure, say M.
> +
>  config VIRTIO_BALLOON
>  	tristate "Virtio balloon driver"
>  	depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5..cbe91c6 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
> diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
> new file mode 100644
> index 0000000..c22cc87
> --- /dev/null
> +++ b/drivers/virtio/virtio_pmem.c
> @@ -0,0 +1,255 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio.h>
> +#include <linux/module.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/spinlock.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/nd.h>
> +
> +struct virtio_pmem_request {
> +	/* Host return status corresponding to flush request */
> +	int ret;
> +
> +	/* command name*/
> +	char name[16];
> +
> +	/* Wait queue to process deferred work after ack from host */
> +	wait_queue_head_t host_acked;
> +	bool done;
> +
> +	/* Wait queue to process deferred work after virt queue buffer avail */
> +	wait_queue_head_t wq_buf;
> +	bool wq_buf_avail;
> +	struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +	struct virtio_device *vdev;
> +
> +	/* Virtio pmem request queue */
> +	struct virtqueue *req_vq;
> +
> +	/* nvdimm bus registers virtio pmem device */
> +	struct nvdimm_bus *nvdimm_bus;
> +	struct nvdimm_bus_descriptor nd_desc;
> +
> +	/* List to store deferred work if virtqueue is full */
> +	struct list_head req_list;
> +
> +	/* Synchronize virtqueue data */
> +	spinlock_t pmem_lock;
> +
> +	/* Memory region information */
> +	uint64_t start;
> +	uint64_t size;
> +};
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> + /* The interrupt handler */
> +static void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);
> +			req_buf->wq_buf_avail = true;
> +			wake_up(&req_buf->wq_buf);
> +		}
> +	}
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +	struct virtqueue *vq;
> +
> +	/* single vq */
> +	vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> +				host_ack, "flush_queue");
> +	if (IS_ERR(vq))
> +		return PTR_ERR(vq);
> +
> +	spin_lock_init(&vpmem->pmem_lock);
> +	INIT_LIST_HEAD(&vpmem->req_list);
> +
> +	return 0;
> +};
> +
> + /* The request submission function */
> +static int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev =
> +		dev_to_virtio(nd_region->dev.parent->parent);
> +	struct virtio_pmem *vpmem = vdev->priv;

I'm missing a might_sleep() call in this function.

> +	struct virtio_pmem_request *req = kmalloc(sizeof(*req), GFP_KERNEL);
> +
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;

It seems that sg_init_one() only sets fields. If so, you can move
spin_lock_irqsave() down to here.

> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);

Is this error handling code assuming that at some point
virtqueue_add_sgs() will succeed for a different thread? If yes,
what happens if the assumption is false? That is, what happens if
virtqueue_add_sgs() never succeeds anymore?

Why not just return an error?
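
For illustration only, the "just return an error" alternative could look
roughly like the user-space sketch below. It is not kernel code: toy_vq,
toy_vq_add() and toy_flush_submit() are hypothetical stand-ins for the
virtqueue, virtqueue_add_sgs() and virtio_pmem_flush(), and the errno
values are arbitrary choices made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a virtqueue with a fixed number of slots. */
#define TOY_VQ_SLOTS 2

struct toy_vq {
	const void *slot[TOY_VQ_SLOTS];
	size_t used;
};

/* Models the "no free descriptor" case: fail instead of blocking. */
int toy_vq_add(struct toy_vq *vq, const void *req)
{
	if (vq->used == TOY_VQ_SLOTS)
		return -ENOSPC;
	vq->slot[vq->used++] = req;
	return 0;
}

/* Submission path that reports a full queue to the caller right away. */
int toy_flush_submit(struct toy_vq *vq, const void *req)
{
	int err = toy_vq_add(vq, req);

	if (err)
		return -EBUSY;	/* illustrative errno, not taken from the patch */
	return 0;
}
```

With this shape the caller never sleeps on a full queue, but it has to
decide itself whether to retry or to fail the flush.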

> +	}
> +	virtqueue_kick(vpmem->req_vq);
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +	/* When host has read buffer, this completes via host_ack */
> +	wait_event(req->host_acked, req->done);
> +	err = req->ret;

If I'm understanding the QEMU code correctly, you're returning EIO
from QEMU if fsync() fails. I think this is wrong, since we don't know
if EIO in QEMU will be the same EIO in the guest. One way to solve this
would be to return 0 for success and 1 for failure from QEMU, and let the
guest implementation pick its error code (for your implementation it
could be EIO).
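
A minimal sketch of the guest side of that idea, written as plain
user-space C so it can be compiled on its own. The TOY_PMEM_RESP_* names
and toy_pmem_resp_to_errno() are hypothetical; the patch does not define
them.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical wire values: the host reports only success or failure. */
#define TOY_PMEM_RESP_OK   0u
#define TOY_PMEM_RESP_FAIL 1u

/* Guest side: translate the host's status into a guest errno. */
int toy_pmem_resp_to_errno(uint32_t host_status)
{
	/* Any non-zero status counts as failure; the guest picks -EIO. */
	return host_status == TOY_PMEM_RESP_OK ? 0 : -EIO;
}
```

The point of the split is that no host errno value ever crosses the
virtio boundary, so host and guest need not agree on errno numbering.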

> +	kfree(req);
> +
> +	return err;
> +};
> +EXPORT_SYMBOL_GPL(virtio_pmem_flush);
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +	int err = 0;
> +	struct resource res;
> +	struct virtio_pmem *vpmem;
> +	struct nvdimm_bus *nvdimm_bus;
> +	struct nd_region_desc ndr_desc;
> +	int nid = dev_to_node(&vdev->dev);
> +	struct nd_region *nd_region;
> +
> +	if (!vdev->config->get) {
> +		dev_err(&vdev->dev, "%s failure: config disabled\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	vdev->priv = vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem),
> +			GFP_KERNEL);
> +	if (!vpmem) {
> +		err = -ENOMEM;
> +		goto out_err;
> +	}
> +
> +	vpmem->vdev = vdev;
> +	err = init_vq(vpmem);
> +	if (err)
> +		goto out_err;
> +
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			start, &vpmem->start);
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			size, &vpmem->size);
> +
> +	res.start = vpmem->start;
> +	res.end   = vpmem->start + vpmem->size-1;
> +	vpmem->nd_desc.provider_name = "virtio-pmem";
> +	vpmem->nd_desc.module = THIS_MODULE;
> +
> +	vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +						&vpmem->nd_desc);
> +	if (!nvdimm_bus)
> +		goto out_vq;
> +
> +	dev_set_drvdata(&vdev->dev, nvdimm_bus);
> +	memset(&ndr_desc, 0, sizeof(ndr_desc));
> +
> +	ndr_desc.res = &res;
> +	ndr_desc.numa_node = nid;
> +	ndr_desc.flush = virtio_pmem_flush;
> +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +	nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
> +
> +	if (!nd_region)
> +		goto out_nd;
> +
> +	//virtio_device_ready(vdev);
> +	return 0;
> +out_nd:
> +	err = -ENXIO;
> +	nvdimm_bus_unregister(nvdimm_bus);
> +out_vq:
> +	vdev->config->del_vqs(vdev);
> +out_err:
> +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +	return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +	nvdimm_bus_unregister(nvdimm_bus);
> +	vdev->config->del_vqs(vdev);
> +	kfree(vpmem);
> +}
> +
> +#ifdef CONFIG_PM_SLEEP
> +static int virtio_pmem_freeze(struct virtio_device *vdev)
> +{
> +	/* todo: handle freeze function */
> +	return -EPERM;
> +}
> +
> +static int virtio_pmem_restore(struct virtio_device *vdev)
> +{
> +	/* todo: handle restore function */
> +	return -EPERM;
> +}
> +#endif
> +
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +	.driver.name		= KBUILD_MODNAME,
> +	.driver.owner		= THIS_MODULE,
> +	.id_table		= id_table,
> +	.probe			= virtio_pmem_probe,
> +	.remove			= virtio_pmem_remove,
> +#ifdef CONFIG_PM_SLEEP
> +	.freeze                 = virtio_pmem_freeze,
> +	.restore                = virtio_pmem_restore,
> +#endif
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2..3463895 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 0000000..c7c22a5
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
> + * anyone can use the definitions to implement compatible drivers/servers:
> + *
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *    may be used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * Copyright (C) Red Hat, Inc., 2018-2019
> + * Copyright (C) Pankaj Gupta <pagupta@redhat.com>, 2018
> + */
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +	__le64 start;
> +	__le64 size;
> +};
> +#endif



* Re: [PATCH] qemu: Add virtio pmem device
  2018-08-31 13:30 ` [PATCH] qemu: Add virtio pmem device Pankaj Gupta
@ 2018-09-12 16:57   ` Luiz Capitulino
  2018-09-13  7:06     ` Pankaj Gupta
  2018-09-20 11:21   ` David Hildenbrand
  1 sibling, 1 reply; 22+ messages in thread
From: Luiz Capitulino @ 2018-09-12 16:57 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, kvm, qemu-devel, linux-nvdimm, jack, stefanha,
	dan.j.williams, riel, nilal, kwolf, pbonzini, ross.zwisler,
	david, xiaoguangrong.eric, hch, mst, niteshnarayanlal, imammedo,
	eblake

On Fri, 31 Aug 2018 19:00:19 +0530
Pankaj Gupta <pagupta@redhat.com> wrote:

>  This patch adds virtio-pmem Qemu device.
> 
>  This device presents memory address range information to guest
>  which is backed by file backend type. It acts like persistent
>  memory device for KVM guest. Guest can perform read and 
>  persistent write operations on this memory range with the help 
>  of DAX capable filesystem.
> 
>  Persistent guest writes are assured with the help of virtio 
>  based flushing interface. When guest userspace space performs 
>  fsync on file fd on pmem device, a flush command is send to 
>  Qemu over VIRTIO and host side flush/sync is done on backing 
>  image file.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
> Changes from RFC v3:
> - Return EIO for host fsync failure instead of errno - Luiz, Stefan
> - Change version for inclusion to Qemu 3.1 - Eric
> 
> Changes from RFC v2:
> - Use aio_worker() to avoid Qemu from hanging with blocking fsync
>   call - Stefan
> - Use virtio_st*_p() for endianess - Stefan
> - Correct indentation in qapi/misc.json - Eric
> 
>  hw/virtio/Makefile.objs                     |   3 +
>  hw/virtio/virtio-pci.c                      |  44 +++++
>  hw/virtio/virtio-pci.h                      |  14 ++
>  hw/virtio/virtio-pmem.c                     | 241 ++++++++++++++++++++++++++++
>  include/hw/pci/pci.h                        |   1 +
>  include/hw/virtio/virtio-pmem.h             |  42 +++++
>  include/standard-headers/linux/virtio_ids.h |   1 +
>  qapi/misc.json                              |  26 ++-
>  8 files changed, 371 insertions(+), 1 deletion(-)
>  create mode 100644 hw/virtio/virtio-pmem.c
>  create mode 100644 include/hw/virtio/virtio-pmem.h
> 
> diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> index 1b2799cfd8..7f914d45d0 100644
> --- a/hw/virtio/Makefile.objs
> +++ b/hw/virtio/Makefile.objs
> @@ -10,6 +10,9 @@ obj-$(CONFIG_VIRTIO_CRYPTO) += virtio-crypto.o
>  obj-$(call land,$(CONFIG_VIRTIO_CRYPTO),$(CONFIG_VIRTIO_PCI)) += virtio-crypto-pci.o
>  
>  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> +ifeq ($(CONFIG_MEM_HOTPLUG),y)
> +obj-$(CONFIG_LINUX) += virtio-pmem.o
> +endif
>  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
>  endif
>  
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 3a01fe90f0..93d3fc05c7 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -2521,6 +2521,49 @@ static const TypeInfo virtio_rng_pci_info = {
>      .class_init    = virtio_rng_pci_class_init,
>  };
>  
> +/* virtio-pmem-pci */
> +
> +static void virtio_pmem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> +{
> +    VirtIOPMEMPCI *vpmem = VIRTIO_PMEM_PCI(vpci_dev);
> +    DeviceState *vdev = DEVICE(&vpmem->vdev);
> +
> +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
> +}
> +
> +static void virtio_pmem_pci_class_init(ObjectClass *klass, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> +    k->realize = virtio_pmem_pci_realize;
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_PMEM;
> +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> +    pcidev_k->class_id = PCI_CLASS_OTHERS;
> +}
> +
> +static void virtio_pmem_pci_instance_init(Object *obj)
> +{
> +    VirtIOPMEMPCI *dev = VIRTIO_PMEM_PCI(obj);
> +
> +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> +                                TYPE_VIRTIO_PMEM);
> +    object_property_add_alias(obj, "memdev", OBJECT(&dev->vdev), "memdev",
> +                              &error_abort);
> +}
> +
> +static const TypeInfo virtio_pmem_pci_info = {
> +    .name          = TYPE_VIRTIO_PMEM_PCI,
> +    .parent        = TYPE_VIRTIO_PCI,
> +    .instance_size = sizeof(VirtIOPMEMPCI),
> +    .instance_init = virtio_pmem_pci_instance_init,
> +    .class_init    = virtio_pmem_pci_class_init,
> +};
> +
> +
>  /* virtio-input-pci */
>  
>  static Property virtio_input_pci_properties[] = {
> @@ -2714,6 +2757,7 @@ static void virtio_pci_register_types(void)
>      type_register_static(&virtio_balloon_pci_info);
>      type_register_static(&virtio_serial_pci_info);
>      type_register_static(&virtio_net_pci_info);
> +    type_register_static(&virtio_pmem_pci_info);
>  #ifdef CONFIG_VHOST_SCSI
>      type_register_static(&vhost_scsi_pci_info);
>  #endif
> diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> index 813082b0d7..fe74fcad3f 100644
> --- a/hw/virtio/virtio-pci.h
> +++ b/hw/virtio/virtio-pci.h
> @@ -19,6 +19,7 @@
>  #include "hw/virtio/virtio-blk.h"
>  #include "hw/virtio/virtio-net.h"
>  #include "hw/virtio/virtio-rng.h"
> +#include "hw/virtio/virtio-pmem.h"
>  #include "hw/virtio/virtio-serial.h"
>  #include "hw/virtio/virtio-scsi.h"
>  #include "hw/virtio/virtio-balloon.h"
> @@ -57,6 +58,7 @@ typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
>  typedef struct VirtIOGPUPCI VirtIOGPUPCI;
>  typedef struct VHostVSockPCI VHostVSockPCI;
>  typedef struct VirtIOCryptoPCI VirtIOCryptoPCI;
> +typedef struct VirtIOPMEMPCI VirtIOPMEMPCI;
>  
>  /* virtio-pci-bus */
>  
> @@ -274,6 +276,18 @@ struct VirtIOBlkPCI {
>      VirtIOBlock vdev;
>  };
>  
> +/*
> + * virtio-pmem-pci: This extends VirtioPCIProxy.
> + */
> +#define TYPE_VIRTIO_PMEM_PCI "virtio-pmem-pci"
> +#define VIRTIO_PMEM_PCI(obj) \
> +        OBJECT_CHECK(VirtIOPMEMPCI, (obj), TYPE_VIRTIO_PMEM_PCI)
> +
> +struct VirtIOPMEMPCI {
> +    VirtIOPCIProxy parent_obj;
> +    VirtIOPMEM vdev;
> +};
> +
>  /*
>   * virtio-balloon-pci: This extends VirtioPCIProxy.
>   */
> diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> new file mode 100644
> index 0000000000..69ae4c0a50
> --- /dev/null
> +++ b/hw/virtio/virtio-pmem.c
> @@ -0,0 +1,241 @@
> +/*
> + * Virtio pmem device
> + *
> + * Copyright (C) 2018 Red Hat, Inc.
> + * Copyright (C) 2018 Pankaj Gupta <pagupta@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu-common.h"
> +#include "qemu/error-report.h"
> +#include "hw/virtio/virtio-access.h"
> +#include "hw/virtio/virtio-pmem.h"
> +#include "hw/mem/memory-device.h"
> +#include "block/aio.h"
> +#include "block/thread-pool.h"
> +
> +typedef struct VirtIOPMEMresp {
> +    int ret;
> +} VirtIOPMEMResp;
> +
> +typedef struct VirtIODeviceRequest {
> +    VirtQueueElement elem;
> +    int fd;
> +    VirtIOPMEM *pmem;
> +    VirtIOPMEMResp resp;
> +} VirtIODeviceRequest;
> +
> +static int worker_cb(void *opaque)
> +{
> +    VirtIODeviceRequest *req = opaque;
> +    int err = 0;
> +
> +    /* flush raw backing image */
> +    err = fsync(req->fd);
> +    if (err != 0) {
> +        err = EIO;
> +    }
> +    req->resp.ret = err;

As I mentioned in the kernel patch, I think you should return 1 for
error and let the guest pick the error it wants to return to
the calling thread.
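
As a rough sketch of the host side, assuming a plain 0/1 status: the
helper below is stand-alone C, not QEMU code, and toy_pmem_flush_status()
and the TOY_PMEM_RESP_* values are hypothetical names used only here.

```c
/* Needed for fsync() when compiling in a strict ISO C mode. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <unistd.h>

/* Hypothetical status values sent back to the guest. */
#define TOY_PMEM_RESP_OK   0
#define TOY_PMEM_RESP_FAIL 1

/* Host side: flush the backing file and report only success or failure. */
int toy_pmem_flush_status(int fd)
{
	/* fsync() returns 0 on success and -1 on error; errno stays on the host. */
	if (fsync(fd) != 0)
		return TOY_PMEM_RESP_FAIL;
	return TOY_PMEM_RESP_OK;
}
```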

> +
> +    return 0;
> +}
> +
> +static void done_cb(void *opaque, int ret)
> +{
> +    VirtIODeviceRequest *req = opaque;
> +    int len = iov_from_buf(req->elem.in_sg, req->elem.in_num, 0,
> +                              &req->resp, sizeof(VirtIOPMEMResp));
> +
> +    /* Callbacks are serialized, so no need to use atomic ops.  */
> +    virtqueue_push(req->pmem->rq_vq, &req->elem, len);
> +    virtio_notify((VirtIODevice *)req->pmem, req->pmem->rq_vq);
> +    g_free(req);
> +}
> +
> +static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VirtIODeviceRequest *req;
> +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> +    HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
> +    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
> +
> +    req = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
> +    if (!req) {
> +        virtio_error(vdev, "virtio-pmem missing request data");
> +        return;
> +    }
> +
> +    if (req->elem.out_num < 1 || req->elem.in_num < 1) {
> +        virtio_error(vdev, "virtio-pmem request not proper");
> +        g_free(req);
> +        return;
> +    }

I think you should abort() on those errors.

> +    req->fd = memory_region_get_fd(&backend->mr);
> +    req->pmem = pmem;
> +    thread_pool_submit_aio(pool, worker_cb, req, done_cb, req);
> +}
> +
> +static void virtio_pmem_get_config(VirtIODevice *vdev, uint8_t *config)
> +{
> +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> +    struct virtio_pmem_config *pmemcfg = (struct virtio_pmem_config *) config;
> +
> +    virtio_stq_p(vdev, &pmemcfg->start, pmem->start);
> +    virtio_stq_p(vdev, &pmemcfg->size, pmem->size);
> +}
> +
> +static uint64_t virtio_pmem_get_features(VirtIODevice *vdev, uint64_t features,
> +                                        Error **errp)
> +{
> +    return features;
> +}
> +
> +static void virtio_pmem_realize(DeviceState *dev, Error **errp)
> +{
> +    VirtIODevice   *vdev   = VIRTIO_DEVICE(dev);
> +    VirtIOPMEM     *pmem   = VIRTIO_PMEM(dev);
> +    MachineState   *ms     = MACHINE(qdev_get_machine());
> +    uint64_t align;
> +    Error *local_err = NULL;
> +    MemoryRegion *mr;
> +
> +    if (!pmem->memdev) {
> +        error_setg(errp, "virtio-pmem memdev not set");
> +        return;
> +    }
> +
> +    mr  = host_memory_backend_get_memory(pmem->memdev);
> +    align = memory_region_get_alignment(mr);
> +    pmem->size = QEMU_ALIGN_DOWN(memory_region_size(mr), align);
> +    pmem->start = memory_device_get_free_addr(ms, NULL, align, pmem->size,
> +                                                               &local_err);
> +    if (local_err) {
> +        error_setg(errp, "Can't get free address in mem device");
> +        return;
> +    }
> +    memory_region_init_alias(&pmem->mr, OBJECT(pmem),
> +                             "virtio_pmem-memory", mr, 0, pmem->size);
> +    memory_device_plug_region(ms, &pmem->mr, pmem->start);
> +
> +    host_memory_backend_set_mapped(pmem->memdev, true);
> +    virtio_init(vdev, TYPE_VIRTIO_PMEM, VIRTIO_ID_PMEM,
> +                                          sizeof(struct virtio_pmem_config));
> +    pmem->rq_vq = virtio_add_queue(vdev, 128, virtio_pmem_flush);
> +}
> +
> +static void virtio_mem_check_memdev(Object *obj, const char *name, Object *val,
> +                                    Error **errp)
> +{
> +    if (host_memory_backend_is_mapped(MEMORY_BACKEND(val))) {
> +        char *path = object_get_canonical_path_component(val);
> +        error_setg(errp, "Can't use already busy memdev: %s", path);
> +        g_free(path);
> +        return;
> +    }
> +
> +    qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
> +}
> +
> +static const char *virtio_pmem_get_device_id(VirtIOPMEM *vm)
> +{
> +    Object *obj = OBJECT(vm);
> +    DeviceState *parent_dev;
> +
> +    /* always use the ID of the proxy device */
> +    if (obj->parent && object_dynamic_cast(obj->parent, TYPE_DEVICE)) {
> +        parent_dev = DEVICE(obj->parent);
> +        return parent_dev->id;
> +    }
> +    return NULL;
> +}
> +
> +static void virtio_pmem_md_fill_device_info(const MemoryDeviceState *md,
> +                                           MemoryDeviceInfo *info)
> +{
> +    VirtioPMemDeviceInfo *vi = g_new0(VirtioPMemDeviceInfo, 1);
> +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> +    const char *id = virtio_pmem_get_device_id(vm);
> +
> +    if (id) {
> +        vi->has_id = true;
> +        vi->id = g_strdup(id);
> +    }
> +
> +    vi->start = vm->start;
> +    vi->size = vm->size;
> +    vi->memdev = object_get_canonical_path(OBJECT(vm->memdev));
> +
> +    info->u.virtio_pmem.data = vi;
> +    info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM;
> +}
> +
> +static uint64_t virtio_pmem_md_get_addr(const MemoryDeviceState *md)
> +{
> +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> +
> +    return vm->start;
> +}
> +
> +static uint64_t virtio_pmem_md_get_plugged_size(const MemoryDeviceState *md)
> +{
> +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> +
> +    return vm->size;
> +}
> +
> +static uint64_t virtio_pmem_md_get_region_size(const MemoryDeviceState *md)
> +{
> +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> +
> +    return vm->size;
> +}
> +
> +static void virtio_pmem_instance_init(Object *obj)
> +{
> +    VirtIOPMEM *vm = VIRTIO_PMEM(obj);
> +    object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
> +                                (Object **)&vm->memdev,
> +                                (void *) virtio_mem_check_memdev,
> +                                OBJ_PROP_LINK_STRONG,
> +                                &error_abort);
> +}
> +
> +
> +static void virtio_pmem_class_init(ObjectClass *klass, void *data)
> +{
> +    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> +    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
> +
> +    vdc->realize      =  virtio_pmem_realize;
> +    vdc->get_config   =  virtio_pmem_get_config;
> +    vdc->get_features =  virtio_pmem_get_features;
> +
> +    mdc->get_addr         = virtio_pmem_md_get_addr;
> +    mdc->get_plugged_size = virtio_pmem_md_get_plugged_size;
> +    mdc->get_region_size  = virtio_pmem_md_get_region_size;
> +    mdc->fill_device_info = virtio_pmem_md_fill_device_info;
> +}
> +
> +static TypeInfo virtio_pmem_info = {
> +    .name          = TYPE_VIRTIO_PMEM,
> +    .parent        = TYPE_VIRTIO_DEVICE,
> +    .class_init    = virtio_pmem_class_init,
> +    .instance_size = sizeof(VirtIOPMEM),
> +    .instance_init = virtio_pmem_instance_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_MEMORY_DEVICE },
> +        { }
> +  },
> +};
> +
> +static void virtio_register_types(void)
> +{
> +    type_register_static(&virtio_pmem_info);
> +}
> +
> +type_init(virtio_register_types)
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 990d6fcbde..28829b6437 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -85,6 +85,7 @@ extern bool pci_available;
>  #define PCI_DEVICE_ID_VIRTIO_RNG         0x1005
>  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
>  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
> +#define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
>  
>  #define PCI_VENDOR_ID_REDHAT             0x1b36
>  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> diff --git a/include/hw/virtio/virtio-pmem.h b/include/hw/virtio/virtio-pmem.h
> new file mode 100644
> index 0000000000..fda3ee691c
> --- /dev/null
> +++ b/include/hw/virtio/virtio-pmem.h
> @@ -0,0 +1,42 @@
> +/*
> + * Virtio pmem Device
> + *
> + * Copyright Red Hat, Inc. 2018
> + * Copyright Pankaj Gupta <pagupta@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#ifndef QEMU_VIRTIO_PMEM_H
> +#define QEMU_VIRTIO_PMEM_H
> +
> +#include "hw/virtio/virtio.h"
> +#include "exec/memory.h"
> +#include "sysemu/hostmem.h"
> +#include "standard-headers/linux/virtio_ids.h"
> +#include "hw/boards.h"
> +#include "hw/i386/pc.h"
> +
> +#define TYPE_VIRTIO_PMEM "virtio-pmem"
> +
> +#define VIRTIO_PMEM(obj) \
> +        OBJECT_CHECK(VirtIOPMEM, (obj), TYPE_VIRTIO_PMEM)
> +
> +/* VirtIOPMEM device structure */
> +typedef struct VirtIOPMEM {
> +    VirtIODevice parent_obj;
> +
> +    VirtQueue *rq_vq;
> +    uint64_t start;
> +    uint64_t size;
> +    MemoryRegion mr;
> +    HostMemoryBackend *memdev;
> +} VirtIOPMEM;
> +
> +struct virtio_pmem_config {
> +    uint64_t start;
> +    uint64_t size;
> +};
> +#endif
> diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
> index 6d5c3b2d4f..346389565a 100644
> --- a/include/standard-headers/linux/virtio_ids.h
> +++ b/include/standard-headers/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/qapi/misc.json b/qapi/misc.json
> index d450cfef21..517376b866 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -2907,6 +2907,29 @@
>            }
>  }
>  
> +##
> +# @VirtioPMemDeviceInfo:
> +#
> +# VirtioPMem state information
> +#
> +# @id: device's ID
> +#
> +# @start: physical address, where device is mapped
> +#
> +# @size: size of memory that the device provides
> +#
> +# @memdev: memory backend linked with device
> +#
> +# Since: 3.1
> +##
> +{ 'struct': 'VirtioPMemDeviceInfo',
> +  'data': { '*id': 'str',
> +            'start': 'size',
> +            'size': 'size',
> +            'memdev': 'str'
> +          }
> +}
> +
>  ##
>  # @MemoryDeviceInfo:
>  #
> @@ -2916,7 +2939,8 @@
>  ##
>  { 'union': 'MemoryDeviceInfo',
>    'data': { 'dimm': 'PCDIMMDeviceInfo',
> -            'nvdimm': 'PCDIMMDeviceInfo'
> +            'nvdimm': 'PCDIMMDeviceInfo',
> +	    'virtio-pmem': 'VirtioPMemDeviceInfo'
>            }
>  }
>  



* Re: [Qemu-devel] [PATCH 3/3]  virtio-pmem: Add virtio pmem driver
  2018-09-12 16:54   ` Luiz Capitulino
@ 2018-09-13  6:58     ` " Pankaj Gupta
  2018-09-13 12:19       ` Luiz Capitulino
  0 siblings, 1 reply; 22+ messages in thread
From: Pankaj Gupta @ 2018-09-13  6:58 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: kwolf, jack, xiaoguangrong eric, kvm, riel, linux-nvdimm, david,
	ross zwisler, linux-kernel, qemu-devel, hch, imammedo, mst,
	stefanha, niteshnarayanlal, pbonzini, dan j williams, nilal


Hi Luiz,

Thanks for the review.

> 
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/virtio/Kconfig           |   9 ++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/virtio_pmem.c     | 255
> >  +++++++++++++++++++++++++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  40 ++++++
> >  5 files changed, 306 insertions(+)
> >  create mode 100644 drivers/virtio/virtio_pmem.c
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 3589764..a331e23 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,15 @@ config VIRTIO_PCI_LEGACY
> >  
> >  	  If unsure, say Y.
> >  
> > +config VIRTIO_PMEM
> > +	tristate "Support for virtio pmem driver"
> > +	depends on VIRTIO
> > +	help
> > +	This driver provides support for virtio based flushing interface
> > +	for persistent memory range.
> > +
> > +	If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >  	tristate "Virtio balloon driver"
> >  	depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5..cbe91c6 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
> > diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
> > new file mode 100644
> > index 0000000..c22cc87
> > --- /dev/null
> > +++ b/drivers/virtio/virtio_pmem.c
> > @@ -0,0 +1,255 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_ids.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/spinlock.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/nd.h>
> > +
> > +struct virtio_pmem_request {
> > +	/* Host return status corresponding to flush request */
> > +	int ret;
> > +
> > +	/* command name*/
> > +	char name[16];
> > +
> > +	/* Wait queue to process deferred work after ack from host */
> > +	wait_queue_head_t host_acked;
> > +	bool done;
> > +
> > +	/* Wait queue to process deferred work after virt queue buffer avail */
> > +	wait_queue_head_t wq_buf;
> > +	bool wq_buf_avail;
> > +	struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +	struct virtio_device *vdev;
> > +
> > +	/* Virtio pmem request queue */
> > +	struct virtqueue *req_vq;
> > +
> > +	/* nvdimm bus registers virtio pmem device */
> > +	struct nvdimm_bus *nvdimm_bus;
> > +	struct nvdimm_bus_descriptor nd_desc;
> > +
> > +	/* List to store deferred work if virtqueue is full */
> > +	struct list_head req_list;
> > +
> > +	/* Synchronize virtqueue data */
> > +	spinlock_t pmem_lock;
> > +
> > +	/* Memory region information */
> > +	uint64_t start;
> > +	uint64_t size;
> > +};
> > +
> > +static struct virtio_device_id id_table[] = {
> > +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +	{ 0 },
> > +};
> > +
> > + /* The interrupt handler */
> > +static void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> > +			req_buf->wq_buf_avail = true;
> > +			wake_up(&req_buf->wq_buf);
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +	struct virtqueue *vq;
> > +
> > +	/* single vq */
> > +	vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> > +				host_ack, "flush_queue");
> > +	if (IS_ERR(vq))
> > +		return PTR_ERR(vq);
> > +
> > +	spin_lock_init(&vpmem->pmem_lock);
> > +	INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +	return 0;
> > +};
> > +
> > + /* The request submission function */
> > +static int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev =
> > +		dev_to_virtio(nd_region->dev.parent->parent);
> > +	struct virtio_pmem *vpmem = vdev->priv;
> 
> I'm missing a might_sleep() call in this function.

I am not sure we need might_sleep() here.
We could add it as a debugging aid, to detect calls made from
atomic context where sleeping is not allowed?

> 
> > +	struct virtio_pmem_request *req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> 
> It seems that sg_init_one() is only setting fields, in this
> case you can move spin_lock_irqsave() here.

yes, will move spin_lock_irqsave here.

> 
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&req->list, &vpmem->req_list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> 
> Is this error handling code assuming that at some point
> virtqueue_add_sgs() will succeed for a different thread? If yes,
> what happens if the assumption is false? That is, what happens if
> virtqueue_add_sgs() never succeeds anymore?

virtqueue_add_sgs() will not succeed, and the corresponding thread should wait.
All subsequent calling threads should also wait. As soon as the host frees the
first entry, the first waiting thread is woken up.

In the worst case, if Qemu never returns any used buffers, multiple threads
will keep waiting.

> 
> Why not just return an error?

As per Stefan's suggestion in the previous discussion: if the virtqueue is full,
printing a message and failing the flush isn't appropriate. The thread needs to
wait until virtqueue space becomes available.
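
To illustrate the intended behaviour, here is a minimal userspace model of
"defer when the ring is full, wake the oldest waiter on completion". It is
only a sketch under stated assumptions, not the driver code: struct
flush_ring, ring_submit(), ring_complete(), RING_SLOTS and MAX_WAITERS are
hypothetical names, and real waiters would sleep on a wait queue instead of
being recorded in an array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS  2	/* hypothetical ring size, tiny for illustration */
#define MAX_WAITERS 8	/* hypothetical cap on deferred requests */

struct flush_ring {
	int in_flight;			/* requests currently owned by the host */
	int waiters[MAX_WAITERS];	/* FIFO of deferred request ids */
	size_t head, count;
};

/* Returns true if the request went to the host, false if it was deferred. */
static bool ring_submit(struct flush_ring *r, int id)
{
	if (r->in_flight < RING_SLOTS) {
		r->in_flight++;
		return true;
	}
	assert(r->count < MAX_WAITERS);
	r->waiters[(r->head + r->count) % MAX_WAITERS] = id;
	r->count++;
	return false;
}

/*
 * The host completed one request: free its slot and hand it to the oldest
 * deferred request. Returns the promoted id, or -1 if nobody was waiting.
 */
static int ring_complete(struct flush_ring *r)
{
	int id = -1;

	assert(r->in_flight > 0);
	r->in_flight--;
	if (r->count > 0) {
		id = r->waiters[r->head];
		r->head = (r->head + 1) % MAX_WAITERS;
		r->count--;
		r->in_flight++;	/* the woken request takes the freed slot */
	}
	return id;
}
```

Note that in this model a deferred request only makes progress when the host
completes something, which is exactly the "waits forever if the host never
returns a buffer" case raised above.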

> 
> > +	}
> > +	virtqueue_kick(vpmem->req_vq);
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +	/* When host has read buffer, this completes via host_ack */
> > +	wait_event(req->host_acked, req->done);
> > +	err = req->ret;
> 
> If I'm understanding the QEMU code correctly, you're returning EIO
> from QEMU if fsync() fails. I think this is wrong, since we don't know
> if EIO in QEMU will be the same EIO in the guest. One way to solve this
> would be to return 0 for success and 1 for failure from QEMU, and let the
> guest implementation pick its error code (for your implementation it
> could be EIO).

Makes sense, will change this. 
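
Something along these lines on the guest side, as a rough sketch only. The
VPMEM_RESP_* values and vpmem_resp_to_errno() are hypothetical names that are
not part of the posted patch; the point is just that the wire format carries
0 or 1 and the guest picks its own errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical wire values, assumed for illustration only. */
#define VPMEM_RESP_OK   0u
#define VPMEM_RESP_FAIL 1u

static_assert(VPMEM_RESP_OK == 0u, "success must be encoded as zero");

/* Map the host's status to a guest error code chosen by the guest. */
static int vpmem_resp_to_errno(uint32_t resp)
{
	/* Anything other than success is reported as an I/O error. */
	return resp == VPMEM_RESP_OK ? 0 : -EIO;
}
```

With this, the flush path would return vpmem_resp_to_errno(req->ret) instead
of passing the host value through unchanged.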

Thanks,
Pankaj 
> 
> > +	kfree(req);
> > +
> > +	return err;
> > +};
> > +EXPORT_SYMBOL_GPL(virtio_pmem_flush);
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +	int err = 0;
> > +	struct resource res;
> > +	struct virtio_pmem *vpmem;
> > +	struct nvdimm_bus *nvdimm_bus;
> > +	struct nd_region_desc ndr_desc;
> > +	int nid = dev_to_node(&vdev->dev);
> > +	struct nd_region *nd_region;
> > +
> > +	if (!vdev->config->get) {
> > +		dev_err(&vdev->dev, "%s failure: config disabled\n",
> > +			__func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	vdev->priv = vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem),
> > +			GFP_KERNEL);
> > +	if (!vpmem) {
> > +		err = -ENOMEM;
> > +		goto out_err;
> > +	}
> > +
> > +	vpmem->vdev = vdev;
> > +	err = init_vq(vpmem);
> > +	if (err)
> > +		goto out_err;
> > +
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			start, &vpmem->start);
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			size, &vpmem->size);
> > +
> > +	res.start = vpmem->start;
> > +	res.end   = vpmem->start + vpmem->size-1;
> > +	vpmem->nd_desc.provider_name = "virtio-pmem";
> > +	vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +	vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +						&vpmem->nd_desc);
> > +	if (!nvdimm_bus)
> > +		goto out_vq;
> > +
> > +	dev_set_drvdata(&vdev->dev, nvdimm_bus);
> > +	memset(&ndr_desc, 0, sizeof(ndr_desc));
> > +
> > +	ndr_desc.res = &res;
> > +	ndr_desc.numa_node = nid;
> > +	ndr_desc.flush = virtio_pmem_flush;
> > +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +	nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
> > +
> > +	if (!nd_region)
> > +		goto out_nd;
> > +
> > +	//virtio_device_ready(vdev);
> > +	return 0;
> > +out_nd:
> > +	err = -ENXIO;
> > +	nvdimm_bus_unregister(nvdimm_bus);
> > +out_vq:
> > +	vdev->config->del_vqs(vdev);
> > +out_err:
> > +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +	return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +	nvdimm_bus_unregister(nvdimm_bus);
> > +	vdev->config->del_vqs(vdev);
> > +	kfree(vpmem);
> > +}
> > +
> > +#ifdef CONFIG_PM_SLEEP
> > +static int virtio_pmem_freeze(struct virtio_device *vdev)
> > +{
> > +	/* todo: handle freeze function */
> > +	return -EPERM;
> > +}
> > +
> > +static int virtio_pmem_restore(struct virtio_device *vdev)
> > +{
> > +	/* todo: handle restore function */
> > +	return -EPERM;
> > +}
> > +#endif
> > +
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +	.driver.name		= KBUILD_MODNAME,
> > +	.driver.owner		= THIS_MODULE,
> > +	.id_table		= id_table,
> > +	.probe			= virtio_pmem_probe,
> > +	.remove			= virtio_pmem_remove,
> > +#ifdef CONFIG_PM_SLEEP
> > +	.freeze                 = virtio_pmem_freeze,
> > +	.restore                = virtio_pmem_restore,
> > +#endif
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2..3463895 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
> >  
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 0000000..c7c22a5
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,40 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
> > + * anyone can use the definitions to implement compatible drivers/servers:
> > + *
> > + *
> > + * Redistribution and use in source and binary forms, with or without
> > + * modification, are permitted provided that the following conditions
> > + * are met:
> > + * 1. Redistributions of source code must retain the above copyright
> > + *    notice, this list of conditions and the following disclaimer.
> > + * 2. Redistributions in binary form must reproduce the above copyright
> > + *    notice, this list of conditions and the following disclaimer in the
> > + *    documentation and/or other materials provided with the distribution.
> > + * 3. Neither the name of IBM nor the names of its contributors
> > + *    may be used to endorse or promote products derived from this
> > software
> > + *    without specific prior written permission.
> > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> > ``AS IS''
> > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
> > THE
> > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
> > PURPOSE
> > + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
> > CONSEQUENTIAL
> > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> > STRICT
> > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY
> > WAY
> > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> > + * SUCH DAMAGE.
> > + *
> > + * Copyright (C) Red Hat, Inc., 2018-2019
> > + * Copyright (C) Pankaj Gupta <pagupta@redhat.com>, 2018
> > + */
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +	__le64 start;
> > +	__le64 size;
> > +};
> > +#endif
> 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] qemu: Add virtio pmem device
  2018-09-12 16:57   ` Luiz Capitulino
@ 2018-09-13  7:06     ` Pankaj Gupta
  2018-09-13 12:22       ` Luiz Capitulino
  0 siblings, 1 reply; 22+ messages in thread
From: Pankaj Gupta @ 2018-09-13  7:06 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: linux-kernel, kvm, qemu-devel, linux-nvdimm, jack, stefanha,
	dan j williams, riel, nilal, kwolf, pbonzini, ross zwisler,
	david, xiaoguangrong eric, hch, mst, niteshnarayanlal, imammedo,
	eblake


> 
> >  This patch adds virtio-pmem Qemu device.
> > 
> >  This device presents memory address range information to guest
> >  which is backed by file backend type. It acts like persistent
> >  memory device for KVM guest. Guest can perform read and
> >  persistent write operations on this memory range with the help
> >  of DAX capable filesystem.
> > 
> >  Persistent guest writes are assured with the help of virtio
> >  based flushing interface. When guest userspace performs
> >  fsync on file fd on pmem device, a flush command is sent to
> >  Qemu over VIRTIO and host side flush/sync is done on backing
> >  image file.
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> > Changes from RFC v3:
> > - Return EIO for host fsync failure instead of errno - Luiz, Stefan
> > - Change version for inclusion to Qemu 3.1 - Eric
> > 
> > Changes from RFC v2:
> > - Use aio_worker() to avoid Qemu from hanging with blocking fsync
> >   call - Stefan
> > - Use virtio_st*_p() for endianness - Stefan
> > - Correct indentation in qapi/misc.json - Eric
> > 
> >  hw/virtio/Makefile.objs                     |   3 +
> >  hw/virtio/virtio-pci.c                      |  44 +++++
> >  hw/virtio/virtio-pci.h                      |  14 ++
> >  hw/virtio/virtio-pmem.c                     | 241
> >  ++++++++++++++++++++++++++++
> >  include/hw/pci/pci.h                        |   1 +
> >  include/hw/virtio/virtio-pmem.h             |  42 +++++
> >  include/standard-headers/linux/virtio_ids.h |   1 +
> >  qapi/misc.json                              |  26 ++-
> >  8 files changed, 371 insertions(+), 1 deletion(-)
> >  create mode 100644 hw/virtio/virtio-pmem.c
> >  create mode 100644 include/hw/virtio/virtio-pmem.h
> > 
> > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> > index 1b2799cfd8..7f914d45d0 100644
> > --- a/hw/virtio/Makefile.objs
> > +++ b/hw/virtio/Makefile.objs
> > @@ -10,6 +10,9 @@ obj-$(CONFIG_VIRTIO_CRYPTO) += virtio-crypto.o
> >  obj-$(call land,$(CONFIG_VIRTIO_CRYPTO),$(CONFIG_VIRTIO_PCI)) +=
> >  virtio-crypto-pci.o
> >  
> >  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> > +ifeq ($(CONFIG_MEM_HOTPLUG),y)
> > +obj-$(CONFIG_LINUX) += virtio-pmem.o
> > +endif
> >  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
> >  endif
> >  
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index 3a01fe90f0..93d3fc05c7 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -2521,6 +2521,49 @@ static const TypeInfo virtio_rng_pci_info = {
> >      .class_init    = virtio_rng_pci_class_init,
> >  };
> >  
> > +/* virtio-pmem-pci */
> > +
> > +static void virtio_pmem_pci_realize(VirtIOPCIProxy *vpci_dev, Error
> > **errp)
> > +{
> > +    VirtIOPMEMPCI *vpmem = VIRTIO_PMEM_PCI(vpci_dev);
> > +    DeviceState *vdev = DEVICE(&vpmem->vdev);
> > +
> > +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> > +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
> > +}
> > +
> > +static void virtio_pmem_pci_class_init(ObjectClass *klass, void *data)
> > +{
> > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> > +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> > +    k->realize = virtio_pmem_pci_realize;
> > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> > +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_PMEM;
> > +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> > +    pcidev_k->class_id = PCI_CLASS_OTHERS;
> > +}
> > +
> > +static void virtio_pmem_pci_instance_init(Object *obj)
> > +{
> > +    VirtIOPMEMPCI *dev = VIRTIO_PMEM_PCI(obj);
> > +
> > +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> > +                                TYPE_VIRTIO_PMEM);
> > +    object_property_add_alias(obj, "memdev", OBJECT(&dev->vdev), "memdev",
> > +                              &error_abort);
> > +}
> > +
> > +static const TypeInfo virtio_pmem_pci_info = {
> > +    .name          = TYPE_VIRTIO_PMEM_PCI,
> > +    .parent        = TYPE_VIRTIO_PCI,
> > +    .instance_size = sizeof(VirtIOPMEMPCI),
> > +    .instance_init = virtio_pmem_pci_instance_init,
> > +    .class_init    = virtio_pmem_pci_class_init,
> > +};
> > +
> > +
> >  /* virtio-input-pci */
> >  
> >  static Property virtio_input_pci_properties[] = {
> > @@ -2714,6 +2757,7 @@ static void virtio_pci_register_types(void)
> >      type_register_static(&virtio_balloon_pci_info);
> >      type_register_static(&virtio_serial_pci_info);
> >      type_register_static(&virtio_net_pci_info);
> > +    type_register_static(&virtio_pmem_pci_info);
> >  #ifdef CONFIG_VHOST_SCSI
> >      type_register_static(&vhost_scsi_pci_info);
> >  #endif
> > diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> > index 813082b0d7..fe74fcad3f 100644
> > --- a/hw/virtio/virtio-pci.h
> > +++ b/hw/virtio/virtio-pci.h
> > @@ -19,6 +19,7 @@
> >  #include "hw/virtio/virtio-blk.h"
> >  #include "hw/virtio/virtio-net.h"
> >  #include "hw/virtio/virtio-rng.h"
> > +#include "hw/virtio/virtio-pmem.h"
> >  #include "hw/virtio/virtio-serial.h"
> >  #include "hw/virtio/virtio-scsi.h"
> >  #include "hw/virtio/virtio-balloon.h"
> > @@ -57,6 +58,7 @@ typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
> >  typedef struct VirtIOGPUPCI VirtIOGPUPCI;
> >  typedef struct VHostVSockPCI VHostVSockPCI;
> >  typedef struct VirtIOCryptoPCI VirtIOCryptoPCI;
> > +typedef struct VirtIOPMEMPCI VirtIOPMEMPCI;
> >  
> >  /* virtio-pci-bus */
> >  
> > @@ -274,6 +276,18 @@ struct VirtIOBlkPCI {
> >      VirtIOBlock vdev;
> >  };
> >  
> > +/*
> > + * virtio-pmem-pci: This extends VirtioPCIProxy.
> > + */
> > +#define TYPE_VIRTIO_PMEM_PCI "virtio-pmem-pci"
> > +#define VIRTIO_PMEM_PCI(obj) \
> > +        OBJECT_CHECK(VirtIOPMEMPCI, (obj), TYPE_VIRTIO_PMEM_PCI)
> > +
> > +struct VirtIOPMEMPCI {
> > +    VirtIOPCIProxy parent_obj;
> > +    VirtIOPMEM vdev;
> > +};
> > +
> >  /*
> >   * virtio-balloon-pci: This extends VirtioPCIProxy.
> >   */
> > diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> > new file mode 100644
> > index 0000000000..69ae4c0a50
> > --- /dev/null
> > +++ b/hw/virtio/virtio-pmem.c
> > @@ -0,0 +1,241 @@
> > +/*
> > + * Virtio pmem device
> > + *
> > + * Copyright (C) 2018 Red Hat, Inc.
> > + * Copyright (C) 2018 Pankaj Gupta <pagupta@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "qemu-common.h"
> > +#include "qemu/error-report.h"
> > +#include "hw/virtio/virtio-access.h"
> > +#include "hw/virtio/virtio-pmem.h"
> > +#include "hw/mem/memory-device.h"
> > +#include "block/aio.h"
> > +#include "block/thread-pool.h"
> > +
> > +typedef struct VirtIOPMEMresp {
> > +    int ret;
> > +} VirtIOPMEMResp;
> > +
> > +typedef struct VirtIODeviceRequest {
> > +    VirtQueueElement elem;
> > +    int fd;
> > +    VirtIOPMEM *pmem;
> > +    VirtIOPMEMResp resp;
> > +} VirtIODeviceRequest;
> > +
> > +static int worker_cb(void *opaque)
> > +{
> > +    VirtIODeviceRequest *req = opaque;
> > +    int err = 0;
> > +
> > +    /* flush raw backing image */
> > +    err = fsync(req->fd);
> > +    if (err != 0) {
> > +        err = EIO;
> > +    }
> > +    req->resp.ret = err;
> 
> As I mentioned in the kernel patch, I think you should return 1 for
> error and let the guest pick the error it wants to return to
> the calling thread.

Sure.
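
For the host side, a minimal sketch of what that could look like. The
VPMEM_HOST_* constants and flush_status() are hypothetical names, not taken
from the posted Qemu patch; only fsync() is a real call here.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical status values shared with the guest, for illustration. */
enum { VPMEM_HOST_OK = 0, VPMEM_HOST_FAIL = 1 };

static_assert(VPMEM_HOST_OK == 0, "success must be encoded as zero");

/* Flush the backing file and collapse any failure into a single status. */
static int flush_status(int fd)
{
	/* The host errno is deliberately not forwarded to the guest. */
	return fsync(fd) == 0 ? VPMEM_HOST_OK : VPMEM_HOST_FAIL;
}
```

worker_cb() would then store flush_status(req->fd) in req->resp.ret.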

> 
> > +
> > +    return 0;
> > +}
> > +
> > +static void done_cb(void *opaque, int ret)
> > +{
> > +    VirtIODeviceRequest *req = opaque;
> > +    int len = iov_from_buf(req->elem.in_sg, req->elem.in_num, 0,
> > +                              &req->resp, sizeof(VirtIOPMEMResp));
> > +
> > +    /* Callbacks are serialized, so no need to use atomic ops.  */
> > +    virtqueue_push(req->pmem->rq_vq, &req->elem, len);
> > +    virtio_notify((VirtIODevice *)req->pmem, req->pmem->rq_vq);
> > +    g_free(req);
> > +}
> > +
> > +static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtIODeviceRequest *req;
> > +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> > +    HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
> > +    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
> > +
> > +    req = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
> > +    if (!req) {
> > +        virtio_error(vdev, "virtio-pmem missing request data");
> > +        return;
> > +    }
> > +
> > +    if (req->elem.out_num < 1 || req->elem.in_num < 1) {
> > +        virtio_error(vdev, "virtio-pmem request not proper");
> > +        g_free(req);
> > +        return;
> > +    }
> 
> I think you should abort() in those errors.

I just skimmed over how other devices (virtio_blk & virtio_scsi) handle such
errors. None of them aborts? For example, virtio-blk does:

  if (req->elem.out_num < 1 || req->elem.in_num < 1) {
        virtio_error(vdev, "virtio-blk missing headers");
        return -1;
    }

Thanks,
Pankaj

> 
> > +    req->fd = memory_region_get_fd(&backend->mr);
> > +    req->pmem = pmem;
> > +    thread_pool_submit_aio(pool, worker_cb, req, done_cb, req);
> > +}
> > +
> > +static void virtio_pmem_get_config(VirtIODevice *vdev, uint8_t *config)
> > +{
> > +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> > +    struct virtio_pmem_config *pmemcfg = (struct virtio_pmem_config *)
> > config;
> > +
> > +    virtio_stq_p(vdev, &pmemcfg->start, pmem->start);
> > +    virtio_stq_p(vdev, &pmemcfg->size, pmem->size);
> > +}
> > +
> > +static uint64_t virtio_pmem_get_features(VirtIODevice *vdev, uint64_t
> > features,
> > +                                        Error **errp)
> > +{
> > +    return features;
> > +}
> > +
> > +static void virtio_pmem_realize(DeviceState *dev, Error **errp)
> > +{
> > +    VirtIODevice   *vdev   = VIRTIO_DEVICE(dev);
> > +    VirtIOPMEM     *pmem   = VIRTIO_PMEM(dev);
> > +    MachineState   *ms     = MACHINE(qdev_get_machine());
> > +    uint64_t align;
> > +    Error *local_err = NULL;
> > +    MemoryRegion *mr;
> > +
> > +    if (!pmem->memdev) {
> > +        error_setg(errp, "virtio-pmem memdev not set");
> > +        return;
> > +    }
> > +
> > +    mr  = host_memory_backend_get_memory(pmem->memdev);
> > +    align = memory_region_get_alignment(mr);
> > +    pmem->size = QEMU_ALIGN_DOWN(memory_region_size(mr), align);
> > +    pmem->start = memory_device_get_free_addr(ms, NULL, align, pmem->size,
> > +
> > &local_err);
> > +    if (local_err) {
> > +        error_setg(errp, "Can't get free address in mem device");
> > +        return;
> > +    }
> > +    memory_region_init_alias(&pmem->mr, OBJECT(pmem),
> > +                             "virtio_pmem-memory", mr, 0, pmem->size);
> > +    memory_device_plug_region(ms, &pmem->mr, pmem->start);
> > +
> > +    host_memory_backend_set_mapped(pmem->memdev, true);
> > +    virtio_init(vdev, TYPE_VIRTIO_PMEM, VIRTIO_ID_PMEM,
> > +                                          sizeof(struct
> > virtio_pmem_config));
> > +    pmem->rq_vq = virtio_add_queue(vdev, 128, virtio_pmem_flush);
> > +}
> > +
> > +static void virtio_mem_check_memdev(Object *obj, const char *name, Object
> > *val,
> > +                                    Error **errp)
> > +{
> > +    if (host_memory_backend_is_mapped(MEMORY_BACKEND(val))) {
> > +        char *path = object_get_canonical_path_component(val);
> > +        error_setg(errp, "Can't use already busy memdev: %s", path);
> > +        g_free(path);
> > +        return;
> > +    }
> > +
> > +    qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
> > +}
> > +
> > +static const char *virtio_pmem_get_device_id(VirtIOPMEM *vm)
> > +{
> > +    Object *obj = OBJECT(vm);
> > +    DeviceState *parent_dev;
> > +
> > +    /* always use the ID of the proxy device */
> > +    if (obj->parent && object_dynamic_cast(obj->parent, TYPE_DEVICE)) {
> > +        parent_dev = DEVICE(obj->parent);
> > +        return parent_dev->id;
> > +    }
> > +    return NULL;
> > +}
> > +
> > +static void virtio_pmem_md_fill_device_info(const MemoryDeviceState *md,
> > +                                           MemoryDeviceInfo *info)
> > +{
> > +    VirtioPMemDeviceInfo *vi = g_new0(VirtioPMemDeviceInfo, 1);
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > +    const char *id = virtio_pmem_get_device_id(vm);
> > +
> > +    if (id) {
> > +        vi->has_id = true;
> > +        vi->id = g_strdup(id);
> > +    }
> > +
> > +    vi->start = vm->start;
> > +    vi->size = vm->size;
> > +    vi->memdev = object_get_canonical_path(OBJECT(vm->memdev));
> > +
> > +    info->u.virtio_pmem.data = vi;
> > +    info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM;
> > +}
> > +
> > +static uint64_t virtio_pmem_md_get_addr(const MemoryDeviceState *md)
> > +{
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > +
> > +    return vm->start;
> > +}
> > +
> > +static uint64_t virtio_pmem_md_get_plugged_size(const MemoryDeviceState
> > *md)
> > +{
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > +
> > +    return vm->size;
> > +}
> > +
> > +static uint64_t virtio_pmem_md_get_region_size(const MemoryDeviceState
> > *md)
> > +{
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > +
> > +    return vm->size;
> > +}
> > +
> > +static void virtio_pmem_instance_init(Object *obj)
> > +{
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(obj);
> > +    object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
> > +                                (Object **)&vm->memdev,
> > +                                (void *) virtio_mem_check_memdev,
> > +                                OBJ_PROP_LINK_STRONG,
> > +                                &error_abort);
> > +}
> > +
> > +
> > +static void virtio_pmem_class_init(ObjectClass *klass, void *data)
> > +{
> > +    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> > +    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
> > +
> > +    vdc->realize      =  virtio_pmem_realize;
> > +    vdc->get_config   =  virtio_pmem_get_config;
> > +    vdc->get_features =  virtio_pmem_get_features;
> > +
> > +    mdc->get_addr         = virtio_pmem_md_get_addr;
> > +    mdc->get_plugged_size = virtio_pmem_md_get_plugged_size;
> > +    mdc->get_region_size  = virtio_pmem_md_get_region_size;
> > +    mdc->fill_device_info = virtio_pmem_md_fill_device_info;
> > +}
> > +
> > +static TypeInfo virtio_pmem_info = {
> > +    .name          = TYPE_VIRTIO_PMEM,
> > +    .parent        = TYPE_VIRTIO_DEVICE,
> > +    .class_init    = virtio_pmem_class_init,
> > +    .instance_size = sizeof(VirtIOPMEM),
> > +    .instance_init = virtio_pmem_instance_init,
> > +    .interfaces = (InterfaceInfo[]) {
> > +        { TYPE_MEMORY_DEVICE },
> > +        { }
> > +  },
> > +};
> > +
> > +static void virtio_register_types(void)
> > +{
> > +    type_register_static(&virtio_pmem_info);
> > +}
> > +
> > +type_init(virtio_register_types)
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index 990d6fcbde..28829b6437 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -85,6 +85,7 @@ extern bool pci_available;
> >  #define PCI_DEVICE_ID_VIRTIO_RNG         0x1005
> >  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
> >  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
> > +#define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
> >  
> >  #define PCI_VENDOR_ID_REDHAT             0x1b36
> >  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> > diff --git a/include/hw/virtio/virtio-pmem.h
> > b/include/hw/virtio/virtio-pmem.h
> > new file mode 100644
> > index 0000000000..fda3ee691c
> > --- /dev/null
> > +++ b/include/hw/virtio/virtio-pmem.h
> > @@ -0,0 +1,42 @@
> > +/*
> > + * Virtio pmem Device
> > + *
> > + * Copyright Red Hat, Inc. 2018
> > + * Copyright Pankaj Gupta <pagupta@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > + * (at your option) any later version.  See the COPYING file in the
> > + * top-level directory.
> > + */
> > +
> > +#ifndef QEMU_VIRTIO_PMEM_H
> > +#define QEMU_VIRTIO_PMEM_H
> > +
> > +#include "hw/virtio/virtio.h"
> > +#include "exec/memory.h"
> > +#include "sysemu/hostmem.h"
> > +#include "standard-headers/linux/virtio_ids.h"
> > +#include "hw/boards.h"
> > +#include "hw/i386/pc.h"
> > +
> > +#define TYPE_VIRTIO_PMEM "virtio-pmem"
> > +
> > +#define VIRTIO_PMEM(obj) \
> > +        OBJECT_CHECK(VirtIOPMEM, (obj), TYPE_VIRTIO_PMEM)
> > +
> > +/* VirtIOPMEM device structure */
> > +typedef struct VirtIOPMEM {
> > +    VirtIODevice parent_obj;
> > +
> > +    VirtQueue *rq_vq;
> > +    uint64_t start;
> > +    uint64_t size;
> > +    MemoryRegion mr;
> > +    HostMemoryBackend *memdev;
> > +} VirtIOPMEM;
> > +
> > +struct virtio_pmem_config {
> > +    uint64_t start;
> > +    uint64_t size;
> > +};
> > +#endif
> > diff --git a/include/standard-headers/linux/virtio_ids.h
> > b/include/standard-headers/linux/virtio_ids.h
> > index 6d5c3b2d4f..346389565a 100644
> > --- a/include/standard-headers/linux/virtio_ids.h
> > +++ b/include/standard-headers/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
> >  
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/qapi/misc.json b/qapi/misc.json
> > index d450cfef21..517376b866 100644
> > --- a/qapi/misc.json
> > +++ b/qapi/misc.json
> > @@ -2907,6 +2907,29 @@
> >            }
> >  }
> >  
> > +##
> > +# @VirtioPMemDeviceInfo:
> > +#
> > +# VirtioPMem state information
> > +#
> > +# @id: device's ID
> > +#
> > +# @start: physical address, where device is mapped
> > +#
> > +# @size: size of memory that the device provides
> > +#
> > +# @memdev: memory backend linked with device
> > +#
> > +# Since: 3.1
> > +##
> > +{ 'struct': 'VirtioPMemDeviceInfo',
> > +  'data': { '*id': 'str',
> > +            'start': 'size',
> > +            'size': 'size',
> > +            'memdev': 'str'
> > +          }
> > +}
> > +
> >  ##
> >  # @MemoryDeviceInfo:
> >  #
> > @@ -2916,7 +2939,8 @@
> >  ##
> >  { 'union': 'MemoryDeviceInfo',
> >    'data': { 'dimm': 'PCDIMMDeviceInfo',
> > -            'nvdimm': 'PCDIMMDeviceInfo'
> > +            'nvdimm': 'PCDIMMDeviceInfo',
> > +	    'virtio-pmem': 'VirtioPMemDeviceInfo'
> >            }
> >  }
> >  
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3]  virtio-pmem: Add virtio pmem driver
  2018-09-13  6:58     ` [Qemu-devel] " Pankaj Gupta
@ 2018-09-13 12:19       ` Luiz Capitulino
  2018-09-14 12:13         ` Pankaj Gupta
  0 siblings, 1 reply; 22+ messages in thread
From: Luiz Capitulino @ 2018-09-13 12:19 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: kwolf, jack, xiaoguangrong eric, kvm, riel, linux-nvdimm, david,
	ross zwisler, linux-kernel, qemu-devel, hch, imammedo, mst,
	stefanha, niteshnarayanlal, pbonzini, dan j williams, nilal

On Thu, 13 Sep 2018 02:58:21 -0400 (EDT)
Pankaj Gupta <pagupta@redhat.com> wrote:

> Hi Luiz,
> 
> Thanks for the review.
> 
> >   
> > > This patch adds virtio-pmem driver for KVM guest.
> > > 
> > > Guest reads the persistent memory range information from
> > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > creates a nd_region object with the persistent memory
> > > range information so that existing 'nvdimm/pmem' driver
> > > can reserve this into system memory map. This way
> > > 'virtio-pmem' driver uses existing functionality of pmem
> > > driver to register persistent memory compatible for DAX
> > > capable filesystems.
> > > 
> > > This also provides function to perform guest flush over
> > > VIRTIO from 'pmem' driver when userspace performs flush
> > > on DAX memory range.
> > > 
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  drivers/virtio/Kconfig           |   9 ++
> > >  drivers/virtio/Makefile          |   1 +
> > >  drivers/virtio/virtio_pmem.c     | 255
> > >  +++++++++++++++++++++++++++++++++++++++
> > >  include/uapi/linux/virtio_ids.h  |   1 +
> > >  include/uapi/linux/virtio_pmem.h |  40 ++++++
> > >  5 files changed, 306 insertions(+)
> > >  create mode 100644 drivers/virtio/virtio_pmem.c
> > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > 
> > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > index 3589764..a331e23 100644
> > > --- a/drivers/virtio/Kconfig
> > > +++ b/drivers/virtio/Kconfig
> > > @@ -42,6 +42,15 @@ config VIRTIO_PCI_LEGACY
> > >  
> > >  	  If unsure, say Y.
> > >  
> > > +config VIRTIO_PMEM
> > > +	tristate "Support for virtio pmem driver"
> > > +	depends on VIRTIO
> > > +	help
> > > +	This driver provides support for virtio based flushing interface
> > > +	for persistent memory range.
> > > +
> > > +	If unsure, say M.
> > > +
> > >  config VIRTIO_BALLOON
> > >  	tristate "Virtio balloon driver"
> > >  	depends on VIRTIO
> > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > index 3a2b5c5..cbe91c6 100644
> > > --- a/drivers/virtio/Makefile
> > > +++ b/drivers/virtio/Makefile
> > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
> > > diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
> > > new file mode 100644
> > > index 0000000..c22cc87
> > > --- /dev/null
> > > +++ b/drivers/virtio/virtio_pmem.c
> > > @@ -0,0 +1,255 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and provides a virtio based flushing
> > > + * interface.
> > > + */
> > > +#include <linux/virtio.h>
> > > +#include <linux/module.h>
> > > +#include <linux/virtio_ids.h>
> > > +#include <linux/virtio_config.h>
> > > +#include <uapi/linux/virtio_pmem.h>
> > > +#include <linux/spinlock.h>
> > > +#include <linux/libnvdimm.h>
> > > +#include <linux/nd.h>
> > > +
> > > +struct virtio_pmem_request {
> > > +	/* Host return status corresponding to flush request */
> > > +	int ret;
> > > +
> > > +	/* command name*/
> > > +	char name[16];
> > > +
> > > +	/* Wait queue to process deferred work after ack from host */
> > > +	wait_queue_head_t host_acked;
> > > +	bool done;
> > > +
> > > +	/* Wait queue to process deferred work after virt queue buffer avail */
> > > +	wait_queue_head_t wq_buf;
> > > +	bool wq_buf_avail;
> > > +	struct list_head list;
> > > +};
> > > +
> > > +struct virtio_pmem {
> > > +	struct virtio_device *vdev;
> > > +
> > > +	/* Virtio pmem request queue */
> > > +	struct virtqueue *req_vq;
> > > +
> > > +	/* nvdimm bus registers virtio pmem device */
> > > +	struct nvdimm_bus *nvdimm_bus;
> > > +	struct nvdimm_bus_descriptor nd_desc;
> > > +
> > > +	/* List to store deferred work if virtqueue is full */
> > > +	struct list_head req_list;
> > > +
> > > +	/* Synchronize virtqueue data */
> > > +	spinlock_t pmem_lock;
> > > +
> > > +	/* Memory region information */
> > > +	uint64_t start;
> > > +	uint64_t size;
> > > +};
> > > +
> > > +static struct virtio_device_id id_table[] = {
> > > +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > +	{ 0 },
> > > +};
> > > +
> > > + /* The interrupt handler */
> > > +static void host_ack(struct virtqueue *vq)
> > > +{
> > > +	unsigned int len;
> > > +	unsigned long flags;
> > > +	struct virtio_pmem_request *req, *req_buf;
> > > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > > +
> > > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > +		req->done = true;
> > > +		wake_up(&req->host_acked);
> > > +
> > > +		if (!list_empty(&vpmem->req_list)) {
> > > +			req_buf = list_first_entry(&vpmem->req_list,
> > > +					struct virtio_pmem_request, list);
> > > +			list_del(&vpmem->req_list);
> > > +			req_buf->wq_buf_avail = true;
> > > +			wake_up(&req_buf->wq_buf);
> > > +		}
> > > +	}
> > > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +}
> > > + /* Initialize virt queue */
> > > +static int init_vq(struct virtio_pmem *vpmem)
> > > +{
> > > +	struct virtqueue *vq;
> > > +
> > > +	/* single vq */
> > > +	vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> > > +				host_ack, "flush_queue");
> > > +	if (IS_ERR(vq))
> > > +		return PTR_ERR(vq);
> > > +
> > > +	spin_lock_init(&vpmem->pmem_lock);
> > > +	INIT_LIST_HEAD(&vpmem->req_list);
> > > +
> > > +	return 0;
> > > +};
> > > +
> > > + /* The request submission function */
> > > +static int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +	int err;
> > > +	unsigned long flags;
> > > +	struct scatterlist *sgs[2], sg, ret;
> > > +	struct virtio_device *vdev =
> > > +		dev_to_virtio(nd_region->dev.parent->parent);
> > > +	struct virtio_pmem *vpmem = vdev->priv;  
> > 
> > I'm missing a might_sleep() call in this function.  
> 
> I am not sure we need might_sleep() here.
> Should we add it as a debugging aid, to catch callers that
> reach this function from atomic context?

Yes. Since this function sleeps and since some functions that
may run in atomic context call it, it's a good idea to
call might_sleep().
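
For illustration only, here is a userspace sketch of what the annotation
buys us. The names atomic_depth, MIGHT_SLEEP() and flush_sketch() are
hypothetical stand-ins for the kernel's atomic-context accounting,
might_sleep() and virtio_pmem_flush(); this is not the driver code.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's atomic-context accounting. */
static int atomic_depth;      /* > 0 while a "spinlock" is held */
static int sleep_violations;  /* number of times the check fired */

static void enter_atomic(void) { atomic_depth++; }
static void leave_atomic(void) { atomic_depth--; }

/* Analogue of might_sleep(): complain when reached in atomic context. */
#define MIGHT_SLEEP() do {                                        \
        if (atomic_depth > 0) {                                   \
            sleep_violations++;                                   \
            fprintf(stderr, "BUG: sleeping call at %s:%d\n",      \
                    __FILE__, __LINE__);                          \
        }                                                         \
    } while (0)

/* Analogue of the flush function: annotated up front because it may block. */
static int flush_sketch(void)
{
    MIGHT_SLEEP();
    /* ... allocate the request, queue it, wait for the host ... */
    return 0;
}
```

The check fires on every call made from atomic context, including calls
that would not have ended up sleeping, so a bad caller is caught even
when the slow path is rarely taken.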

> > > +	struct virtio_pmem_request *req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +
> > > +	if (!req)
> > > +		return -ENOMEM;
> > > +
> > > +	req->done = req->wq_buf_avail = false;
> > > +	strcpy(req->name, "FLUSH");
> > > +	init_waitqueue_head(&req->host_acked);
> > > +	init_waitqueue_head(&req->wq_buf);
> > > +
> > > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +	sg_init_one(&sg, req->name, strlen(req->name));
> > > +	sgs[0] = &sg;
> > > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +	sgs[1] = &ret;  
> > 
> > It seems that sg_init_one() is only setting fields, in this
> > case you can move spin_lock_irqsave() here.  
> 
> yes, will move spin_lock_irqsave here.
> 
> >   
> > > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +	if (err) {
> > > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > > +
> > > +		list_add_tail(&vpmem->req_list, &req->list);
> > > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +		/* When host has read buffer, this completes via host_ack */
> > > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);  
> > 
> > Is this error handling code assuming that at some point
> > virtqueue_add_sgs() will succeed for a different thread? If yes,
> > what happens if the assumption is false? That is, what happens if
> > virtqueue_add_sgs() never succeeds anymore?  
> 
> If virtqueue_add_sgs() does not succeed, the calling thread waits, and
> so do all subsequent callers. As soon as the host frees the first
> entry, the first waiting thread is woken up.
> 
> In the worst case, if Qemu never processes any of the queued buffers,
> multiple threads are kept waiting.
> 
> > 
> > Why not just return an error?  
> 
> As per Stefan's suggestion in the previous discussion: if the virtqueue is
> full, printing a message and failing the flush isn't appropriate. This thread
> needs to wait until virtqueue space becomes available.

If virtqueue_add_sgs() is guaranteed to succeed at some point then OK.
Otherwise, you'll get threads getting stuck forever.
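
To make the concern concrete, here is a userspace sketch of a bounded
wait. All names in it (ring_sketch, try_add, host_poll_fn,
submit_bounded) are made up for illustration. A real driver would sleep
on a wait queue instead of polling, and whether a timeout is acceptable
for a flush is a separate question.

```c
#include <errno.h>

/* Hypothetical fixed-size ring standing in for the virtqueue. */
struct ring_sketch {
    int used;      /* slots currently occupied */
    int capacity;  /* total slots */
};

/* Take a slot and return 0, or return -ENOSPC when the ring is full. */
static int try_add(struct ring_sketch *r)
{
    if (r->used >= r->capacity)
        return -ENOSPC;
    r->used++;
    return 0;
}

/* Hypothetical callback standing in for "the host consumed buffers". */
typedef void (*host_poll_fn)(struct ring_sketch *r);

/* Retry at most max_polls times, then give up instead of waiting forever. */
static int submit_bounded(struct ring_sketch *r, host_poll_fn poll,
                          int max_polls)
{
    int i;

    for (i = 0; i < max_polls; i++) {
        if (try_add(r) == 0)
            return 0;
        poll(r);  /* give the host a chance to free a slot */
    }
    return try_add(r) == 0 ? 0 : -ETIMEDOUT;
}

/* Two host behaviours for exercising the sketch. */
static void host_stuck(struct ring_sketch *r) { (void)r; }
static void host_frees_one(struct ring_sketch *r)
{
    if (r->used > 0)
        r->used--;
}
```

With host_stuck() the caller gets -ETIMEDOUT back after max_polls
attempts; with host_frees_one() the submission succeeds on the retry.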

> > > +	}
> > > +	virtqueue_kick(vpmem->req_vq);
> > > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +	/* When host has read buffer, this completes via host_ack */
> > > +	wait_event(req->host_acked, req->done);
> > > +	err = req->ret;  
> > 
> > If I'm understanding the QEMU code correctly, you're returning EIO
> > from QEMU if fsync() fails. I think this is wrong, since we don't know
> > if EIO in QEMU will be the same EIO in the guest. One way to solve this
> > would be to return 0 for success and 1 for failure from QEMU, and let the
> > guest implementation pick its error code (for your implementation it
> > could be EIO).  
> 
> Makes sense, will change this. 
> 
> Thanks,
> Pankaj 
> >   
> > > +	kfree(req);
> > > +
> > > +	return err;
> > > +};
> > > +EXPORT_SYMBOL_GPL(virtio_pmem_flush);
> > > +
> > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > +{
> > > +	int err = 0;
> > > +	struct resource res;
> > > +	struct virtio_pmem *vpmem;
> > > +	struct nvdimm_bus *nvdimm_bus;
> > > +	struct nd_region_desc ndr_desc;
> > > +	int nid = dev_to_node(&vdev->dev);
> > > +	struct nd_region *nd_region;
> > > +
> > > +	if (!vdev->config->get) {
> > > +		dev_err(&vdev->dev, "%s failure: config disabled\n",
> > > +			__func__);
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	vdev->priv = vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem),
> > > +			GFP_KERNEL);
> > > +	if (!vpmem) {
> > > +		err = -ENOMEM;
> > > +		goto out_err;
> > > +	}
> > > +
> > > +	vpmem->vdev = vdev;
> > > +	err = init_vq(vpmem);
> > > +	if (err)
> > > +		goto out_err;
> > > +
> > > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +			start, &vpmem->start);
> > > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +			size, &vpmem->size);
> > > +
> > > +	res.start = vpmem->start;
> > > +	res.end   = vpmem->start + vpmem->size-1;
> > > +	vpmem->nd_desc.provider_name = "virtio-pmem";
> > > +	vpmem->nd_desc.module = THIS_MODULE;
> > > +
> > > +	vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > +						&vpmem->nd_desc);
> > > +	if (!nvdimm_bus)
> > > +		goto out_vq;
> > > +
> > > +	dev_set_drvdata(&vdev->dev, nvdimm_bus);
> > > +	memset(&ndr_desc, 0, sizeof(ndr_desc));
> > > +
> > > +	ndr_desc.res = &res;
> > > +	ndr_desc.numa_node = nid;
> > > +	ndr_desc.flush = virtio_pmem_flush;
> > > +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > +	nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
> > > +
> > > +	if (!nd_region)
> > > +		goto out_nd;
> > > +
> > > +	//virtio_device_ready(vdev);
> > > +	return 0;
> > > +out_nd:
> > > +	err = -ENXIO;
> > > +	nvdimm_bus_unregister(nvdimm_bus);
> > > +out_vq:
> > > +	vdev->config->del_vqs(vdev);
> > > +out_err:
> > > +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > +	return err;
> > > +}
> > > +
> > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > +{
> > > +	struct virtio_pmem *vpmem = vdev->priv;
> > > +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > +
> > > +	nvdimm_bus_unregister(nvdimm_bus);
> > > +	vdev->config->del_vqs(vdev);
> > > +	kfree(vpmem);
> > > +}
> > > +
> > > +#ifdef CONFIG_PM_SLEEP
> > > +static int virtio_pmem_freeze(struct virtio_device *vdev)
> > > +{
> > > +	/* todo: handle freeze function */
> > > +	return -EPERM;
> > > +}
> > > +
> > > +static int virtio_pmem_restore(struct virtio_device *vdev)
> > > +{
> > > +	/* todo: handle restore function */
> > > +	return -EPERM;
> > > +}
> > > +#endif
> > > +
> > > +
> > > +static struct virtio_driver virtio_pmem_driver = {
> > > +	.driver.name		= KBUILD_MODNAME,
> > > +	.driver.owner		= THIS_MODULE,
> > > +	.id_table		= id_table,
> > > +	.probe			= virtio_pmem_probe,
> > > +	.remove			= virtio_pmem_remove,
> > > +#ifdef CONFIG_PM_SLEEP
> > > +	.freeze                 = virtio_pmem_freeze,
> > > +	.restore                = virtio_pmem_restore,
> > > +#endif
> > > +};
> > > +
> > > +module_virtio_driver(virtio_pmem_driver);
> > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> > > index 6d5c3b2..3463895 100644
> > > --- a/include/uapi/linux/virtio_ids.h
> > > +++ b/include/uapi/linux/virtio_ids.h
> > > @@ -43,5 +43,6 @@
> > >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> > >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> > >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > > +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
> > >  
> > >  #endif /* _LINUX_VIRTIO_IDS_H */
> > > diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> > > new file mode 100644
> > > index 0000000..c7c22a5
> > > --- /dev/null
> > > +++ b/include/uapi/linux/virtio_pmem.h
> > > @@ -0,0 +1,40 @@
> > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > +/*
> > > + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
> > > + * anyone can use the definitions to implement compatible drivers/servers:
> > > + *
> > > + *
> > > + * Redistribution and use in source and binary forms, with or without
> > > + * modification, are permitted provided that the following conditions
> > > + * are met:
> > > + * 1. Redistributions of source code must retain the above copyright
> > > + *    notice, this list of conditions and the following disclaimer.
> > > + * 2. Redistributions in binary form must reproduce the above copyright
> > > + *    notice, this list of conditions and the following disclaimer in the
> > > + *    documentation and/or other materials provided with the distribution.
> > > + * 3. Neither the name of IBM nor the names of its contributors
> > > + *    may be used to endorse or promote products derived from this software
> > > + *    without specific prior written permission.
> > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
> > > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> > > + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> > > + * SUCH DAMAGE.
> > > + *
> > > + * Copyright (C) Red Hat, Inc., 2018-2019
> > > + * Copyright (C) Pankaj Gupta <pagupta@redhat.com>, 2018
> > > + */
> > > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > > +
> > > +struct virtio_pmem_config {
> > > +	__le64 start;
> > > +	__le64 size;
> > > +};
> > > +#endif  
> > 
> > 
> >   
> 



* Re: [PATCH] qemu: Add virtio pmem device
  2018-09-13  7:06     ` Pankaj Gupta
@ 2018-09-13 12:22       ` Luiz Capitulino
  0 siblings, 0 replies; 22+ messages in thread
From: Luiz Capitulino @ 2018-09-13 12:22 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-kernel, kvm, qemu-devel, linux-nvdimm, jack, stefanha,
	dan j williams, riel, nilal, kwolf, pbonzini, ross zwisler,
	david, xiaoguangrong eric, hch, mst, niteshnarayanlal, imammedo,
	eblake

On Thu, 13 Sep 2018 03:06:27 -0400 (EDT)
Pankaj Gupta <pagupta@redhat.com> wrote:

> >   
> > >  This patch adds virtio-pmem Qemu device.
> > > 
> > >  This device presents a memory address range to the guest, backed
> > >  by a file backend. It acts like a persistent memory device for
> > >  the KVM guest. The guest can perform reads and persistent writes
> > >  on this memory range with the help of a DAX capable filesystem.
> > > 
> > >  Persistent guest writes are assured with the help of a virtio
> > >  based flushing interface. When guest userspace performs
> > >  fsync on a file fd on the pmem device, a flush command is sent to
> > >  Qemu over VIRTIO and a host side flush/sync is done on the backing
> > >  image file.
> > > 
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > > Changes from RFC v3:
> > > - Return EIO for host fsync failure instead of errno - Luiz, Stefan
> > > - Change version for inclusion to Qemu 3.1 - Eric
> > > 
> > > Changes from RFC v2:
> > > - Use aio_worker() to keep Qemu from hanging on a blocking fsync
> > >   call - Stefan
> > > - Use virtio_st*_p() for endianness - Stefan
> > > - Correct indentation in qapi/misc.json - Eric
> > > 
> > >  hw/virtio/Makefile.objs                     |   3 +
> > >  hw/virtio/virtio-pci.c                      |  44 +++++
> > >  hw/virtio/virtio-pci.h                      |  14 ++
> > >  hw/virtio/virtio-pmem.c                     | 241 ++++++++++++++++++++++++++++
> > >  include/hw/pci/pci.h                        |   1 +
> > >  include/hw/virtio/virtio-pmem.h             |  42 +++++
> > >  include/standard-headers/linux/virtio_ids.h |   1 +
> > >  qapi/misc.json                              |  26 ++-
> > >  8 files changed, 371 insertions(+), 1 deletion(-)
> > >  create mode 100644 hw/virtio/virtio-pmem.c
> > >  create mode 100644 include/hw/virtio/virtio-pmem.h
> > > 
> > > diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
> > > index 1b2799cfd8..7f914d45d0 100644
> > > --- a/hw/virtio/Makefile.objs
> > > +++ b/hw/virtio/Makefile.objs
> > > @@ -10,6 +10,9 @@ obj-$(CONFIG_VIRTIO_CRYPTO) += virtio-crypto.o
> > >  obj-$(call land,$(CONFIG_VIRTIO_CRYPTO),$(CONFIG_VIRTIO_PCI)) += virtio-crypto-pci.o
> > >  
> > >  obj-$(CONFIG_LINUX) += vhost.o vhost-backend.o vhost-user.o
> > > +ifeq ($(CONFIG_MEM_HOTPLUG),y)
> > > +obj-$(CONFIG_LINUX) += virtio-pmem.o
> > > +endif
> > >  obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
> > >  endif
> > >  
> > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > > index 3a01fe90f0..93d3fc05c7 100644
> > > --- a/hw/virtio/virtio-pci.c
> > > +++ b/hw/virtio/virtio-pci.c
> > > @@ -2521,6 +2521,49 @@ static const TypeInfo virtio_rng_pci_info = {
> > >      .class_init    = virtio_rng_pci_class_init,
> > >  };
> > >  
> > > +/* virtio-pmem-pci */
> > > +
> > > +static void virtio_pmem_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
> > > +{
> > > +    VirtIOPMEMPCI *vpmem = VIRTIO_PMEM_PCI(vpci_dev);
> > > +    DeviceState *vdev = DEVICE(&vpmem->vdev);
> > > +
> > > +    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
> > > +    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
> > > +}
> > > +
> > > +static void virtio_pmem_pci_class_init(ObjectClass *klass, void *data)
> > > +{
> > > +    DeviceClass *dc = DEVICE_CLASS(klass);
> > > +    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
> > > +    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
> > > +    k->realize = virtio_pmem_pci_realize;
> > > +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> > > +    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
> > > +    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_PMEM;
> > > +    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
> > > +    pcidev_k->class_id = PCI_CLASS_OTHERS;
> > > +}
> > > +
> > > +static void virtio_pmem_pci_instance_init(Object *obj)
> > > +{
> > > +    VirtIOPMEMPCI *dev = VIRTIO_PMEM_PCI(obj);
> > > +
> > > +    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
> > > +                                TYPE_VIRTIO_PMEM);
> > > +    object_property_add_alias(obj, "memdev", OBJECT(&dev->vdev), "memdev",
> > > +                              &error_abort);
> > > +}
> > > +
> > > +static const TypeInfo virtio_pmem_pci_info = {
> > > +    .name          = TYPE_VIRTIO_PMEM_PCI,
> > > +    .parent        = TYPE_VIRTIO_PCI,
> > > +    .instance_size = sizeof(VirtIOPMEMPCI),
> > > +    .instance_init = virtio_pmem_pci_instance_init,
> > > +    .class_init    = virtio_pmem_pci_class_init,
> > > +};
> > > +
> > > +
> > >  /* virtio-input-pci */
> > >  
> > >  static Property virtio_input_pci_properties[] = {
> > > @@ -2714,6 +2757,7 @@ static void virtio_pci_register_types(void)
> > >      type_register_static(&virtio_balloon_pci_info);
> > >      type_register_static(&virtio_serial_pci_info);
> > >      type_register_static(&virtio_net_pci_info);
> > > +    type_register_static(&virtio_pmem_pci_info);
> > >  #ifdef CONFIG_VHOST_SCSI
> > >      type_register_static(&vhost_scsi_pci_info);
> > >  #endif
> > > diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
> > > index 813082b0d7..fe74fcad3f 100644
> > > --- a/hw/virtio/virtio-pci.h
> > > +++ b/hw/virtio/virtio-pci.h
> > > @@ -19,6 +19,7 @@
> > >  #include "hw/virtio/virtio-blk.h"
> > >  #include "hw/virtio/virtio-net.h"
> > >  #include "hw/virtio/virtio-rng.h"
> > > +#include "hw/virtio/virtio-pmem.h"
> > >  #include "hw/virtio/virtio-serial.h"
> > >  #include "hw/virtio/virtio-scsi.h"
> > >  #include "hw/virtio/virtio-balloon.h"
> > > @@ -57,6 +58,7 @@ typedef struct VirtIOInputHostPCI VirtIOInputHostPCI;
> > >  typedef struct VirtIOGPUPCI VirtIOGPUPCI;
> > >  typedef struct VHostVSockPCI VHostVSockPCI;
> > >  typedef struct VirtIOCryptoPCI VirtIOCryptoPCI;
> > > +typedef struct VirtIOPMEMPCI VirtIOPMEMPCI;
> > >  
> > >  /* virtio-pci-bus */
> > >  
> > > @@ -274,6 +276,18 @@ struct VirtIOBlkPCI {
> > >      VirtIOBlock vdev;
> > >  };
> > >  
> > > +/*
> > > + * virtio-pmem-pci: This extends VirtioPCIProxy.
> > > + */
> > > +#define TYPE_VIRTIO_PMEM_PCI "virtio-pmem-pci"
> > > +#define VIRTIO_PMEM_PCI(obj) \
> > > +        OBJECT_CHECK(VirtIOPMEMPCI, (obj), TYPE_VIRTIO_PMEM_PCI)
> > > +
> > > +struct VirtIOPMEMPCI {
> > > +    VirtIOPCIProxy parent_obj;
> > > +    VirtIOPMEM vdev;
> > > +};
> > > +
> > >  /*
> > >   * virtio-balloon-pci: This extends VirtioPCIProxy.
> > >   */
> > > diff --git a/hw/virtio/virtio-pmem.c b/hw/virtio/virtio-pmem.c
> > > new file mode 100644
> > > index 0000000000..69ae4c0a50
> > > --- /dev/null
> > > +++ b/hw/virtio/virtio-pmem.c
> > > @@ -0,0 +1,241 @@
> > > +/*
> > > + * Virtio pmem device
> > > + *
> > > + * Copyright (C) 2018 Red Hat, Inc.
> > > + * Copyright (C) 2018 Pankaj Gupta <pagupta@redhat.com>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2.
> > > + * See the COPYING file in the top-level directory.
> > > + *
> > > + */
> > > +
> > > +#include "qemu/osdep.h"
> > > +#include "qapi/error.h"
> > > +#include "qemu-common.h"
> > > +#include "qemu/error-report.h"
> > > +#include "hw/virtio/virtio-access.h"
> > > +#include "hw/virtio/virtio-pmem.h"
> > > +#include "hw/mem/memory-device.h"
> > > +#include "block/aio.h"
> > > +#include "block/thread-pool.h"
> > > +
> > > +typedef struct VirtIOPMEMresp {
> > > +    int ret;
> > > +} VirtIOPMEMResp;
> > > +
> > > +typedef struct VirtIODeviceRequest {
> > > +    VirtQueueElement elem;
> > > +    int fd;
> > > +    VirtIOPMEM *pmem;
> > > +    VirtIOPMEMResp resp;
> > > +} VirtIODeviceRequest;
> > > +
> > > +static int worker_cb(void *opaque)
> > > +{
> > > +    VirtIODeviceRequest *req = opaque;
> > > +    int err = 0;
> > > +
> > > +    /* flush raw backing image */
> > > +    err = fsync(req->fd);
> > > +    if (err != 0) {
> > > +        err = EIO;
> > > +    }
> > > +    req->resp.ret = err;  
> > 
> > As I mentioned in the kernel patch, I think you should return 1 for
> > error and let the guest pick the error it wants to return to
> > the calling thread.  
> 
> Sure.
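
A minimal sketch of that split, in case it helps. The enum and both
helper names are hypothetical (the patch does not define them), and
-EIO on the guest side is just the choice suggested in the kernel
review.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical wire values; the patch does not define these names. */
enum pmem_resp_sketch {
    PMEM_RESP_OK   = 0,
    PMEM_RESP_FAIL = 1,
};

/* Host side: collapse any fsync() failure into one errno-neutral value. */
static uint32_t host_status_from_fsync(int fsync_ret)
{
    return fsync_ret == 0 ? PMEM_RESP_OK : PMEM_RESP_FAIL;
}

/* Guest side: pick the errno the guest wants to hand to its caller. */
static int guest_errno_from_status(uint32_t status)
{
    return status == PMEM_RESP_OK ? 0 : -EIO;
}
```

This way the value on the wire never depends on how the host numbers
its errno codes.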
> 
> >   
> > > +
> > > +    return 0;
> > > +}
> > > +
> > > +static void done_cb(void *opaque, int ret)
> > > +{
> > > +    VirtIODeviceRequest *req = opaque;
> > > +    int len = iov_from_buf(req->elem.in_sg, req->elem.in_num, 0,
> > > +                              &req->resp, sizeof(VirtIOPMEMResp));
> > > +
> > > +    /* Callbacks are serialized, so no need to use atomic ops.  */
> > > +    virtqueue_push(req->pmem->rq_vq, &req->elem, len);
> > > +    virtio_notify((VirtIODevice *)req->pmem, req->pmem->rq_vq);
> > > +    g_free(req);
> > > +}
> > > +
> > > +static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq)
> > > +{
> > > +    VirtIODeviceRequest *req;
> > > +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> > > +    HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
> > > +    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
> > > +
> > > +    req = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
> > > +    if (!req) {
> > > +        virtio_error(vdev, "virtio-pmem missing request data");
> > > +        return;
> > > +    }
> > > +
> > > +    if (req->elem.out_num < 1 || req->elem.in_num < 1) {
> > > +        virtio_error(vdev, "virtio-pmem request not proper");
> > > +        g_free(req);
> > > +        return;
> > > +    }  
> > 
> > I think you should abort() in those errors.  
> 
> Just skimmed over how other devices handle such errors (virtio_blk & virtio_scsi):
> None of these is aborting?

My fear is threads on the host side getting blocked in a row forever.

> 
>   if (req->elem.out_num < 1 || req->elem.in_num < 1) {
>         virtio_error(vdev, "virtio-blk missing headers");
>         return -1;
>     }
> 
> Thanks,
> Pankaj
> 
> >   
> > > +    req->fd = memory_region_get_fd(&backend->mr);
> > > +    req->pmem = pmem;
> > > +    thread_pool_submit_aio(pool, worker_cb, req, done_cb, req);
> > > +}
> > > +
> > > +static void virtio_pmem_get_config(VirtIODevice *vdev, uint8_t *config)
> > > +{
> > > +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> > > +    struct virtio_pmem_config *pmemcfg = (struct virtio_pmem_config *) config;
> > > +
> > > +    virtio_stq_p(vdev, &pmemcfg->start, pmem->start);
> > > +    virtio_stq_p(vdev, &pmemcfg->size, pmem->size);
> > > +}
> > > +
> > > +static uint64_t virtio_pmem_get_features(VirtIODevice *vdev, uint64_t features,
> > > +                                        Error **errp)
> > > +{
> > > +    return features;
> > > +}
> > > +
> > > +static void virtio_pmem_realize(DeviceState *dev, Error **errp)
> > > +{
> > > +    VirtIODevice   *vdev   = VIRTIO_DEVICE(dev);
> > > +    VirtIOPMEM     *pmem   = VIRTIO_PMEM(dev);
> > > +    MachineState   *ms     = MACHINE(qdev_get_machine());
> > > +    uint64_t align;
> > > +    Error *local_err = NULL;
> > > +    MemoryRegion *mr;
> > > +
> > > +    if (!pmem->memdev) {
> > > +        error_setg(errp, "virtio-pmem memdev not set");
> > > +        return;
> > > +    }
> > > +
> > > +    mr  = host_memory_backend_get_memory(pmem->memdev);
> > > +    align = memory_region_get_alignment(mr);
> > > +    pmem->size = QEMU_ALIGN_DOWN(memory_region_size(mr), align);
> > > +    pmem->start = memory_device_get_free_addr(ms, NULL, align, pmem->size,
> > > +                                              &local_err);
> > > +    if (local_err) {
> > > +        error_setg(errp, "Can't get free address in mem device");
> > > +        return;
> > > +    }
> > > +    memory_region_init_alias(&pmem->mr, OBJECT(pmem),
> > > +                             "virtio_pmem-memory", mr, 0, pmem->size);
> > > +    memory_device_plug_region(ms, &pmem->mr, pmem->start);
> > > +
> > > +    host_memory_backend_set_mapped(pmem->memdev, true);
> > > +    virtio_init(vdev, TYPE_VIRTIO_PMEM, VIRTIO_ID_PMEM,
> > > +                                          sizeof(struct virtio_pmem_config));
> > > +    pmem->rq_vq = virtio_add_queue(vdev, 128, virtio_pmem_flush);
> > > +}
> > > +
> > > +static void virtio_mem_check_memdev(Object *obj, const char *name, Object *val,
> > > +                                    Error **errp)
> > > +{
> > > +    if (host_memory_backend_is_mapped(MEMORY_BACKEND(val))) {
> > > +        char *path = object_get_canonical_path_component(val);
> > > +        error_setg(errp, "Can't use already busy memdev: %s", path);
> > > +        g_free(path);
> > > +        return;
> > > +    }
> > > +
> > > +    qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
> > > +}
> > > +
> > > +static const char *virtio_pmem_get_device_id(VirtIOPMEM *vm)
> > > +{
> > > +    Object *obj = OBJECT(vm);
> > > +    DeviceState *parent_dev;
> > > +
> > > +    /* always use the ID of the proxy device */
> > > +    if (obj->parent && object_dynamic_cast(obj->parent, TYPE_DEVICE)) {
> > > +        parent_dev = DEVICE(obj->parent);
> > > +        return parent_dev->id;
> > > +    }
> > > +    return NULL;
> > > +}
> > > +
> > > +static void virtio_pmem_md_fill_device_info(const MemoryDeviceState *md,
> > > +                                           MemoryDeviceInfo *info)
> > > +{
> > > +    VirtioPMemDeviceInfo *vi = g_new0(VirtioPMemDeviceInfo, 1);
> > > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > > +    const char *id = virtio_pmem_get_device_id(vm);
> > > +
> > > +    if (id) {
> > > +        vi->has_id = true;
> > > +        vi->id = g_strdup(id);
> > > +    }
> > > +
> > > +    vi->start = vm->start;
> > > +    vi->size = vm->size;
> > > +    vi->memdev = object_get_canonical_path(OBJECT(vm->memdev));
> > > +
> > > +    info->u.virtio_pmem.data = vi;
> > > +    info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM;
> > > +}
> > > +
> > > +static uint64_t virtio_pmem_md_get_addr(const MemoryDeviceState *md)
> > > +{
> > > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > > +
> > > +    return vm->start;
> > > +}
> > > +
> > > +static uint64_t virtio_pmem_md_get_plugged_size(const MemoryDeviceState *md)
> > > +{
> > > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > > +
> > > +    return vm->size;
> > > +}
> > > +
> > > +static uint64_t virtio_pmem_md_get_region_size(const MemoryDeviceState *md)
> > > +{
> > > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > > +
> > > +    return vm->size;
> > > +}
> > > +
> > > +static void virtio_pmem_instance_init(Object *obj)
> > > +{
> > > +    VirtIOPMEM *vm = VIRTIO_PMEM(obj);
> > > +    object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
> > > +                                (Object **)&vm->memdev,
> > > +                                (void *) virtio_mem_check_memdev,
> > > +                                OBJ_PROP_LINK_STRONG,
> > > +                                &error_abort);
> > > +}
> > > +
> > > +
> > > +static void virtio_pmem_class_init(ObjectClass *klass, void *data)
> > > +{
> > > +    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> > > +    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
> > > +
> > > +    vdc->realize      =  virtio_pmem_realize;
> > > +    vdc->get_config   =  virtio_pmem_get_config;
> > > +    vdc->get_features =  virtio_pmem_get_features;
> > > +
> > > +    mdc->get_addr         = virtio_pmem_md_get_addr;
> > > +    mdc->get_plugged_size = virtio_pmem_md_get_plugged_size;
> > > +    mdc->get_region_size  = virtio_pmem_md_get_region_size;
> > > +    mdc->fill_device_info = virtio_pmem_md_fill_device_info;
> > > +}
> > > +
> > > +static TypeInfo virtio_pmem_info = {
> > > +    .name          = TYPE_VIRTIO_PMEM,
> > > +    .parent        = TYPE_VIRTIO_DEVICE,
> > > +    .class_init    = virtio_pmem_class_init,
> > > +    .instance_size = sizeof(VirtIOPMEM),
> > > +    .instance_init = virtio_pmem_instance_init,
> > > +    .interfaces = (InterfaceInfo[]) {
> > > +        { TYPE_MEMORY_DEVICE },
> > > +        { }
> > > +  },
> > > +};
> > > +
> > > +static void virtio_register_types(void)
> > > +{
> > > +    type_register_static(&virtio_pmem_info);
> > > +}
> > > +
> > > +type_init(virtio_register_types)
> > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > index 990d6fcbde..28829b6437 100644
> > > --- a/include/hw/pci/pci.h
> > > +++ b/include/hw/pci/pci.h
> > > @@ -85,6 +85,7 @@ extern bool pci_available;
> > >  #define PCI_DEVICE_ID_VIRTIO_RNG         0x1005
> > >  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
> > >  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
> > > +#define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
> > >  
> > >  #define PCI_VENDOR_ID_REDHAT             0x1b36
> > >  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> > > diff --git a/include/hw/virtio/virtio-pmem.h b/include/hw/virtio/virtio-pmem.h
> > > new file mode 100644
> > > index 0000000000..fda3ee691c
> > > --- /dev/null
> > > +++ b/include/hw/virtio/virtio-pmem.h
> > > @@ -0,0 +1,42 @@
> > > +/*
> > > + * Virtio pmem Device
> > > + *
> > > + * Copyright Red Hat, Inc. 2018
> > > + * Copyright Pankaj Gupta <pagupta@redhat.com>
> > > + *
> > > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > > + * (at your option) any later version.  See the COPYING file in the
> > > + * top-level directory.
> > > + */
> > > +
> > > +#ifndef QEMU_VIRTIO_PMEM_H
> > > +#define QEMU_VIRTIO_PMEM_H
> > > +
> > > +#include "hw/virtio/virtio.h"
> > > +#include "exec/memory.h"
> > > +#include "sysemu/hostmem.h"
> > > +#include "standard-headers/linux/virtio_ids.h"
> > > +#include "hw/boards.h"
> > > +#include "hw/i386/pc.h"
> > > +
> > > +#define TYPE_VIRTIO_PMEM "virtio-pmem"
> > > +
> > > +#define VIRTIO_PMEM(obj) \
> > > +        OBJECT_CHECK(VirtIOPMEM, (obj), TYPE_VIRTIO_PMEM)
> > > +
> > > +/* VirtIOPMEM device structure */
> > > +typedef struct VirtIOPMEM {
> > > +    VirtIODevice parent_obj;
> > > +
> > > +    VirtQueue *rq_vq;
> > > +    uint64_t start;
> > > +    uint64_t size;
> > > +    MemoryRegion mr;
> > > +    HostMemoryBackend *memdev;
> > > +} VirtIOPMEM;
> > > +
> > > +struct virtio_pmem_config {
> > > +    uint64_t start;
> > > +    uint64_t size;
> > > +};
> > > +#endif
> > > diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
> > > index 6d5c3b2d4f..346389565a 100644
> > > --- a/include/standard-headers/linux/virtio_ids.h
> > > +++ b/include/standard-headers/linux/virtio_ids.h
> > > @@ -43,5 +43,6 @@
> > >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> > >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> > >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > > +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
> > >  
> > >  #endif /* _LINUX_VIRTIO_IDS_H */
> > > diff --git a/qapi/misc.json b/qapi/misc.json
> > > index d450cfef21..517376b866 100644
> > > --- a/qapi/misc.json
> > > +++ b/qapi/misc.json
> > > @@ -2907,6 +2907,29 @@
> > >            }
> > >  }
> > >  
> > > +##
> > > +# @VirtioPMemDeviceInfo:
> > > +#
> > > +# VirtioPMem state information
> > > +#
> > > +# @id: device's ID
> > > +#
> > > +# @start: physical address, where device is mapped
> > > +#
> > > +# @size: size of memory that the device provides
> > > +#
> > > +# @memdev: memory backend linked with device
> > > +#
> > > +# Since: 3.1
> > > +##
> > > +{ 'struct': 'VirtioPMemDeviceInfo',
> > > +  'data': { '*id': 'str',
> > > +            'start': 'size',
> > > +            'size': 'size',
> > > +            'memdev': 'str'
> > > +          }
> > > +}
> > > +
> > >  ##
> > >  # @MemoryDeviceInfo:
> > >  #
> > > @@ -2916,7 +2939,8 @@
> > >  ##
> > >  { 'union': 'MemoryDeviceInfo',
> > >    'data': { 'dimm': 'PCDIMMDeviceInfo',
> > > -            'nvdimm': 'PCDIMMDeviceInfo'
> > > +            'nvdimm': 'PCDIMMDeviceInfo',
> > > +	    'virtio-pmem': 'VirtioPMemDeviceInfo'
> > >            }
> > >  }
> > >    
> > 
> >   
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3]  virtio-pmem: Add virtio pmem driver
  2018-09-13 12:19       ` Luiz Capitulino
@ 2018-09-14 12:13         ` Pankaj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-09-14 12:13 UTC (permalink / raw)
  To: Luiz Capitulino
  Cc: kwolf, jack, xiaoguangrong eric, kvm, riel, linux-nvdimm, david,
	ross zwisler, linux-kernel, qemu-devel, hch, imammedo, mst,
	stefanha, niteshnarayanlal, pbonzini, dan j williams, nilal


> 
> > Hi Luiz,
> > 
> > Thanks for the review.
> > 
> > >   
> > > > This patch adds virtio-pmem driver for KVM guest.
> > > > 
> > > > Guest reads the persistent memory range information from
> > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > creates a nd_region object with the persistent memory
> > > > range information so that existing 'nvdimm/pmem' driver
> > > > can reserve this into system memory map. This way
> > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > driver to register persistent memory compatible for DAX
> > > > capable filesystems.
> > > > 
> > > > This also provides function to perform guest flush over
> > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > on DAX memory range.
> > > > 
> > > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > > ---
> > > >  drivers/virtio/Kconfig           |   9 ++
> > > >  drivers/virtio/Makefile          |   1 +
> > > >  drivers/virtio/virtio_pmem.c     | 255 +++++++++++++++++++++++++++++++++++++++
> > > >  include/uapi/linux/virtio_ids.h  |   1 +
> > > >  include/uapi/linux/virtio_pmem.h |  40 ++++++
> > > >  5 files changed, 306 insertions(+)
> > > >  create mode 100644 drivers/virtio/virtio_pmem.c
> > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > > 
> > > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > > index 3589764..a331e23 100644
> > > > --- a/drivers/virtio/Kconfig
> > > > +++ b/drivers/virtio/Kconfig
> > > > @@ -42,6 +42,15 @@ config VIRTIO_PCI_LEGACY
> > > >  
> > > >  	  If unsure, say Y.
> > > >  
> > > > +config VIRTIO_PMEM
> > > > +	tristate "Support for virtio pmem driver"
> > > > +	depends on VIRTIO
> > > > +	help
> > > > +	This driver provides support for virtio based flushing interface
> > > > +	for persistent memory range.
> > > > +
> > > > +	If unsure, say M.
> > > > +
> > > >  config VIRTIO_BALLOON
> > > >  	tristate "Virtio balloon driver"
> > > >  	depends on VIRTIO
> > > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > > index 3a2b5c5..cbe91c6 100644
> > > > --- a/drivers/virtio/Makefile
> > > > +++ b/drivers/virtio/Makefile
> > > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > > +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
> > > > diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
> > > > new file mode 100644
> > > > index 0000000..c22cc87
> > > > --- /dev/null
> > > > +++ b/drivers/virtio/virtio_pmem.c
> > > > @@ -0,0 +1,255 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and provides a virtio based flushing
> > > > + * interface.
> > > > + */
> > > > +#include <linux/virtio.h>
> > > > +#include <linux/module.h>
> > > > +#include <linux/virtio_ids.h>
> > > > +#include <linux/virtio_config.h>
> > > > +#include <uapi/linux/virtio_pmem.h>
> > > > +#include <linux/spinlock.h>
> > > > +#include <linux/libnvdimm.h>
> > > > +#include <linux/nd.h>
> > > > +
> > > > +struct virtio_pmem_request {
> > > > +	/* Host return status corresponding to flush request */
> > > > +	int ret;
> > > > +
> > > > +	/* command name*/
> > > > +	char name[16];
> > > > +
> > > > +	/* Wait queue to process deferred work after ack from host */
> > > > +	wait_queue_head_t host_acked;
> > > > +	bool done;
> > > > +
> > > > +	/* Wait queue to process deferred work after virt queue buffer avail */
> > > > +	wait_queue_head_t wq_buf;
> > > > +	bool wq_buf_avail;
> > > > +	struct list_head list;
> > > > +};
> > > > +
> > > > +struct virtio_pmem {
> > > > +	struct virtio_device *vdev;
> > > > +
> > > > +	/* Virtio pmem request queue */
> > > > +	struct virtqueue *req_vq;
> > > > +
> > > > +	/* nvdimm bus registers virtio pmem device */
> > > > +	struct nvdimm_bus *nvdimm_bus;
> > > > +	struct nvdimm_bus_descriptor nd_desc;
> > > > +
> > > > +	/* List to store deferred work if virtqueue is full */
> > > > +	struct list_head req_list;
> > > > +
> > > > +	/* Synchronize virtqueue data */
> > > > +	spinlock_t pmem_lock;
> > > > +
> > > > +	/* Memory region information */
> > > > +	uint64_t start;
> > > > +	uint64_t size;
> > > > +};
> > > > +
> > > > +static struct virtio_device_id id_table[] = {
> > > > +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > > +	{ 0 },
> > > > +};
> > > > +
> > > > + /* The interrupt handler */
> > > > +static void host_ack(struct virtqueue *vq)
> > > > +{
> > > > +	unsigned int len;
> > > > +	unsigned long flags;
> > > > +	struct virtio_pmem_request *req, *req_buf;
> > > > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > > > +
> > > > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > > +		req->done = true;
> > > > +		wake_up(&req->host_acked);
> > > > +
> > > > +		if (!list_empty(&vpmem->req_list)) {
> > > > +			req_buf = list_first_entry(&vpmem->req_list,
> > > > +					struct virtio_pmem_request, list);
> > > > +			list_del(&vpmem->req_list);
> > > > +			req_buf->wq_buf_avail = true;
> > > > +			wake_up(&req_buf->wq_buf);
> > > > +		}
> > > > +	}
> > > > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +}
> > > > + /* Initialize virt queue */
> > > > +static int init_vq(struct virtio_pmem *vpmem)
> > > > +{
> > > > +	struct virtqueue *vq;
> > > > +
> > > > +	/* single vq */
> > > > +	vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> > > > +				host_ack, "flush_queue");
> > > > +	if (IS_ERR(vq))
> > > > +		return PTR_ERR(vq);
> > > > +
> > > > +	spin_lock_init(&vpmem->pmem_lock);
> > > > +	INIT_LIST_HEAD(&vpmem->req_list);
> > > > +
> > > > +	return 0;
> > > > +};
> > > > +
> > > > + /* The request submission function */
> > > > +static int virtio_pmem_flush(struct nd_region *nd_region)
> > > > +{
> > > > +	int err;
> > > > +	unsigned long flags;
> > > > +	struct scatterlist *sgs[2], sg, ret;
> > > > +	struct virtio_device *vdev =
> > > > +		dev_to_virtio(nd_region->dev.parent->parent);
> > > > +	struct virtio_pmem *vpmem = vdev->priv;
> > > 
> > > I'm missing a might_sleep() call in this function.
> > 
> > I am not sure if we need might_sleep here?
> > We can add it as debugging aid for detecting any problems
> > in sleeping from acquired atomic context?
> 
> Yes. Since this function sleeps and since some functions that
> may run in atomic context call it, it's a good idea to
> call might_sleep().

OK, will add might_sleep().
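
For reference, a minimal userspace sketch of what the check buys us. This is
only an analogue, not kernel code: atomic_depth, fake_spin_lock() and
fake_might_sleep() are hypothetical stand-ins for the kernel's own
atomic-context tracking, and the real might_sleep() emits a warning (when
CONFIG_DEBUG_ATOMIC_SLEEP is enabled) instead of bumping a counter.

```c
/* Hypothetical userspace stand-ins, assumed only for this sketch. */
static int atomic_depth;      /* > 0 while a "spinlock" is held */
static int sleep_violations;  /* number of times the check fired */

static void fake_spin_lock(void)   { atomic_depth++; }
static void fake_spin_unlock(void) { atomic_depth--; }

/* Analogue of might_sleep(): flag a call made from atomic context. */
static void fake_might_sleep(void)
{
	if (atomic_depth > 0)
		sleep_violations++;
}

/* A flush-like function that may block, so it announces that up front. */
static int flush_that_may_block(void)
{
	fake_might_sleep();
	/* ... the real function would wait for the host here ... */
	return 0;
}
```

The point of placing the check at the top of the function is that a bad
caller is reported on every call, not only on the runs where the function
actually ends up sleeping.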

> 
> > > > +	struct virtio_pmem_request *req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > > +
> > > > +	if (!req)
> > > > +		return -ENOMEM;
> > > > +
> > > > +	req->done = req->wq_buf_avail = false;
> > > > +	strcpy(req->name, "FLUSH");
> > > > +	init_waitqueue_head(&req->host_acked);
> > > > +	init_waitqueue_head(&req->wq_buf);
> > > > +
> > > > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +	sg_init_one(&sg, req->name, strlen(req->name));
> > > > +	sgs[0] = &sg;
> > > > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > > +	sgs[1] = &ret;
> > > 
> > > It seems that sg_init_one() is only setting fields, in this
> > > case you can move spin_lock_irqsave() here.
> > 
> > yes, will move spin_lock_irqsave here.
> > 
> > >   
> > > > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > > +	if (err) {
> > > > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > > > +
> > > > +		list_add_tail(&vpmem->req_list, &req->list);
> > > > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +		/* When host has read buffer, this completes via host_ack */
> > > > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > > > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > 
> > > Is this error handling code assuming that at some point
> > > virtqueue_add_sgs() will succeed for a different thread? If yes,
> > > what happens if the assumption is false? That is, what happens if
> > > virtqueue_add_sgs() never succeeds anymore?
> > 
> > virtqueue_add_sgs will not succeed and corresponding thread should wait.
> > All subsequent calling threads should also wait. As soon as there is first
> > available free entry(from host), first waiting thread is acknowledged.
> > 
> > In worst case if Qemu is not utilizing any of the used buffer will keep
> > multiple threads waiting.
> > 
> > > 
> > > Why not just return an error?
> > 
> > > As per the suggestion by Stefan in the previous discussion: if the
> > > virtqueue is full, printing a message and failing the flush isn't
> > > appropriate. This thread needs to wait until virtqueue space becomes
> > > available.
> 
> If virtqueue_add_sgs() is guaranteed to succeed at some point then OK.
> Otherwise, you'll get threads getting stuck forever.

Here we are handling the 'virtqueue_add_sgs' failure that occurs when the
virtqueue is full.

In the regular virtqueue-full case, guest threads should wait. This lets the
guest issue more fsync requests than the current virtqueue size and avoids
returning a failure to userspace.

Even if we returned an error while the qemu threads are stuck, we would keep
returning an error on every call until those threads actually make progress
and free an entry in the virtqueue.
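
To make the intended behaviour concrete, here is a small single-threaded
sketch of "park when full, hand the freed slot to the first waiter". It is
an illustration only, not the driver code: flush_ring, ring_submit() and
ring_complete() are hypothetical names, the ring size is arbitrary, and the
sketch moves the first waiter into the freed slot directly, whereas in the
driver the woken thread would retry the add itself.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SLOTS  2   /* hypothetical ring size, tiny on purpose */
#define MAX_WAITERS 8   /* hypothetical bound on parked requests */

struct flush_req {
	int id;
	bool queued;    /* request currently occupies a ring slot */
	bool parked;    /* request is waiting for a free slot */
};

struct flush_ring {
	struct flush_req *slot[RING_SLOTS];
	size_t used;
	struct flush_req *waiter[MAX_WAITERS];  /* FIFO of parked requests */
	size_t w_head, w_count;
};

/* Returns 0 if queued, 1 if parked, -1 if the wait list itself is full. */
static int ring_submit(struct flush_ring *r, struct flush_req *req)
{
	if (r->used < RING_SLOTS) {
		r->slot[r->used++] = req;
		req->queued = true;
		req->parked = false;
		return 0;
	}
	if (r->w_count == MAX_WAITERS)
		return -1;
	r->waiter[(r->w_head + r->w_count) % MAX_WAITERS] = req;
	r->w_count++;
	req->parked = true;
	return 1;
}

/* Host completes the oldest in-flight request; returns it, or NULL. */
static struct flush_req *ring_complete(struct flush_ring *r)
{
	struct flush_req *done;
	size_t i;

	if (r->used == 0)
		return NULL;

	done = r->slot[0];
	for (i = 1; i < r->used; i++)
		r->slot[i - 1] = r->slot[i];
	r->used--;
	done->queued = false;

	/* A slot is free now: give it to the first parked request. */
	if (r->w_count > 0) {
		struct flush_req *next = r->waiter[r->w_head];

		r->w_head = (r->w_head + 1) % MAX_WAITERS;
		r->w_count--;
		next->parked = false;
		next->queued = true;
		r->slot[r->used++] = next;
	}
	return done;
}
```

With two slots and three submitters, the third request is parked and only
gets a slot once the host completes the first one. If the host never
completes anything, the parked request stays parked, which is exactly the
worst case discussed above.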
 
> 
> > > > +	}
> > > > +	virtqueue_kick(vpmem->req_vq);
> > > > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +	/* When host has read buffer, this completes via host_ack */
> > > > +	wait_event(req->host_acked, req->done);
> > > > +	err = req->ret;
> > > 
> > > If I'm understanding the QEMU code correctly, you're returning EIO
> > > from QEMU if fsync() fails. I think this is wrong, since we don't know
> > > if EIO in QEMU will be the same EIO in the guest. One way to solve this
> > > would be to return 0 for success and 1 for failure from QEMU, and let the
> > > guest implementation pick its error code (for your implementation it
> > > could be EIO).
> > 
> > Makes sense, will change this.
> > 
> > Thanks,
> > Pankaj
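
A minimal sketch of the status mapping agreed on above, assuming a 32-bit
status word on the wire. The names VPMEM_RESP_OK, VPMEM_RESP_FAIL and both
helper functions are hypothetical; the thread only fixes the convention
"0 for success and 1 for failure", with the guest choosing its own error
code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical wire values: 0 = success, 1 = failure. */
#define VPMEM_RESP_OK   0u
#define VPMEM_RESP_FAIL 1u

/* Host side: collapse the fsync() result into the wire status. */
static uint32_t host_status_from_fsync(int fsync_ret)
{
	return fsync_ret == 0 ? VPMEM_RESP_OK : VPMEM_RESP_FAIL;
}

/* Guest side: pick the guest's own errno for a failed flush. */
static int guest_errno_from_status(uint32_t status)
{
	return status == VPMEM_RESP_OK ? 0 : -EIO;
}
```

This keeps host errno values off the wire entirely, so the guest never has
to assume that the host's numbering matches its own.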
> > >   
> > > > +	kfree(req);
> > > > +
> > > > +	return err;
> > > > +};
> > > > +EXPORT_SYMBOL_GPL(virtio_pmem_flush);
> > > > +
> > > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > > +{
> > > > +	int err = 0;
> > > > +	struct resource res;
> > > > +	struct virtio_pmem *vpmem;
> > > > +	struct nvdimm_bus *nvdimm_bus;
> > > > +	struct nd_region_desc ndr_desc;
> > > > +	int nid = dev_to_node(&vdev->dev);
> > > > +	struct nd_region *nd_region;
> > > > +
> > > > +	if (!vdev->config->get) {
> > > > +		dev_err(&vdev->dev, "%s failure: config disabled\n",
> > > > +			__func__);
> > > > +		return -EINVAL;
> > > > +	}
> > > > +
> > > > +	vdev->priv = vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem),
> > > > +			GFP_KERNEL);
> > > > +	if (!vpmem) {
> > > > +		err = -ENOMEM;
> > > > +		goto out_err;
> > > > +	}
> > > > +
> > > > +	vpmem->vdev = vdev;
> > > > +	err = init_vq(vpmem);
> > > > +	if (err)
> > > > +		goto out_err;
> > > > +
> > > > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +			start, &vpmem->start);
> > > > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +			size, &vpmem->size);
> > > > +
> > > > +	res.start = vpmem->start;
> > > > +	res.end   = vpmem->start + vpmem->size-1;
> > > > +	vpmem->nd_desc.provider_name = "virtio-pmem";
> > > > +	vpmem->nd_desc.module = THIS_MODULE;
> > > > +
> > > > +	vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > > +						&vpmem->nd_desc);
> > > > +	if (!nvdimm_bus)
> > > > +		goto out_vq;
> > > > +
> > > > +	dev_set_drvdata(&vdev->dev, nvdimm_bus);
> > > > +	memset(&ndr_desc, 0, sizeof(ndr_desc));
> > > > +
> > > > +	ndr_desc.res = &res;
> > > > +	ndr_desc.numa_node = nid;
> > > > +	ndr_desc.flush = virtio_pmem_flush;
> > > > +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > > +	nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
> > > > +
> > > > +	if (!nd_region)
> > > > +		goto out_nd;
> > > > +
> > > > +	//virtio_device_ready(vdev);
> > > > +	return 0;
> > > > +out_nd:
> > > > +	err = -ENXIO;
> > > > +	nvdimm_bus_unregister(nvdimm_bus);
> > > > +out_vq:
> > > > +	vdev->config->del_vqs(vdev);
> > > > +out_err:
> > > > +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > > +	return err;
> > > > +}
> > > > +
> > > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > > +{
> > > > +	struct virtio_pmem *vpmem = vdev->priv;
> > > > +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > > +
> > > > +	nvdimm_bus_unregister(nvdimm_bus);
> > > > +	vdev->config->del_vqs(vdev);
> > > > +	kfree(vpmem);
> > > > +}
> > > > +
> > > > +#ifdef CONFIG_PM_SLEEP
> > > > +static int virtio_pmem_freeze(struct virtio_device *vdev)
> > > > +{
> > > > +	/* todo: handle freeze function */
> > > > +	return -EPERM;
> > > > +}
> > > > +
> > > > +static int virtio_pmem_restore(struct virtio_device *vdev)
> > > > +{
> > > > +	/* todo: handle restore function */
> > > > +	return -EPERM;
> > > > +}
> > > > +#endif
> > > > +
> > > > +
> > > > +static struct virtio_driver virtio_pmem_driver = {
> > > > +	.driver.name		= KBUILD_MODNAME,
> > > > +	.driver.owner		= THIS_MODULE,
> > > > +	.id_table		= id_table,
> > > > +	.probe			= virtio_pmem_probe,
> > > > +	.remove			= virtio_pmem_remove,
> > > > +#ifdef CONFIG_PM_SLEEP
> > > > +	.freeze                 = virtio_pmem_freeze,
> > > > +	.restore                = virtio_pmem_restore,
> > > > +#endif
> > > > +};
> > > > +
> > > > +module_virtio_driver(virtio_pmem_driver);
> > > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> > > > index 6d5c3b2..3463895 100644
> > > > --- a/include/uapi/linux/virtio_ids.h
> > > > +++ b/include/uapi/linux/virtio_ids.h
> > > > @@ -43,5 +43,6 @@
> > > >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> > > >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> > > >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > > > +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
> > > >  
> > > >  #endif /* _LINUX_VIRTIO_IDS_H */
> > > > diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> > > > new file mode 100644
> > > > index 0000000..c7c22a5
> > > > --- /dev/null
> > > > +++ b/include/uapi/linux/virtio_pmem.h
> > > > @@ -0,0 +1,40 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0 */
> > > > +/*
> > > > + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
> > > > + * anyone can use the definitions to implement compatible drivers/servers:
> > > > + *
> > > > + *
> > > > + * Redistribution and use in source and binary forms, with or without
> > > > + * modification, are permitted provided that the following conditions
> > > > + * are met:
> > > > + * 1. Redistributions of source code must retain the above copyright
> > > > + *    notice, this list of conditions and the following disclaimer.
> > > > + * 2. Redistributions in binary form must reproduce the above copyright
> > > > + *    notice, this list of conditions and the following disclaimer in the
> > > > + *    documentation and/or other materials provided with the distribution.
> > > > + * 3. Neither the name of IBM nor the names of its contributors
> > > > + *    may be used to endorse or promote products derived from this software
> > > > + *    without specific prior written permission.
> > > > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
> > > > + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> > > > + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> > > > + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> > > > + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> > > > + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> > > > + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> > > > + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> > > > + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> > > > + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> > > > + * SUCH DAMAGE.
> > > > + *
> > > > + * Copyright (C) Red Hat, Inc., 2018-2019
> > > > + * Copyright (C) Pankaj Gupta <pagupta@redhat.com>, 2018
> > > > + */
> > > > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > > > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > > > +
> > > > +struct virtio_pmem_config {
> > > > +	__le64 start;
> > > > +	__le64 size;
> > > > +};
> > > > +#endif
> > > 
> > > 
> > >   
> > 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH] qemu: Add virtio pmem device
  2018-08-31 13:30 ` [PATCH] qemu: Add virtio pmem device Pankaj Gupta
  2018-09-12 16:57   ` Luiz Capitulino
@ 2018-09-20 11:21   ` David Hildenbrand
  2018-09-20 12:03     ` [Qemu-devel] " Pankaj Gupta
  1 sibling, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2018-09-20 11:21 UTC (permalink / raw)
  To: Pankaj Gupta, linux-kernel, kvm, qemu-devel, linux-nvdimm
  Cc: jack, stefanha, dan.j.williams, riel, nilal, kwolf, pbonzini,
	ross.zwisler, xiaoguangrong.eric, hch, mst, niteshnarayanlal,
	lcapitulino, imammedo, eblake

> @@ -0,0 +1,241 @@
> +/*
> + * Virtio pmem device
> + *
> + * Copyright (C) 2018 Red Hat, Inc.
> + * Copyright (C) 2018 Pankaj Gupta <pagupta@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu-common.h"
> +#include "qemu/error-report.h"
> +#include "hw/virtio/virtio-access.h"
> +#include "hw/virtio/virtio-pmem.h"
> +#include "hw/mem/memory-device.h"
> +#include "block/aio.h"
> +#include "block/thread-pool.h"
> +
> +typedef struct VirtIOPMEMresp {
> +    int ret;
> +} VirtIOPMEMResp;
> +
> +typedef struct VirtIODeviceRequest {
> +    VirtQueueElement elem;
> +    int fd;
> +    VirtIOPMEM *pmem;
> +    VirtIOPMEMResp resp;
> +} VirtIODeviceRequest;

Both the response and the request have to go into a Linux header (and a
header sync patch).

Also, you are using the same request struct for host<->guest handling and for
internal purposes. The fd and the pmem pointer definitely don't belong here.
Use a separate struct for internal handling (the one passed to worker_cb).
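
Roughly what I have in mind, as a standalone sketch and not QEMU code. All
names here (vpmem_wire_req, vpmem_wire_resp, vpmem_flush_job,
vpmem_fill_resp) are hypothetical, the field layout is an assumption, and
endianness conversion is left out for brevity.

```c
#include <stdint.h>
#include <string.h>

/* Guest-visible layout: this is what would live in the shared header. */
struct vpmem_wire_req {
	uint32_t type;
};

struct vpmem_wire_resp {
	uint32_t ret;
};

/* Device-internal bookkeeping: never copied to the guest. */
struct vpmem_flush_job {
	int fd;                       /* backing file to fsync */
	void *dev;                    /* opaque stand-in for the device pointer */
	struct vpmem_wire_resp resp;  /* the only part the guest gets to see */
};

/*
 * Copy only the wire response into the guest buffer.
 * Returns the number of bytes written, or 0 if the buffer is too small.
 */
static size_t vpmem_fill_resp(const struct vpmem_flush_job *job,
			      void *guest_buf, size_t guest_len)
{
	if (guest_len < sizeof(job->resp))
		return 0;
	memcpy(guest_buf, &job->resp, sizeof(job->resp));
	return sizeof(job->resp);
}
```

The worker callback would then take the internal job struct, and only the
wire response is ever written into the guest's buffers.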

> +
> +static int worker_cb(void *opaque)
> +{
> +    VirtIODeviceRequest *req = opaque;
> +    int err = 0;
> +
> +    /* flush raw backing image */
> +    err = fsync(req->fd);
> +    if (err != 0) {
> +        err = EIO;
> +    }
> +    req->resp.ret = err;
> +
> +    return 0;
> +}
> +
> +static void done_cb(void *opaque, int ret)
> +{
> +    VirtIODeviceRequest *req = opaque;
> +    int len = iov_from_buf(req->elem.in_sg, req->elem.in_num, 0,
> +                              &req->resp, sizeof(VirtIOPMEMResp));
> +
> +    /* Callbacks are serialized, so no need to use atomic ops.  */
> +    virtqueue_push(req->pmem->rq_vq, &req->elem, len);
> +    virtio_notify((VirtIODevice *)req->pmem, req->pmem->rq_vq);
> +    g_free(req);
> +}
> +
> +static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq)
> +{
> +    VirtIODeviceRequest *req;
> +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> +    HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
> +    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
> +
> +    req = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
> +    if (!req) {
> +        virtio_error(vdev, "virtio-pmem missing request data");
> +        return;
> +    }
> +
> +    if (req->elem.out_num < 1 || req->elem.in_num < 1) {
> +        virtio_error(vdev, "virtio-pmem request not proper");
> +        g_free(req);
> +        return;
> +    }
> +    req->fd = memory_region_get_fd(&backend->mr);
> +    req->pmem = pmem;
> +    thread_pool_submit_aio(pool, worker_cb, req, done_cb, req);
> +}
> +
> +static void virtio_pmem_get_config(VirtIODevice *vdev, uint8_t *config)
> +{
> +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> +    struct virtio_pmem_config *pmemcfg = (struct virtio_pmem_config *) config;
> +
> +    virtio_stq_p(vdev, &pmemcfg->start, pmem->start);
> +    virtio_stq_p(vdev, &pmemcfg->size, pmem->size);
> +}
> +
> +static uint64_t virtio_pmem_get_features(VirtIODevice *vdev, uint64_t features,
> +                                        Error **errp)
> +{
> +    return features;
> +}
> +
> +static void virtio_pmem_realize(DeviceState *dev, Error **errp)
> +{
> +    VirtIODevice   *vdev   = VIRTIO_DEVICE(dev);
> +    VirtIOPMEM     *pmem   = VIRTIO_PMEM(dev);
> +    MachineState   *ms     = MACHINE(qdev_get_machine());
> +    uint64_t align;
> +    Error *local_err = NULL;
> +    MemoryRegion *mr;
> +
> +    if (!pmem->memdev) {
> +        error_setg(errp, "virtio-pmem memdev not set");
> +        return;
> +    }
> +
> +    mr  = host_memory_backend_get_memory(pmem->memdev);
> +    align = memory_region_get_alignment(mr);
> +    pmem->size = QEMU_ALIGN_DOWN(memory_region_size(mr), align);
> +    pmem->start = memory_device_get_free_addr(ms, NULL, align, pmem->size,
> +                                                               &local_err);
> +    if (local_err) {
> +        error_setg(errp, "Can't get free address in mem device");
> +        return;
> +    }
> +    memory_region_init_alias(&pmem->mr, OBJECT(pmem),
> +                             "virtio_pmem-memory", mr, 0, pmem->size);
> +    memory_device_plug_region(ms, &pmem->mr, pmem->start);
> +
> +    host_memory_backend_set_mapped(pmem->memdev, true);
> +    virtio_init(vdev, TYPE_VIRTIO_PMEM, VIRTIO_ID_PMEM,
> +                                          sizeof(struct virtio_pmem_config));
> +    pmem->rq_vq = virtio_add_queue(vdev, 128, virtio_pmem_flush);
> +}
> +
> +static void virtio_mem_check_memdev(Object *obj, const char *name, Object *val,
> +                                    Error **errp)
> +{
> +    if (host_memory_backend_is_mapped(MEMORY_BACKEND(val))) {
> +        char *path = object_get_canonical_path_component(val);
> +        error_setg(errp, "Can't use already busy memdev: %s", path);
> +        g_free(path);
> +        return;
> +    }
> +
> +    qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
> +}
> +
> +static const char *virtio_pmem_get_device_id(VirtIOPMEM *vm)
> +{
> +    Object *obj = OBJECT(vm);
> +    DeviceState *parent_dev;
> +
> +    /* always use the ID of the proxy device */
> +    if (obj->parent && object_dynamic_cast(obj->parent, TYPE_DEVICE)) {
> +        parent_dev = DEVICE(obj->parent);
> +        return parent_dev->id;
> +    }
> +    return NULL;
> +}
> +
> +static void virtio_pmem_md_fill_device_info(const MemoryDeviceState *md,
> +                                           MemoryDeviceInfo *info)
> +{
> +    VirtioPMemDeviceInfo *vi = g_new0(VirtioPMemDeviceInfo, 1);
> +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> +    const char *id = virtio_pmem_get_device_id(vm);
> +
> +    if (id) {
> +        vi->has_id = true;
> +        vi->id = g_strdup(id);
> +    }
> +
> +    vi->start = vm->start;
> +    vi->size = vm->size;
> +    vi->memdev = object_get_canonical_path(OBJECT(vm->memdev));
> +
> +    info->u.virtio_pmem.data = vi;
> +    info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM;
> +}
> +
> +static uint64_t virtio_pmem_md_get_addr(const MemoryDeviceState *md)
> +{
> +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> +
> +    return vm->start;
> +}
> +
> +static uint64_t virtio_pmem_md_get_plugged_size(const MemoryDeviceState *md)
> +{
> +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> +
> +    return vm->size;
> +}
> +
> +static uint64_t virtio_pmem_md_get_region_size(const MemoryDeviceState *md)
> +{
> +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> +
> +    return vm->size;
> +}
> +
> +static void virtio_pmem_instance_init(Object *obj)
> +{
> +    VirtIOPMEM *vm = VIRTIO_PMEM(obj);
> +    object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
> +                                (Object **)&vm->memdev,
> +                                (void *) virtio_mem_check_memdev,
> +                                OBJ_PROP_LINK_STRONG,
> +                                &error_abort);
> +}
> +
> +
> +static void virtio_pmem_class_init(ObjectClass *klass, void *data)
> +{
> +    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> +    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
> +
> +    vdc->realize      =  virtio_pmem_realize;
> +    vdc->get_config   =  virtio_pmem_get_config;
> +    vdc->get_features =  virtio_pmem_get_features;
> +
> +    mdc->get_addr         = virtio_pmem_md_get_addr;
> +    mdc->get_plugged_size = virtio_pmem_md_get_plugged_size;
> +    mdc->get_region_size  = virtio_pmem_md_get_region_size;
> +    mdc->fill_device_info = virtio_pmem_md_fill_device_info;
> +}
> +
> +static TypeInfo virtio_pmem_info = {
> +    .name          = TYPE_VIRTIO_PMEM,
> +    .parent        = TYPE_VIRTIO_DEVICE,
> +    .class_init    = virtio_pmem_class_init,
> +    .instance_size = sizeof(VirtIOPMEM),
> +    .instance_init = virtio_pmem_instance_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_MEMORY_DEVICE },
> +        { }
> +  },
> +};
> +
> +static void virtio_register_types(void)
> +{
> +    type_register_static(&virtio_pmem_info);
> +}
> +
> +type_init(virtio_register_types)
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index 990d6fcbde..28829b6437 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -85,6 +85,7 @@ extern bool pci_available;
>  #define PCI_DEVICE_ID_VIRTIO_RNG         0x1005
>  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
>  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
> +#define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
>  
>  #define PCI_VENDOR_ID_REDHAT             0x1b36
>  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> diff --git a/include/hw/virtio/virtio-pmem.h b/include/hw/virtio/virtio-pmem.h
> new file mode 100644
> index 0000000000..fda3ee691c
> --- /dev/null
> +++ b/include/hw/virtio/virtio-pmem.h
> @@ -0,0 +1,42 @@
> +/*
> + * Virtio pmem Device
> + *
> + * Copyright Red Hat, Inc. 2018
> + * Copyright Pankaj Gupta <pagupta@redhat.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or
> + * (at your option) any later version.  See the COPYING file in the
> + * top-level directory.
> + */
> +
> +#ifndef QEMU_VIRTIO_PMEM_H
> +#define QEMU_VIRTIO_PMEM_H
> +
> +#include "hw/virtio/virtio.h"
> +#include "exec/memory.h"
> +#include "sysemu/hostmem.h"
> +#include "standard-headers/linux/virtio_ids.h"
> +#include "hw/boards.h"
> +#include "hw/i386/pc.h"
> +
> +#define TYPE_VIRTIO_PMEM "virtio-pmem"
> +
> +#define VIRTIO_PMEM(obj) \
> +        OBJECT_CHECK(VirtIOPMEM, (obj), TYPE_VIRTIO_PMEM)
> +
> +/* VirtIOPMEM device structure */
> +typedef struct VirtIOPMEM {
> +    VirtIODevice parent_obj;
> +
> +    VirtQueue *rq_vq;
> +    uint64_t start;
> +    uint64_t size;
> +    MemoryRegion mr;
> +    HostMemoryBackend *memdev;
> +} VirtIOPMEM;
> +
> +struct virtio_pmem_config {
> +    uint64_t start;
> +    uint64_t size;
> +};
> +#endif
> diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
> index 6d5c3b2d4f..346389565a 100644
> --- a/include/standard-headers/linux/virtio_ids.h
> +++ b/include/standard-headers/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         25 /* virtio pmem */

This should be moved to a Linux header sync patch.




-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu: Add virtio pmem device
  2018-09-20 11:21   ` David Hildenbrand
@ 2018-09-20 12:03     ` " Pankaj Gupta
  0 siblings, 0 replies; 22+ messages in thread
From: Pankaj Gupta @ 2018-09-20 12:03 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, kvm, qemu-devel, linux-nvdimm, kwolf, jack,
	xiaoguangrong eric, riel, niteshnarayanlal, mst, ross zwisler,
	lcapitulino, hch, stefanha, imammedo, pbonzini, dan j williams,
	nilal


> 
> > @@ -0,0 +1,241 @@
> > +/*
> > + * Virtio pmem device
> > + *
> > + * Copyright (C) 2018 Red Hat, Inc.
> > + * Copyright (C) 2018 Pankaj Gupta <pagupta@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2.
> > + * See the COPYING file in the top-level directory.
> > + *
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +#include "qapi/error.h"
> > +#include "qemu-common.h"
> > +#include "qemu/error-report.h"
> > +#include "hw/virtio/virtio-access.h"
> > +#include "hw/virtio/virtio-pmem.h"
> > +#include "hw/mem/memory-device.h"
> > +#include "block/aio.h"
> > +#include "block/thread-pool.h"
> > +
> > +typedef struct VirtIOPMEMresp {
> > +    int ret;
> > +} VirtIOPMEMResp;
> > +
> > +typedef struct VirtIODeviceRequest {
> > +    VirtQueueElement elem;
> > +    int fd;
> > +    VirtIOPMEM *pmem;
> > +    VirtIOPMEMResp resp;
> > +} VirtIODeviceRequest;
> 
> Both, response and request have to go to a linux header (and a header
> sync patch).

Sure.

> 
> Also, you are using the same request for host<->guest handling and
> internal purposes. The fd or pmem pointer definitely don't belong here.
> Use a separate struct for internal handling purposes. (passing to worker_cb)

OK, will add another struct for internal handling.

> 
> > +
> > +static int worker_cb(void *opaque)
> > +{
> > +    VirtIODeviceRequest *req = opaque;
> > +    int err = 0;
> > +
> > +    /* flush raw backing image */
> > +    err = fsync(req->fd);
> > +    if (err != 0) {
> > +        err = EIO;
> > +    }
> > +    req->resp.ret = err;
> > +
> > +    return 0;
> > +}
> > +
> > +static void done_cb(void *opaque, int ret)
> > +{
> > +    VirtIODeviceRequest *req = opaque;
> > +    int len = iov_from_buf(req->elem.in_sg, req->elem.in_num, 0,
> > +                              &req->resp, sizeof(VirtIOPMEMResp));
> > +
> > +    /* Callbacks are serialized, so no need to use atomic ops.  */
> > +    virtqueue_push(req->pmem->rq_vq, &req->elem, len);
> > +    virtio_notify((VirtIODevice *)req->pmem, req->pmem->rq_vq);
> > +    g_free(req);
> > +}
> > +
> > +static void virtio_pmem_flush(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > +    VirtIODeviceRequest *req;
> > +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> > +    HostMemoryBackend *backend = MEMORY_BACKEND(pmem->memdev);
> > +    ThreadPool *pool = aio_get_thread_pool(qemu_get_aio_context());
> > +
> > +    req = virtqueue_pop(vq, sizeof(VirtIODeviceRequest));
> > +    if (!req) {
> > +        virtio_error(vdev, "virtio-pmem missing request data");
> > +        return;
> > +    }
> > +
> > +    if (req->elem.out_num < 1 || req->elem.in_num < 1) {
> > +        virtio_error(vdev, "virtio-pmem request not proper");
> > +        g_free(req);
> > +        return;
> > +    }
> > +    req->fd = memory_region_get_fd(&backend->mr);
> > +    req->pmem = pmem;
> > +    thread_pool_submit_aio(pool, worker_cb, req, done_cb, req);
> > +}
> > +
> > +static void virtio_pmem_get_config(VirtIODevice *vdev, uint8_t *config)
> > +{
> > +    VirtIOPMEM *pmem = VIRTIO_PMEM(vdev);
> > +    struct virtio_pmem_config *pmemcfg = (struct virtio_pmem_config *) config;
> > +
> > +    virtio_stq_p(vdev, &pmemcfg->start, pmem->start);
> > +    virtio_stq_p(vdev, &pmemcfg->size, pmem->size);
> > +}
> > +
> > +static uint64_t virtio_pmem_get_features(VirtIODevice *vdev, uint64_t features,
> > +                                        Error **errp)
> > +{
> > +    return features;
> > +}
> > +
> > +static void virtio_pmem_realize(DeviceState *dev, Error **errp)
> > +{
> > +    VirtIODevice   *vdev   = VIRTIO_DEVICE(dev);
> > +    VirtIOPMEM     *pmem   = VIRTIO_PMEM(dev);
> > +    MachineState   *ms     = MACHINE(qdev_get_machine());
> > +    uint64_t align;
> > +    Error *local_err = NULL;
> > +    MemoryRegion *mr;
> > +
> > +    if (!pmem->memdev) {
> > +        error_setg(errp, "virtio-pmem memdev not set");
> > +        return;
> > +    }
> > +
> > +    mr  = host_memory_backend_get_memory(pmem->memdev);
> > +    align = memory_region_get_alignment(mr);
> > +    pmem->size = QEMU_ALIGN_DOWN(memory_region_size(mr), align);
> > +    pmem->start = memory_device_get_free_addr(ms, NULL, align, pmem->size,
> > +                                              &local_err);
> > +    if (local_err) {
> > +        error_setg(errp, "Can't get free address in mem device");
> > +        return;
> > +    }
> > +    memory_region_init_alias(&pmem->mr, OBJECT(pmem),
> > +                             "virtio_pmem-memory", mr, 0, pmem->size);
> > +    memory_device_plug_region(ms, &pmem->mr, pmem->start);
> > +
> > +    host_memory_backend_set_mapped(pmem->memdev, true);
> > +    virtio_init(vdev, TYPE_VIRTIO_PMEM, VIRTIO_ID_PMEM,
> > +                                          sizeof(struct virtio_pmem_config));
> > +    pmem->rq_vq = virtio_add_queue(vdev, 128, virtio_pmem_flush);
> > +}
> > +
> > +static void virtio_mem_check_memdev(Object *obj, const char *name, Object *val,
> > +                                    Error **errp)
> > +{
> > +    if (host_memory_backend_is_mapped(MEMORY_BACKEND(val))) {
> > +        char *path = object_get_canonical_path_component(val);
> > +        error_setg(errp, "Can't use already busy memdev: %s", path);
> > +        g_free(path);
> > +        return;
> > +    }
> > +
> > +    qdev_prop_allow_set_link_before_realize(obj, name, val, errp);
> > +}
> > +
> > +static const char *virtio_pmem_get_device_id(VirtIOPMEM *vm)
> > +{
> > +    Object *obj = OBJECT(vm);
> > +    DeviceState *parent_dev;
> > +
> > +    /* always use the ID of the proxy device */
> > +    if (obj->parent && object_dynamic_cast(obj->parent, TYPE_DEVICE)) {
> > +        parent_dev = DEVICE(obj->parent);
> > +        return parent_dev->id;
> > +    }
> > +    return NULL;
> > +}
> > +
> > +static void virtio_pmem_md_fill_device_info(const MemoryDeviceState *md,
> > +                                           MemoryDeviceInfo *info)
> > +{
> > +    VirtioPMemDeviceInfo *vi = g_new0(VirtioPMemDeviceInfo, 1);
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > +    const char *id = virtio_pmem_get_device_id(vm);
> > +
> > +    if (id) {
> > +        vi->has_id = true;
> > +        vi->id = g_strdup(id);
> > +    }
> > +
> > +    vi->start = vm->start;
> > +    vi->size = vm->size;
> > +    vi->memdev = object_get_canonical_path(OBJECT(vm->memdev));
> > +
> > +    info->u.virtio_pmem.data = vi;
> > +    info->type = MEMORY_DEVICE_INFO_KIND_VIRTIO_PMEM;
> > +}
> > +
> > +static uint64_t virtio_pmem_md_get_addr(const MemoryDeviceState *md)
> > +{
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > +
> > +    return vm->start;
> > +}
> > +
> > +static uint64_t virtio_pmem_md_get_plugged_size(const MemoryDeviceState *md)
> > +{
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > +
> > +    return vm->size;
> > +}
> > +
> > +static uint64_t virtio_pmem_md_get_region_size(const MemoryDeviceState *md)
> > +{
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(md);
> > +
> > +    return vm->size;
> > +}
> > +
> > +static void virtio_pmem_instance_init(Object *obj)
> > +{
> > +    VirtIOPMEM *vm = VIRTIO_PMEM(obj);
> > +    object_property_add_link(obj, "memdev", TYPE_MEMORY_BACKEND,
> > +                                (Object **)&vm->memdev,
> > +                                (void *) virtio_mem_check_memdev,
> > +                                OBJ_PROP_LINK_STRONG,
> > +                                &error_abort);
> > +}
> > +
> > +
> > +static void virtio_pmem_class_init(ObjectClass *klass, void *data)
> > +{
> > +    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
> > +    MemoryDeviceClass *mdc = MEMORY_DEVICE_CLASS(klass);
> > +
> > +    vdc->realize      =  virtio_pmem_realize;
> > +    vdc->get_config   =  virtio_pmem_get_config;
> > +    vdc->get_features =  virtio_pmem_get_features;
> > +
> > +    mdc->get_addr         = virtio_pmem_md_get_addr;
> > +    mdc->get_plugged_size = virtio_pmem_md_get_plugged_size;
> > +    mdc->get_region_size  = virtio_pmem_md_get_region_size;
> > +    mdc->fill_device_info = virtio_pmem_md_fill_device_info;
> > +}
> > +
> > +static TypeInfo virtio_pmem_info = {
> > +    .name          = TYPE_VIRTIO_PMEM,
> > +    .parent        = TYPE_VIRTIO_DEVICE,
> > +    .class_init    = virtio_pmem_class_init,
> > +    .instance_size = sizeof(VirtIOPMEM),
> > +    .instance_init = virtio_pmem_instance_init,
> > +    .interfaces = (InterfaceInfo[]) {
> > +        { TYPE_MEMORY_DEVICE },
> > +        { }
> > +  },
> > +};
> > +
> > +static void virtio_register_types(void)
> > +{
> > +    type_register_static(&virtio_pmem_info);
> > +}
> > +
> > +type_init(virtio_register_types)
> > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > index 990d6fcbde..28829b6437 100644
> > --- a/include/hw/pci/pci.h
> > +++ b/include/hw/pci/pci.h
> > @@ -85,6 +85,7 @@ extern bool pci_available;
> >  #define PCI_DEVICE_ID_VIRTIO_RNG         0x1005
> >  #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
> >  #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
> > +#define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
> >  
> >  #define PCI_VENDOR_ID_REDHAT             0x1b36
> >  #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
> > diff --git a/include/hw/virtio/virtio-pmem.h b/include/hw/virtio/virtio-pmem.h
> > new file mode 100644
> > index 0000000000..fda3ee691c
> > --- /dev/null
> > +++ b/include/hw/virtio/virtio-pmem.h
> > @@ -0,0 +1,42 @@
> > +/*
> > + * Virtio pmem Device
> > + *
> > + * Copyright Red Hat, Inc. 2018
> > + * Copyright Pankaj Gupta <pagupta@redhat.com>
> > + *
> > + * This work is licensed under the terms of the GNU GPL, version 2 or
> > + * (at your option) any later version.  See the COPYING file in the
> > + * top-level directory.
> > + */
> > +
> > +#ifndef QEMU_VIRTIO_PMEM_H
> > +#define QEMU_VIRTIO_PMEM_H
> > +
> > +#include "hw/virtio/virtio.h"
> > +#include "exec/memory.h"
> > +#include "sysemu/hostmem.h"
> > +#include "standard-headers/linux/virtio_ids.h"
> > +#include "hw/boards.h"
> > +#include "hw/i386/pc.h"
> > +
> > +#define TYPE_VIRTIO_PMEM "virtio-pmem"
> > +
> > +#define VIRTIO_PMEM(obj) \
> > +        OBJECT_CHECK(VirtIOPMEM, (obj), TYPE_VIRTIO_PMEM)
> > +
> > +/* VirtIOPMEM device structure */
> > +typedef struct VirtIOPMEM {
> > +    VirtIODevice parent_obj;
> > +
> > +    VirtQueue *rq_vq;
> > +    uint64_t start;
> > +    uint64_t size;
> > +    MemoryRegion mr;
> > +    HostMemoryBackend *memdev;
> > +} VirtIOPMEM;
> > +
> > +struct virtio_pmem_config {
> > +    uint64_t start;
> > +    uint64_t size;
> > +};
> > +#endif
> > diff --git a/include/standard-headers/linux/virtio_ids.h b/include/standard-headers/linux/virtio_ids.h
> > index 6d5c3b2d4f..346389565a 100644
> > --- a/include/standard-headers/linux/virtio_ids.h
> > +++ b/include/standard-headers/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
> 
> This should be moved to a linux header sync patch.

Sure.

Thanks,
Pankaj
> 
> 
> 
> 
> --
> 
> Thanks,
> 
> David / dhildenb
> 
> 


* Re: [PATCH 2/3] libnvdimm: nd_region flush callback support
  2018-08-31 13:30 ` [PATCH 2/3] libnvdimm: nd_region flush callback support Pankaj Gupta
  2018-09-04 15:29   ` kbuild test robot
@ 2018-09-22  0:43   ` Dan Williams
  1 sibling, 0 replies; 22+ messages in thread
From: Dan Williams @ 2018-09-22  0:43 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Linux Kernel Mailing List, KVM list, Qemu Developers,
	linux-nvdimm, Jan Kara, Stefan Hajnoczi, Rik van Riel,
	Nitesh Narayan Lal, Kevin Wolf, Paolo Bonzini, Zwisler, Ross,
	David Hildenbrand, Xiao Guangrong, Christoph Hellwig,
	Michael S. Tsirkin, niteshnarayanlal, lcapitulino, Igor Mammedov,
	Eric Blake

On Fri, Aug 31, 2018 at 6:32 AM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds functionality to perform flush from guest
> to host over VIRTIO. We are registering a callback based
> on 'nd_region' type. virtio_pmem driver requires this special
> flush function. For rest of the region types we are registering
> existing flush function. Report error returned by host fsync
> failure to userspace.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>

This looks ok to me, just some nits below.

> ---
>  drivers/acpi/nfit/core.c     |  7 +++++--
>  drivers/nvdimm/claim.c       |  3 ++-
>  drivers/nvdimm/pmem.c        | 12 ++++++++----
>  drivers/nvdimm/region_devs.c | 12 ++++++++++--
>  include/linux/libnvdimm.h    |  4 +++-
>  5 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
> index b072cfc..cd63b69 100644
> --- a/drivers/acpi/nfit/core.c
> +++ b/drivers/acpi/nfit/core.c
> @@ -2216,6 +2216,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
>  {
>         u64 cmd, offset;
>         struct nfit_blk_mmio *mmio = &nfit_blk->mmio[DCR];
> +       struct nd_region *nd_region = nfit_blk->nd_region;
>
>         enum {
>                 BCW_OFFSET_MASK = (1ULL << 48)-1,
> @@ -2234,7 +2235,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
>                 offset = to_interleave_offset(offset, mmio);
>
>         writeq(cmd, mmio->addr.base + offset);
> -       nvdimm_flush(nfit_blk->nd_region);
> +       nd_region->flush(nd_region);

I would keep the indirect function call override inside of
nvdimm_flush. Then this hunk can go away...
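
A rough user-space sketch of that suggestion, under stated assumptions: struct nd_region here is a reduced stand-in for the kernel structure, generic_calls exists only so the behaviour can be checked, and failing_flush is a made-up provider. The name generic_nvdimm_flush() is the rename proposed further down in this review. Callers keep calling nvdimm_flush(), and only that one function knows about the override.

```c
#include <errno.h>
#include <stddef.h>

struct nd_region;
typedef int (*nd_flush_fn)(struct nd_region *nd_region);

/* Reduced stand-in for the kernel's struct nd_region. */
struct nd_region {
    nd_flush_fn flush;   /* optional provider override, may be NULL */
    int generic_calls;   /* instrumentation for this sketch only */
};

/* Stand-in for the existing write-pending-queue flush. */
static int generic_nvdimm_flush(struct nd_region *nd_region)
{
    nd_region->generic_calls++;
    return 0;
}

/* Single entry point: dispatch to the override if one was registered,
 * otherwise fall back to the generic flush. */
static int nvdimm_flush(struct nd_region *nd_region)
{
    if (nd_region->flush)
        return nd_region->flush(nd_region);
    return generic_nvdimm_flush(nd_region);
}

/* Hypothetical provider whose flush always fails, e.g. a host fsync error. */
static int failing_flush(struct nd_region *nd_region)
{
    (void)nd_region;
    return -EIO;
}
```

Because the dispatch is hidden inside nvdimm_flush(), call sites that do not care about the return value would not need to change at all.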

>
>         if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
>                 readq(mmio->addr.base + offset);
> @@ -2245,6 +2246,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
>                 unsigned int lane)
>  {
>         struct nfit_blk_mmio *mmio = &nfit_blk->mmio[BDW];
> +       struct nd_region *nd_region = nfit_blk->nd_region;
>         unsigned int copied = 0;
>         u64 base_offset;
>         int rc;
> @@ -2283,7 +2285,8 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
>         }
>
>         if (rw)
> -               nvdimm_flush(nfit_blk->nd_region);
> +               nd_region->flush(nd_region);
> +
>

...ditto, no need to touch this code.

>         rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
>         return rc;
> diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
> index fb667bf..49dce9c 100644
> --- a/drivers/nvdimm/claim.c
> +++ b/drivers/nvdimm/claim.c
> @@ -262,6 +262,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
>  {
>         struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
>         unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
> +       struct nd_region *nd_region = to_nd_region(ndns->dev.parent);
>         sector_t sector = offset >> 9;
>         int rc = 0;
>
> @@ -301,7 +302,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
>         }
>
>         memcpy_flushcache(nsio->addr + offset, buf, size);
> -       nvdimm_flush(to_nd_region(ndns->dev.parent));
> +       nd_region->flush(nd_region);

For this you would need to teach nsio_rw_bytes() that the flush can fail.

>
>         return rc;
>  }
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 6071e29..ba57cfa 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -201,7 +201,8 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
>         struct nd_region *nd_region = to_region(pmem);
>
>         if (bio->bi_opf & REQ_PREFLUSH)
> -               nvdimm_flush(nd_region);
> +               bio->bi_status = nd_region->flush(nd_region);
> +

Let's have nvdimm_flush() return 0 on success or -EIO on failure, since
that's what nsio_rw_bytes() expects, and you'll need to translate that to
BLK_STS_IOERR here.
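
One way the translation could look, sketched in user-space C. The blk_status_t typedef, the BLK_STS_* values and struct bio below are simplified stand-ins with illustrative numeric values, and flush_rc_to_blk_status() and pmem_handle_preflush() are hypothetical helper names. In the kernel proper, the existing errno_to_blk_status() helper may be the more idiomatic choice.

```c
#include <errno.h>

/* Stand-ins for the kernel types; the numeric values are illustrative only. */
typedef unsigned char blk_status_t;
#define BLK_STS_OK    ((blk_status_t)0)
#define BLK_STS_IOERR ((blk_status_t)10)

struct bio {
    blk_status_t bi_status;
};

/* Map the 0 / -EIO convention of the flush onto a block-layer status. */
static blk_status_t flush_rc_to_blk_status(int rc)
{
    return rc ? BLK_STS_IOERR : BLK_STS_OK;
}

/* Only touch bi_status on failure, so a successful flush cannot
 * overwrite an error recorded earlier for the same bio. */
static void pmem_handle_preflush(struct bio *bio, int flush_rc)
{
    if (flush_rc)
        bio->bi_status = flush_rc_to_blk_status(flush_rc);
}
```

Assigning the raw errno to bi_status, as the hunk above does, would mix two different value spaces; keeping the conversion in one helper avoids that.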

>
>         do_acct = nd_iostat_start(bio, &start);
>         bio_for_each_segment(bvec, bio, iter) {
> @@ -216,7 +217,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
>                 nd_iostat_end(bio, start);
>
>         if (bio->bi_opf & REQ_FUA)
> -               nvdimm_flush(nd_region);
> +               bio->bi_status = nd_region->flush(nd_region);

Same comment.

>
>         bio_endio(bio);
>         return BLK_QC_T_NONE;
> @@ -517,6 +518,7 @@ static int nd_pmem_probe(struct device *dev)
>  static int nd_pmem_remove(struct device *dev)
>  {
>         struct pmem_device *pmem = dev_get_drvdata(dev);
> +       struct nd_region *nd_region = to_region(pmem);
>
>         if (is_nd_btt(dev))
>                 nvdimm_namespace_detach_btt(to_nd_btt(dev));
> @@ -528,14 +530,16 @@ static int nd_pmem_remove(struct device *dev)
>                 sysfs_put(pmem->bb_state);
>                 pmem->bb_state = NULL;
>         }
> -       nvdimm_flush(to_nd_region(dev->parent));
> +       nd_region->flush(nd_region);

Not needed if the indirect function call moves inside nvdimm_flush().

>
>         return 0;
>  }
>
>  static void nd_pmem_shutdown(struct device *dev)
>  {
> -       nvdimm_flush(to_nd_region(dev->parent));
> +       struct nd_region *nd_region = to_nd_region(dev->parent);
> +
> +       nd_region->flush(nd_region);
>  }
>
>  static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
> diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
> index fa37afc..a170a6b 100644
> --- a/drivers/nvdimm/region_devs.c
> +++ b/drivers/nvdimm/region_devs.c
> @@ -290,7 +290,7 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att
>                 return rc;
>         if (!flush)
>                 return -EINVAL;
> -       nvdimm_flush(nd_region);
> +       nd_region->flush(nd_region);

Let's pass the error code through if the flush fails.
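
For illustration only, the control flow being asked for might look like the following. deep_flush_store_sketch() is a hypothetical, heavily reduced model of the sysfs store handler: the parsed "flush" value and the flush result are passed in as plain integers instead of being derived from a device and a buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Model of a sysfs store handler: return a negative errno on failure,
 * or the number of bytes consumed on success. */
static ssize_t deep_flush_store_sketch(int flush_requested, int flush_rc,
                                       size_t len)
{
    if (!flush_requested)
        return -EINVAL;
    if (flush_rc)
        return flush_rc;    /* propagate the flush failure to userspace */
    return (ssize_t)len;
}
```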

>
>         return len;
>  }
> @@ -1065,6 +1065,11 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
>         dev->of_node = ndr_desc->of_node;
>         nd_region->ndr_size = resource_size(ndr_desc->res);
>         nd_region->ndr_start = ndr_desc->res->start;
> +       if (ndr_desc->flush)
> +               nd_region->flush = ndr_desc->flush;
> +       else
> +               nd_region->flush = nvdimm_flush;
> +

We'll need to rename the existing nvdimm_flush() to generic_nvdimm_flush().

>         nd_device_register(dev);
>
>         return nd_region;
> @@ -1109,7 +1114,7 @@ EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
>   * nvdimm_flush - flush any posted write queues between the cpu and pmem media
>   * @nd_region: blk or interleaved pmem region
>   */
> -void nvdimm_flush(struct nd_region *nd_region)
> +int nvdimm_flush(struct nd_region *nd_region)
>  {
>         struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
>         int i, idx;
> @@ -1133,7 +1138,10 @@ void nvdimm_flush(struct nd_region *nd_region)
>                 if (ndrd_get_flush_wpq(ndrd, i, 0))
>                         writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
>         wmb();
> +
> +       return 0;
>  }
> +

Needless newline.

>  EXPORT_SYMBOL_GPL(nvdimm_flush);
>
>  /**
> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
> index 097072c..3af7177 100644
> --- a/include/linux/libnvdimm.h
> +++ b/include/linux/libnvdimm.h
> @@ -115,6 +115,7 @@ struct nd_mapping_desc {
>         int position;
>  };
>
> +struct nd_region;
>  struct nd_region_desc {
>         struct resource *res;
>         struct nd_mapping_desc *mapping;
> @@ -126,6 +127,7 @@ struct nd_region_desc {
>         int numa_node;
>         unsigned long flags;
>         struct device_node *of_node;
> +       int (*flush)(struct nd_region *nd_region);
>  };
>
>  struct device;
> @@ -201,7 +203,7 @@ unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr);
>  unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
>  void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
>  u64 nd_fletcher64(void *addr, size_t len, bool le);
> -void nvdimm_flush(struct nd_region *nd_region);
> +int nvdimm_flush(struct nd_region *nd_region);
>  int nvdimm_has_flush(struct nd_region *nd_region);
>  int nvdimm_has_cache(struct nd_region *nd_region);
>
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/3] nd: move nd_region to common header
  2018-08-31 13:30 ` [PATCH 1/3] nd: move nd_region to common header Pankaj Gupta
@ 2018-09-22  0:47   ` Dan Williams
  0 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2018-09-22  0:47 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Linux Kernel Mailing List, KVM list, Qemu Developers,
	linux-nvdimm, Jan Kara, Stefan Hajnoczi, Rik van Riel,
	Nitesh Narayan Lal, Kevin Wolf, Paolo Bonzini, Zwisler, Ross,
	David Hildenbrand, Xiao Guangrong, Christoph Hellwig,
	Michael S. Tsirkin, niteshnarayanlal, lcapitulino, Igor Mammedov,
	Eric Blake

On Fri, Aug 31, 2018 at 6:31 AM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch moves nd_region definition to common header
> include/linux/nd.h file. This is required for flush callback
> support for both virtio-pmem & pmem driver.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/nd.h | 39 ---------------------------------------
>  include/linux/nd.h  | 40 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 40 insertions(+), 39 deletions(-)

No, we need to find a way to do this without dumping all of these
internal details to a public / global header.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/3] virtio-pmem: Add virtio pmem driver
  2018-08-31 13:30 ` [PATCH 3/3] virtio-pmem: Add virtio pmem driver Pankaj Gupta
                     ` (2 preceding siblings ...)
  2018-09-12 16:54   ` Luiz Capitulino
@ 2018-09-22  1:08   ` Dan Williams
  3 siblings, 0 replies; 22+ messages in thread
From: Dan Williams @ 2018-09-22  1:08 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Linux Kernel Mailing List, KVM list, Qemu Developers,
	linux-nvdimm, Jan Kara, Stefan Hajnoczi, Rik van Riel,
	Nitesh Narayan Lal, Kevin Wolf, Paolo Bonzini, Zwisler, Ross,
	David Hildenbrand, Xiao Guangrong, Christoph Hellwig,
	Michael S. Tsirkin, niteshnarayanlal, lcapitulino, Igor Mammedov,
	Eric Blake

On Fri, Aug 31, 2018 at 6:32 AM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds virtio-pmem driver for KVM guest.
>
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
>
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/virtio/Kconfig           |   9 ++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/virtio_pmem.c     | 255 +++++++++++++++++++++++++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  40 ++++++
>  5 files changed, 306 insertions(+)
>  create mode 100644 drivers/virtio/virtio_pmem.c
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 3589764..a331e23 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,15 @@ config VIRTIO_PCI_LEGACY
>
>           If unsure, say Y.
>
> +config VIRTIO_PMEM
> +       tristate "Support for virtio pmem driver"
> +       depends on VIRTIO
> +       help
> +       This driver provides support for virtio based flushing interface
> +       for persistent memory range.
> +
> +       If unsure, say M.
> +
>  config VIRTIO_BALLOON
>         tristate "Virtio balloon driver"
>         depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5..cbe91c6 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += virtio_pmem.o
> diff --git a/drivers/virtio/virtio_pmem.c b/drivers/virtio/virtio_pmem.c
> new file mode 100644
> index 0000000..c22cc87
> --- /dev/null
> +++ b/drivers/virtio/virtio_pmem.c
> @@ -0,0 +1,255 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio.h>
> +#include <linux/module.h>
> +#include <linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/spinlock.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/nd.h>

I think we need to split this driver into 2 files,
drivers/virtio/pmem.c would discover and register the virtual pmem
device with the libnvdimm core, and drivers/nvdimm/virtio.c would
house virtio_pmem_flush().

> +
> +struct virtio_pmem_request {
> +       /* Host return status corresponding to flush request */
> +       int ret;
> +
> +       /* command name*/
> +       char name[16];
> +
> +       /* Wait queue to process deferred work after ack from host */
> +       wait_queue_head_t host_acked;
> +       bool done;
> +
> +       /* Wait queue to process deferred work after virt queue buffer avail */
> +       wait_queue_head_t wq_buf;

Why does this need wait queues per request? Shouldn't this be per-device?

> +       bool wq_buf_avail;
> +       struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +       struct virtio_device *vdev;
> +
> +       /* Virtio pmem request queue */
> +       struct virtqueue *req_vq;
> +
> +       /* nvdimm bus registers virtio pmem device */
> +       struct nvdimm_bus *nvdimm_bus;
> +       struct nvdimm_bus_descriptor nd_desc;
> +
> +       /* List to store deferred work if virtqueue is full */
> +       struct list_head req_list;
> +
> +       /* Synchronize virtqueue data */
> +       spinlock_t pmem_lock;
> +
> +       /* Memory region information */
> +       uint64_t start;
> +       uint64_t size;
> +};
> +
> +static struct virtio_device_id id_table[] = {
> +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +       { 0 },
> +};
> +
> + /* The interrupt handler */
> +static void host_ack(struct virtqueue *vq)
> +{
> +       unsigned int len;
> +       unsigned long flags;
> +       struct virtio_pmem_request *req, *req_buf;
> +       struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +               req->done = true;
> +               wake_up(&req->host_acked);
> +
> +               if (!list_empty(&vpmem->req_list)) {
> +                       req_buf = list_first_entry(&vpmem->req_list,
> +                                       struct virtio_pmem_request, list);
> +                       list_del(&vpmem->req_list);
> +                       req_buf->wq_buf_avail = true;
> +                       wake_up(&req_buf->wq_buf);
> +               }
> +       }
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +       struct virtqueue *vq;
> +
> +       /* single vq */
> +       vpmem->req_vq = vq = virtio_find_single_vq(vpmem->vdev,
> +                               host_ack, "flush_queue");
> +       if (IS_ERR(vq))
> +               return PTR_ERR(vq);
> +
> +       spin_lock_init(&vpmem->pmem_lock);
> +       INIT_LIST_HEAD(&vpmem->req_list);
> +
> +       return 0;
> +};
> +
> + /* The request submission function */
> +static int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +       int err;
> +       unsigned long flags;
> +       struct scatterlist *sgs[2], sg, ret;
> +       struct virtio_device *vdev =
> +               dev_to_virtio(nd_region->dev.parent->parent);

That's a long de-ref chain. I would just stash the vdev in
nd_region->provider_data.
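
A small stand-alone sketch of the idea, with reduced stand-in structs rather than the kernel definitions. The helpers nd_region_init() and region_to_vdev() are hypothetical names; in the kernel, the region descriptor carries a provider_data pointer and nd_region_provider_data() is, as far as I know, the accessor for it.

```c
#include <stddef.h>

/* Reduced stand-ins for the kernel structures. */
struct virtio_device {
    int id;
};

struct nd_region_desc {
    void *provider_data;    /* opaque pointer supplied by the provider */
};

struct nd_region {
    void *provider_data;    /* copied from the descriptor at creation */
};

/* Model of region creation: carry the provider's pointer over. */
static void nd_region_init(struct nd_region *nd_region,
                           const struct nd_region_desc *ndr_desc)
{
    nd_region->provider_data = ndr_desc->provider_data;
}

/* Flush path: recover the virtio device in one step instead of walking
 * nd_region->dev.parent->parent. */
static struct virtio_device *region_to_vdev(struct nd_region *nd_region)
{
    return nd_region->provider_data;
}
```

The probe path would set ndr_desc.provider_data to the vdev before creating the region, which also removes the flush path's dependence on the exact device hierarchy.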

> +       struct virtio_pmem *vpmem = vdev->priv;
> +       struct virtio_pmem_request *req = kmalloc(sizeof(*req), GFP_KERNEL);
> +
> +       if (!req)
> +               return -ENOMEM;
> +
> +       req->done = req->wq_buf_avail = false;
> +       strcpy(req->name, "FLUSH");
> +       init_waitqueue_head(&req->host_acked);
> +       init_waitqueue_head(&req->wq_buf);
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       sg_init_one(&sg, req->name, strlen(req->name));
> +       sgs[0] = &sg;
> +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +       sgs[1] = &ret;
> +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +       if (err) {
> +               dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +               list_add_tail(&vpmem->req_list, &req->list);
> +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +               /* When host has read buffer, this completes via host_ack */
> +               wait_event(req->wq_buf, req->wq_buf_avail);
> +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       }
> +       virtqueue_kick(vpmem->req_vq);
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +       /* When host has read buffer, this completes via host_ack */
> +       wait_event(req->host_acked, req->done);

Hmm, this seems awkward if this is called from pmem_make_request(). If
we need to wait for completion, that should be managed by the guest
block layer, i.e. make_request should just queue the request and then
trigger bio_endio() when the response comes back.

However this does mean that nvdimm_flush() becomes asynchronous. So
maybe we need to pass in a 'sync' flag or the bio directly to indicate
this is an asynchronous flush request from pmem_make_request() vs a
synchronous one from nsio_rw_bytes().
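
To make the two modes concrete, here is a toy single-threaded model; it is a sketch under assumptions, not the driver's design. struct flush_req, flush_submit(), flush_complete() and record_status() are all hypothetical names. A request submitted with an end_io callback models the asynchronous pmem_make_request() case, where the callback would end the bio. A request submitted without one models the synchronous nsio_rw_bytes() case, where the caller would wait until 'done' is set.

```c
#include <stdbool.h>
#include <stddef.h>

struct flush_req {
    bool done;                           /* set once the host has answered */
    int ret;                             /* host status for this flush */
    void (*end_io)(void *ctx, int ret);  /* NULL means the caller waits */
    void *ctx;                           /* e.g. the bio in the async case */
};

/* Queue a flush. With end_io set, the submitter returns right away and is
 * notified later; with end_io NULL, it is expected to wait for 'done'. */
static void flush_submit(struct flush_req *req,
                         void (*end_io)(void *ctx, int ret), void *ctx)
{
    req->done = false;
    req->ret = 0;
    req->end_io = end_io;
    req->ctx = ctx;
}

/* Called from the (simulated) host-ack path when the response arrives. */
static void flush_complete(struct flush_req *req, int ret)
{
    req->ret = ret;
    req->done = true;
    if (req->end_io)
        req->end_io(req->ctx, ret);
}

/* Example completion callback: store the status where ctx points. */
static void record_status(void *ctx, int ret)
{
    *(int *)ctx = ret;
}
```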

> +       err = req->ret;
> +       kfree(req);
> +
> +       return err;
> +};
> +EXPORT_SYMBOL_GPL(virtio_pmem_flush);
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +       int err = 0;
> +       struct resource res;
> +       struct virtio_pmem *vpmem;
> +       struct nvdimm_bus *nvdimm_bus;
> +       struct nd_region_desc ndr_desc;
> +       int nid = dev_to_node(&vdev->dev);
> +       struct nd_region *nd_region;
> +
> +       if (!vdev->config->get) {
> +               dev_err(&vdev->dev, "%s failure: config disabled\n",
> +                       __func__);
> +               return -EINVAL;
> +       }
> +
> +       vdev->priv = vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem),
> +                       GFP_KERNEL);
> +       if (!vpmem) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
> +
> +       vpmem->vdev = vdev;
> +       err = init_vq(vpmem);
> +       if (err)
> +               goto out_err;
> +
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       start, &vpmem->start);
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       size, &vpmem->size);
> +
> +       res.start = vpmem->start;
> +       res.end   = vpmem->start + vpmem->size-1;
> +       vpmem->nd_desc.provider_name = "virtio-pmem";
> +       vpmem->nd_desc.module = THIS_MODULE;
> +
> +       vpmem->nvdimm_bus = nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +                                               &vpmem->nd_desc);
> +       if (!nvdimm_bus)
> +               goto out_vq;
> +
> +       dev_set_drvdata(&vdev->dev, nvdimm_bus);
> +       memset(&ndr_desc, 0, sizeof(ndr_desc));
> +
> +       ndr_desc.res = &res;
> +       ndr_desc.numa_node = nid;
> +       ndr_desc.flush = virtio_pmem_flush;
> +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +       nd_region = nvdimm_pmem_region_create(nvdimm_bus, &ndr_desc);
> +
> +       if (!nd_region)
> +               goto out_nd;
> +
> +       //virtio_device_ready(vdev);
> +       return 0;
> +out_nd:
> +       err = -ENXIO;
> +       nvdimm_bus_unregister(nvdimm_bus);
> +out_vq:
> +       vdev->config->del_vqs(vdev);
> +out_err:
> +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +       return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +       struct virtio_pmem *vpmem = vdev->priv;
> +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +       nvdimm_bus_unregister(nvdimm_bus);
> +       vdev->config->del_vqs(vdev);
> +       kfree(vpmem);
> +}
> +
> +#ifdef CONFIG_PM_SLEEP
> +static int virtio_pmem_freeze(struct virtio_device *vdev)
> +{
> +       /* todo: handle freeze function */
> +       return -EPERM;
> +}
> +
> +static int virtio_pmem_restore(struct virtio_device *vdev)
> +{
> +       /* todo: handle restore function */
> +       return -EPERM;
> +}
> +#endif

As far as I can see there's nothing to do on a power transition, I
would just omit this completely.

> +
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +       .driver.name            = KBUILD_MODNAME,
> +       .driver.owner           = THIS_MODULE,
> +       .id_table               = id_table,
> +       .probe                  = virtio_pmem_probe,
> +       .remove                 = virtio_pmem_remove,
> +#ifdef CONFIG_PM_SLEEP
> +       .freeze                 = virtio_pmem_freeze,
> +       .restore                = virtio_pmem_restore,
> +#endif
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2..3463895 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         25 /* virtio pmem */
>
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 0000000..c7c22a5
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * This header, excluding the #ifdef __KERNEL__ part, is BSD licensed so
> + * anyone can use the definitions to implement compatible drivers/servers:

The SPDX identifier does not match this BSD license, and the whole
point of the SPDX identifier is to get out of the need to have these
large text blobs of license goop.
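
As a sketch of one way to resolve the mismatch, assuming the BSD terms
are what is intended: let the tag name the license and drop the text
blob. BSD-3-Clause is the SPDX identifier for the three-clause text
quoted here. Whether the header should instead be dual-licensed is the
author's call and is not decided by this sketch. The <linux/types.h>
include is my addition so that the fragment compiles on its own:

```c
/* SPDX-License-Identifier: BSD-3-Clause */
/*
 * Copyright (C) Red Hat, Inc., 2018-2019
 * Copyright (C) Pankaj Gupta <pagupta@redhat.com>, 2018
 */
#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
#define _UAPI_LINUX_VIRTIO_PMEM_H

#include <linux/types.h>	/* __le64; added here as an assumption */

/* Same two fields as in the quoted patch, unchanged. */
struct virtio_pmem_config {
	__le64 start;
	__le64 size;
};

#endif /* _UAPI_LINUX_VIRTIO_PMEM_H */
```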

> + *
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + * 2. Redistributions in binary form must reproduce the above copyright
> + *    notice, this list of conditions and the following disclaimer in the
> + *    documentation and/or other materials provided with the distribution.
> + * 3. Neither the name of IBM nor the names of its contributors
> + *    may be used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS ``AS IS''
> + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
> + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
> + * ARE DISCLAIMED.  IN NO EVENT SHALL IBM OR CONTRIBUTORS BE LIABLE
> + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
> + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
> + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
> + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
> + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
> + * SUCH DAMAGE.
> + *
> + * Copyright (C) Red Hat, Inc., 2018-2019
> + * Copyright (C) Pankaj Gupta <pagupta@redhat.com>, 2018
> + */
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +       __le64 start;
> +       __le64 size;
> +};
> +#endif

Why does this need to be in the uapi?
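
For reference, the layout in question is just two fixed-width
little-endian fields. A minimal user-space sketch of decoding such a
16-byte config image follows; pmem_config_host, get_le64 and
decode_pmem_config are hypothetical names for illustration and do not
come from the patch:

```c
#include <stdint.h>

/* Hypothetical host-endian view; field names mirror the quoted struct. */
struct pmem_config_host {
	uint64_t start;
	uint64_t size;
};

/* Decode one 64-bit little-endian value, independent of host byte order. */
static uint64_t get_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Decode the config image: start at byte offset 0, size at byte offset 8. */
static struct pmem_config_host decode_pmem_config(const uint8_t raw[16])
{
	struct pmem_config_host cfg;

	cfg.start = get_le64(raw);
	cfg.size = get_le64(raw + 8);
	return cfg;
}
```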

Thread overview: 22+ messages
2018-08-31 13:30 [PATCH 0/3] kvm "fake DAX" device Pankaj Gupta
2018-08-31 13:30 ` [PATCH 1/3] nd: move nd_region to common header Pankaj Gupta
2018-09-22  0:47   ` Dan Williams
2018-08-31 13:30 ` [PATCH 2/3] libnvdimm: nd_region flush callback support Pankaj Gupta
2018-09-04 15:29   ` kbuild test robot
2018-09-05  8:40     ` Pankaj Gupta
2018-09-22  0:43   ` Dan Williams
2018-08-31 13:30 ` [PATCH 3/3] virtio-pmem: Add virtio pmem driver Pankaj Gupta
2018-09-04 15:17   ` kbuild test robot
2018-09-05  8:34     ` Pankaj Gupta
2018-09-05 12:02   ` kbuild test robot
2018-09-12 16:54   ` Luiz Capitulino
2018-09-13  6:58     ` [Qemu-devel] " Pankaj Gupta
2018-09-13 12:19       ` Luiz Capitulino
2018-09-14 12:13         ` Pankaj Gupta
2018-09-22  1:08   ` Dan Williams
2018-08-31 13:30 ` [PATCH] qemu: Add virtio pmem device Pankaj Gupta
2018-09-12 16:57   ` Luiz Capitulino
2018-09-13  7:06     ` Pankaj Gupta
2018-09-13 12:22       ` Luiz Capitulino
2018-09-20 11:21   ` David Hildenbrand
2018-09-20 12:03     ` [Qemu-devel] " Pankaj Gupta
