All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 0/6] virtio pmem driver
@ 2019-04-26  5:00 ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, darrick.wong, david, willy, hch, nilal, lenb, kilobyte,
	riel, yuval.shaia, stefanha, pbonzini, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

 This patch series has implementation for "virtio pmem". 
 "virtio pmem" is fake persistent memory(nvdimm) in guest 
 which allows to bypass the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.  
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v4. Tested with Qemu side device 
 emulation [6] for virtio-pmem. Documented the impact of
 possible page cache side channel attacks with suggested
 countermeasures.

 Incorporated all the review suggestions. 
 
 Details of project idea for 'virtio pmem' flushing interface 
 is shared [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
---------------------------------
   - Reads persistent memory range from paravirt device and 
     registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
     persistent memory region and setup filesystem operations 
     to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
     interface to flush from guest to host.

2. Qemu virtio-pmem device
---------------------------------
   - Creates virtio pmem device and exposes a memory range to 
     KVM guest. 
   - At host side this is file backed memory which acts as 
     persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
     for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[7] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem security implications and countermeasures:
 -----------------------------------------------------

 In previous posting of kernel driver, there was discussion [9]
 on possible implications of page cache side channel attacks with 
 virtio pmem. After thorough analysis of details of known side 
 channel attacks, below are the suggestions:

 - Depends entirely on how host backing image file is mapped 
   into guest address space. 

 - virtio-pmem device emulation, by default shared mapping is used
   to map host backing file. It is recommended to use separate
   backing file at host side for every guest. This will prevent
   any possibility of executing common code from multiple guests
   and any chance of inferring guest local data based based on 
   execution time.

 - If backing file is required to be shared among multiple guests 
   it is recommended to don't support host page cache eviction 
   commands from the guest driver. This will avoid any possibility
   of inferring guest local data or host data from another guest. 

 - Proposed device specification [8] for virtio-pmem device with 
   details of possible security implications and suggested 
   countermeasures for device emulation.

 Virtio-pmem errors handling:
 ----------------------------------------
  Checked behaviour of virtio-pmem for below types of errors
  Need suggestions on expected behaviour for handling these errors?

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
    - As per current logic if error page belongs to Qemu process, 
      host MCE handler isolates(hwpoison) that page and send SIGBUS. 
      Qemu SIGBUS handler injects exception to KVM guest. 
    - KVM guest then isolates the page and send SIGBUS to guest 
      userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
    - Handles such errors with MCE notifier and creates a list 
      of bad blocks. Read/direct access DAX operation return EIO 
      if accessed memory page fall in bad block list.
    - It also starts backgound scrubbing.  
    - Similar functionality can be reused in virtio-pmem with MCE 
      notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
      confirm if this behaviour is ok or needs any change?

Changes from PATCH v6: [1]
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: [2]
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
				[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (6):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/4/23/1092
[2] https://lkml.org/lkml/2019/4/10/3
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=linux-kernel&m=153572228719237&w=2 
[7] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
[8] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[9] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c         |    4 -
 drivers/dax/bus.c                |    2 
 drivers/dax/super.c              |   13 +++-
 drivers/md/dm.c                  |    3 
 drivers/nvdimm/claim.c           |    6 +
 drivers/nvdimm/nd.h              |    1 
 drivers/nvdimm/pmem.c            |   16 +++--
 drivers/nvdimm/region_devs.c     |   33 ++++++++++
 drivers/nvdimm/virtio_pmem.c     |  114 +++++++++++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |   10 +++
 drivers/virtio/Makefile          |    1 
 drivers/virtio/pmem.c            |  118 +++++++++++++++++++++++++++++++++++++++
 fs/ext4/file.c                   |   10 +--
 fs/xfs/xfs_file.c                |    9 +-
 include/linux/dax.h              |   25 +++++++-
 include/linux/libnvdimm.h        |    9 ++
 include/linux/virtio_pmem.h      |   60 +++++++++++++++++++
 include/uapi/linux/virtio_ids.h  |    1 
 include/uapi/linux/virtio_pmem.h |   10 +++
 19 files changed, 420 insertions(+), 25 deletions(-)


_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v7 0/6] virtio pmem driver
@ 2019-04-26  5:00 ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

 This patch series has implementation for "virtio pmem". 
 "virtio pmem" is fake persistent memory(nvdimm) in guest 
 which allows to bypass the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.  
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v4. Tested with Qemu side device 
 emulation [6] for virtio-pmem. Documented the impact of
 possible page cache side channel attacks with suggested
 countermeasures.

 Incorporated all the review suggestions. 
 
 Details of project idea for 'virtio pmem' flushing interface 
 is shared [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
---------------------------------
   - Reads persistent memory range from paravirt device and 
     registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
     persistent memory region and setup filesystem operations 
     to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
     interface to flush from guest to host.

2. Qemu virtio-pmem device
---------------------------------
   - Creates virtio pmem device and exposes a memory range to 
     KVM guest. 
   - At host side this is file backed memory which acts as 
     persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
     for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[7] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem security implications and countermeasures:
 -----------------------------------------------------

 In previous posting of kernel driver, there was discussion [9]
 on possible implications of page cache side channel attacks with 
 virtio pmem. After thorough analysis of details of known side 
 channel attacks, below are the suggestions:

 - Depends entirely on how host backing image file is mapped 
   into guest address space. 

 - virtio-pmem device emulation, by default shared mapping is used
   to map host backing file. It is recommended to use separate
   backing file at host side for every guest. This will prevent
   any possibility of executing common code from multiple guests
   and any chance of inferring guest local data based based on 
   execution time.

 - If backing file is required to be shared among multiple guests 
   it is recommended to don't support host page cache eviction 
   commands from the guest driver. This will avoid any possibility
   of inferring guest local data or host data from another guest. 

 - Proposed device specification [8] for virtio-pmem device with 
   details of possible security implications and suggested 
   countermeasures for device emulation.

 Virtio-pmem errors handling:
 ----------------------------------------
  Checked behaviour of virtio-pmem for below types of errors
  Need suggestions on expected behaviour for handling these errors?

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
    - As per current logic if error page belongs to Qemu process, 
      host MCE handler isolates(hwpoison) that page and send SIGBUS. 
      Qemu SIGBUS handler injects exception to KVM guest. 
    - KVM guest then isolates the page and send SIGBUS to guest 
      userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
    - Handles such errors with MCE notifier and creates a list 
      of bad blocks. Read/direct access DAX operation return EIO 
      if accessed memory page fall in bad block list.
    - It also starts backgound scrubbing.  
    - Similar functionality can be reused in virtio-pmem with MCE 
      notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
      confirm if this behaviour is ok or needs any change?

Changes from PATCH v6: [1]
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: [2]
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
				[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (6):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/4/23/1092
[2] https://lkml.org/lkml/2019/4/10/3
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=linux-kernel&m=153572228719237&w=2 
[7] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
[8] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[9] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c         |    4 -
 drivers/dax/bus.c                |    2 
 drivers/dax/super.c              |   13 +++-
 drivers/md/dm.c                  |    3 
 drivers/nvdimm/claim.c           |    6 +
 drivers/nvdimm/nd.h              |    1 
 drivers/nvdimm/pmem.c            |   16 +++--
 drivers/nvdimm/region_devs.c     |   33 ++++++++++
 drivers/nvdimm/virtio_pmem.c     |  114 +++++++++++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |   10 +++
 drivers/virtio/Makefile          |    1 
 drivers/virtio/pmem.c            |  118 +++++++++++++++++++++++++++++++++++++++
 fs/ext4/file.c                   |   10 +--
 fs/xfs/xfs_file.c                |    9 +-
 include/linux/dax.h              |   25 +++++++-
 include/linux/libnvdimm.h        |    9 ++
 include/linux/virtio_pmem.h      |   60 +++++++++++++++++++
 include/uapi/linux/virtio_ids.h  |    1 
 include/uapi/linux/virtio_pmem.h |   10 +++
 19 files changed, 420 insertions(+), 25 deletions(-)

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 0/6] virtio pmem driver
@ 2019-04-26  5:00 ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

 This patch series has implementation for "virtio pmem". 
 "virtio pmem" is fake persistent memory(nvdimm) in guest 
 which allows to bypass the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.  
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v4. Tested with Qemu side device 
 emulation [6] for virtio-pmem. Documented the impact of
 possible page cache side channel attacks with suggested
 countermeasures.

 Incorporated all the review suggestions. 
 
 Details of project idea for 'virtio pmem' flushing interface 
 is shared [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
---------------------------------
   - Reads persistent memory range from paravirt device and 
     registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
     persistent memory region and setup filesystem operations 
     to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
     interface to flush from guest to host.

2. Qemu virtio-pmem device
---------------------------------
   - Creates virtio pmem device and exposes a memory range to 
     KVM guest. 
   - At host side this is file backed memory which acts as 
     persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
     for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[7] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem security implications and countermeasures:
 -----------------------------------------------------

 In previous posting of kernel driver, there was discussion [9]
 on possible implications of page cache side channel attacks with 
 virtio pmem. After thorough analysis of details of known side 
 channel attacks, below are the suggestions:

 - Depends entirely on how host backing image file is mapped 
   into guest address space. 

 - virtio-pmem device emulation, by default shared mapping is used
   to map host backing file. It is recommended to use separate
   backing file at host side for every guest. This will prevent
   any possibility of executing common code from multiple guests
   and any chance of inferring guest local data based based on 
   execution time.

 - If backing file is required to be shared among multiple guests 
   it is recommended to don't support host page cache eviction 
   commands from the guest driver. This will avoid any possibility
   of inferring guest local data or host data from another guest. 

 - Proposed device specification [8] for virtio-pmem device with 
   details of possible security implications and suggested 
   countermeasures for device emulation.

 Virtio-pmem errors handling:
 ----------------------------------------
  Checked behaviour of virtio-pmem for below types of errors
  Need suggestions on expected behaviour for handling these errors?

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
    - As per current logic if error page belongs to Qemu process, 
      host MCE handler isolates(hwpoison) that page and send SIGBUS. 
      Qemu SIGBUS handler injects exception to KVM guest. 
    - KVM guest then isolates the page and send SIGBUS to guest 
      userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
    - Handles such errors with MCE notifier and creates a list 
      of bad blocks. Read/direct access DAX operation return EIO 
      if accessed memory page fall in bad block list.
    - It also starts backgound scrubbing.  
    - Similar functionality can be reused in virtio-pmem with MCE 
      notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
      confirm if this behaviour is ok or needs any change?

Changes from PATCH v6: [1]
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: [2]
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
				[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (6):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/4/23/1092
[2] https://lkml.org/lkml/2019/4/10/3
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=linux-kernel&m=153572228719237&w=2 
[7] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
[8] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[9] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c         |    4 -
 drivers/dax/bus.c                |    2 
 drivers/dax/super.c              |   13 +++-
 drivers/md/dm.c                  |    3 
 drivers/nvdimm/claim.c           |    6 +
 drivers/nvdimm/nd.h              |    1 
 drivers/nvdimm/pmem.c            |   16 +++--
 drivers/nvdimm/region_devs.c     |   33 ++++++++++
 drivers/nvdimm/virtio_pmem.c     |  114 +++++++++++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |   10 +++
 drivers/virtio/Makefile          |    1 
 drivers/virtio/pmem.c            |  118 +++++++++++++++++++++++++++++++++++++++
 fs/ext4/file.c                   |   10 +--
 fs/xfs/xfs_file.c                |    9 +-
 include/linux/dax.h              |   25 +++++++-
 include/linux/libnvdimm.h        |    9 ++
 include/linux/virtio_pmem.h      |   60 +++++++++++++++++++
 include/uapi/linux/virtio_ids.h  |    1 
 include/uapi/linux/virtio_pmem.h |   10 +++
 19 files changed, 420 insertions(+), 25 deletions(-)

^ permalink raw reply	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 0/6] virtio pmem driver
@ 2019-04-26  5:00 ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, jasowang, david, lcapitulino, adilger.kernel,
	zwisler, aarcange, dave.jiang, darrick.wong, vishal.l.verma,
	david, willy, hch, jmoyer, nilal, lenb, kilobyte, riel,
	yuval.shaia, stefanha, pbonzini, dan.j.williams, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

 This patch series has implementation for "virtio pmem". 
 "virtio pmem" is fake persistent memory(nvdimm) in guest 
 which allows to bypass the guest page cache. This also
 implements a VIRTIO based asynchronous flush mechanism.  
 
 Sharing guest kernel driver in this patchset with the 
 changes suggested in v4. Tested with Qemu side device 
 emulation [6] for virtio-pmem. Documented the impact of
 possible page cache side channel attacks with suggested
 countermeasures.

 Incorporated all the review suggestions. 
 
 Details of project idea for 'virtio pmem' flushing interface 
 is shared [3] & [4].

 Implementation is divided into two parts:
 New virtio pmem guest driver and qemu code changes for new 
 virtio pmem paravirtualized device.

1. Guest virtio-pmem kernel driver
---------------------------------
   - Reads persistent memory range from paravirt device and 
     registers with 'nvdimm_bus'.  
   - 'nvdimm/pmem' driver uses this information to allocate 
     persistent memory region and setup filesystem operations 
     to the allocated memory. 
   - virtio pmem driver implements asynchronous flushing 
     interface to flush from guest to host.

2. Qemu virtio-pmem device
---------------------------------
   - Creates virtio pmem device and exposes a memory range to 
     KVM guest. 
   - At host side this is file backed memory which acts as 
     persistent memory. 
   - Qemu side flush uses aio thread pool API's and virtio 
     for asynchronous guest multi request handling. 

   David Hildenbrand CCed also posted a modified version[7] of 
   qemu virtio-pmem code based on updated Qemu memory device API. 

 Virtio-pmem security implications and countermeasures:
 -----------------------------------------------------

 In previous posting of kernel driver, there was discussion [9]
 on possible implications of page cache side channel attacks with 
 virtio pmem. After thorough analysis of details of known side 
 channel attacks, below are the suggestions:

 - Depends entirely on how host backing image file is mapped 
   into guest address space. 

 - virtio-pmem device emulation, by default shared mapping is used
   to map host backing file. It is recommended to use separate
   backing file at host side for every guest. This will prevent
   any possibility of executing common code from multiple guests
   and any chance of inferring guest local data based based on 
   execution time.

 - If backing file is required to be shared among multiple guests 
   it is recommended to don't support host page cache eviction 
   commands from the guest driver. This will avoid any possibility
   of inferring guest local data or host data from another guest. 

 - Proposed device specification [8] for virtio-pmem device with 
   details of possible security implications and suggested 
   countermeasures for device emulation.

 Virtio-pmem errors handling:
 ----------------------------------------
  Checked behaviour of virtio-pmem for below types of errors
  Need suggestions on expected behaviour for handling these errors?

  - Hardware Errors: Uncorrectable recoverable Errors: 
  a] virtio-pmem: 
    - As per current logic if error page belongs to Qemu process, 
      host MCE handler isolates(hwpoison) that page and send SIGBUS. 
      Qemu SIGBUS handler injects exception to KVM guest. 
    - KVM guest then isolates the page and send SIGBUS to guest 
      userspace process which has mapped the page. 
  
  b] Existing implementation for ACPI pmem driver: 
    - Handles such errors with MCE notifier and creates a list 
      of bad blocks. Read/direct access DAX operation return EIO 
      if accessed memory page fall in bad block list.
    - It also starts backgound scrubbing.  
    - Similar functionality can be reused in virtio-pmem with MCE 
      notifier but without scrubbing(no ACPI/ARS)? Need inputs to 
      confirm if this behaviour is ok or needs any change?

Changes from PATCH v6: [1]
 - Corrected comment format in patch 5 & patch 6. [Dave]
 - Changed variable declaration indentation in patch 6 [Darrick]
 - Add Reviewed-by tag by 'Jan Kara' in patch 4 & patch 5

Changes from PATCH v5: [2]
  Changes suggested in by - [Cornelia, Yuval]
- Remove assignment chaining in virtio driver
- Better error message and remove not required free
- Check nd_region before use

  Changes suggested by - [Jan Kara]
- dax_synchronous() for !CONFIG_DAX
- Correct 'daxdev_mapping_supported' comment and non-dax implementation

  Changes suggested by - [Dan Williams]
- Pass meaningful flag 'DAXDEV_F_SYNC' to alloc_dax
- Gate nvdimm_flush instead of additional async parameter
- Move block chaining logic to flush callback than common nvdimm_flush
- Use NULL flush callback for generic flush for better readability [Dan, Jan]

- Use virtio device id 27 from 25(already used) - [MST]

Changes from PATCH v4:
- Factor out MAP_SYNC supported functionality to a common helper
				[Dave, Darrick, Jan]
- Comment, indentation and virtqueue_kick failure handle - Yuval Shaia

Changes from PATCH v3: 
- Use generic dax_synchronous() helper to check for DAXDEV_SYNC 
  flag - [Dan, Darrick, Jan]
- Add 'is_nvdimm_async' function
- Document page cache side channel attacks implications & 
  countermeasures - [Dave Chinner, Michael]

Changes from PATCH v2: 
- Disable MAP_SYNC for ext4 & XFS filesystems - [Dan] 
- Use name 'virtio pmem' in place of 'fake dax' 

Changes from PATCH v1: 
- 0-day build test for build dependency on libnvdimm 

 Changes suggested by - [Dan Williams]
- Split the driver into two parts virtio & pmem  
- Move queuing of async block request to block layer
- Add "sync" parameter in nvdimm_flush function
- Use indirect call for nvdimm_flush
- Don’t move declarations to common global header e.g nd.h
- nvdimm_flush() return 0 or -EIO if it fails
- Teach nsio_rw_bytes() that the flush can fail
- Rename nvdimm_flush() to generic_nvdimm_flush()
- Use 'nd_region->provider_data' for long dereferencing
- Remove virtio_pmem_freeze/restore functions
- Remove BSD license text with SPDX license text

- Add might_sleep() in virtio_pmem_flush - [Luiz]
- Make spin_lock_irqsave() narrow

Changes from RFC v3
- Rebase to latest upstream - Luiz
- Call ndregion->flush in place of nvdimm_flush- Luiz
- kmalloc return check - Luiz
- virtqueue full handling - Stefan
- Don't map entire virtio_pmem_req to device - Stefan
- request leak, correct sizeof req - Stefan
- Move declaration to virtio_pmem.c

Changes from RFC v2:
- Add flush function in the nd_region in place of switching
  on a flag - Dan & Stefan
- Add flush completion function with proper locking and wait
  for host side flush completion - Stefan & Dan
- Keep userspace API in uapi header file - Stefan, MST
- Use LE fields & New device id - MST
- Indentation & spacing suggestions - MST & Eric
- Remove extra header files & add licensing - Stefan

Changes from RFC v1:
- Reuse existing 'pmem' code for registering persistent 
  memory and other operations instead of creating an entirely 
  new block driver.
- Use VIRTIO driver to register memory information with 
  nvdimm_bus and create region_type accordingly. 
- Call VIRTIO flush from existing pmem driver.

Pankaj Gupta (6):
   libnvdimm: nd_region flush callback support
   virtio-pmem: Add virtio-pmem guest driver
   libnvdimm: add nd_region buffered dax_dev flag
   dax: check synchronous mapping is supported
   ext4: disable map_sync for virtio pmem
   xfs: disable map_sync for virtio pmem

[1] https://lkml.org/lkml/2019/4/23/1092
[2] https://lkml.org/lkml/2019/4/10/3
[3] https://www.spinics.net/lists/kvm/msg149761.html
[4] https://www.spinics.net/lists/kvm/msg153095.html  
[5] https://lkml.org/lkml/2018/8/31/413
[6] https://marc.info/?l=linux-kernel&m=153572228719237&w=2 
[7] https://marc.info/?l=qemu-devel&m=153555721901824&w=2
[8] https://lists.oasis-open.org/archives/virtio-dev/201903/msg00083.html
[9] https://lkml.org/lkml/2019/1/9/1191

 drivers/acpi/nfit/core.c         |    4 -
 drivers/dax/bus.c                |    2 
 drivers/dax/super.c              |   13 +++-
 drivers/md/dm.c                  |    3 
 drivers/nvdimm/claim.c           |    6 +
 drivers/nvdimm/nd.h              |    1 
 drivers/nvdimm/pmem.c            |   16 +++--
 drivers/nvdimm/region_devs.c     |   33 ++++++++++
 drivers/nvdimm/virtio_pmem.c     |  114 +++++++++++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |   10 +++
 drivers/virtio/Makefile          |    1 
 drivers/virtio/pmem.c            |  118 +++++++++++++++++++++++++++++++++++++++
 fs/ext4/file.c                   |   10 +--
 fs/xfs/xfs_file.c                |    9 +-
 include/linux/dax.h              |   25 +++++++-
 include/linux/libnvdimm.h        |    9 ++
 include/linux/virtio_pmem.h      |   60 +++++++++++++++++++
 include/uapi/linux/virtio_ids.h  |    1 
 include/uapi/linux/virtio_pmem.h |   10 +++
 19 files changed, 420 insertions(+), 25 deletions(-)




^ permalink raw reply	[flat|nested] 107+ messages in thread

* [PATCH v7 1/6] libnvdimm: nd_region flush callback support
  2019-04-26  5:00 ` Pankaj Gupta
  (?)
  (?)
@ 2019-04-26  5:00   ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, darrick.wong, david, willy, hch, nilal, lenb, kilobyte,
	riel, yuval.shaia, stefanha, pbonzini, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/acpi/nfit/core.c     |  4 ++--
 drivers/nvdimm/claim.c       |  6 ++++--
 drivers/nvdimm/nd.h          |  1 +
 drivers/nvdimm/pmem.c        | 13 ++++++++-----
 drivers/nvdimm/region_devs.c | 26 ++++++++++++++++++++++++--
 include/linux/libnvdimm.h    |  8 +++++++-
 6 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 5a389a4f4f65..08dde76cf459 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	nvdimm_flush(nfit_blk->nd_region);
+	nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		nvdimm_flush(nfit_blk->nd_region);
+		nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
 	sector_t sector = offset >> 9;
-	int rc = 0;
+	int rc = 0, ret = 0;
 
 	if (unlikely(!size))
 		return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	memcpy_flushcache(nsio->addr + offset, buf, size);
-	nvdimm_flush(to_nd_region(ndns->dev.parent));
+	ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+	if (ret)
+		rc = ret;
 
 	return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
 	struct badblocks bb;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 	struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..f719245da170 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+	int ret = 0;
 	blk_status_t rc = 0;
 	bool do_acct;
 	unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct nd_region *nd_region = to_region(pmem);
 
 	if (bio->bi_opf & REQ_PREFLUSH)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio->bi_opf & REQ_FUA)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
+
+	if (ret)
+		bio->bi_status = errno_to_blk_status(ret);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
 	}
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	pmem->dax_dev = dax_dev;
-
 	gendev = disk_to_dev(disk);
 	gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
 		sysfs_put(pmem->bb_state);
 		pmem->bb_state = NULL;
 	}
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 
 	return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..e5b59708865e 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -295,7 +295,9 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att
 		return rc;
 	if (!flush)
 		return -EINVAL;
-	nvdimm_flush(nd_region);
+	rc = nvdimm_flush(nd_region, NULL);
+	if (rc)
+		return rc;
 
 	return len;
 }
@@ -1085,6 +1087,11 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	dev->of_node = ndr_desc->of_node;
 	nd_region->ndr_size = resource_size(ndr_desc->res);
 	nd_region->ndr_start = ndr_desc->res->start;
+	if (ndr_desc->flush)
+		nd_region->flush = ndr_desc->flush;
+	else
+		nd_region->flush = NULL;
+
 	nd_device_register(dev);
 
 	return nd_region;
@@ -1125,11 +1132,24 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	if (!nd_region->flush)
+		rc = generic_nvdimm_flush(nd_region);
+	else {
+		if (nd_region->flush(nd_region, bio))
+			rc = -EIO;
+	}
+
+	return rc;
+}
 /**
  * nvdimm_flush - flush any posted write queues between the cpu and pmem media
  * @nd_region: blk or interleaved pmem region
  */
-void nvdimm_flush(struct nd_region *nd_region)
+int generic_nvdimm_flush(struct nd_region *nd_region)
 {
 	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
 	int i, idx;
@@ -1153,6 +1173,8 @@ void nvdimm_flush(struct nd_region *nd_region)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
 	wmb();
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(nvdimm_flush);
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index feb342d026f2..a5f369ec3726 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -65,6 +65,9 @@ enum {
 	 */
 	ND_REGION_PERSIST_MEMCTRL = 2,
 
+	/* Platform provides asynchronous flush mechanism */
+	ND_REGION_ASYNC = 3,
+
 	/* mark newly adjusted resources as requiring a label update */
 	DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -121,6 +124,7 @@ struct nd_mapping_desc {
 	int position;
 };
 
+struct nd_region;
 struct nd_region_desc {
 	struct resource *res;
 	struct nd_mapping_desc *mapping;
@@ -133,6 +137,7 @@ struct nd_region_desc {
 	int target_node;
 	unsigned long flags;
 	struct device_node *of_node;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 };
 
 struct device;
@@ -260,7 +265,8 @@ unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
-void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio);
+int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 1/6] libnvdimm: nd_region flush callback support
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/acpi/nfit/core.c     |  4 ++--
 drivers/nvdimm/claim.c       |  6 ++++--
 drivers/nvdimm/nd.h          |  1 +
 drivers/nvdimm/pmem.c        | 13 ++++++++-----
 drivers/nvdimm/region_devs.c | 26 ++++++++++++++++++++++++--
 include/linux/libnvdimm.h    |  8 +++++++-
 6 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 5a389a4f4f65..08dde76cf459 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	nvdimm_flush(nfit_blk->nd_region);
+	nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		nvdimm_flush(nfit_blk->nd_region);
+		nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
 	sector_t sector = offset >> 9;
-	int rc = 0;
+	int rc = 0, ret = 0;
 
 	if (unlikely(!size))
 		return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	memcpy_flushcache(nsio->addr + offset, buf, size);
-	nvdimm_flush(to_nd_region(ndns->dev.parent));
+	ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+	if (ret)
+		rc = ret;
 
 	return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
 	struct badblocks bb;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 	struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..f719245da170 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+	int ret = 0;
 	blk_status_t rc = 0;
 	bool do_acct;
 	unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct nd_region *nd_region = to_region(pmem);
 
 	if (bio->bi_opf & REQ_PREFLUSH)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio->bi_opf & REQ_FUA)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
+
+	if (ret)
+		bio->bi_status = errno_to_blk_status(ret);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
 	}
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	pmem->dax_dev = dax_dev;
-
 	gendev = disk_to_dev(disk);
 	gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
 		sysfs_put(pmem->bb_state);
 		pmem->bb_state = NULL;
 	}
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 
 	return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..e5b59708865e 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -295,7 +295,9 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att
 		return rc;
 	if (!flush)
 		return -EINVAL;
-	nvdimm_flush(nd_region);
+	rc = nvdimm_flush(nd_region, NULL);
+	if (rc)
+		return rc;
 
 	return len;
 }
@@ -1085,6 +1087,11 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	dev->of_node = ndr_desc->of_node;
 	nd_region->ndr_size = resource_size(ndr_desc->res);
 	nd_region->ndr_start = ndr_desc->res->start;
+	if (ndr_desc->flush)
+		nd_region->flush = ndr_desc->flush;
+	else
+		nd_region->flush = NULL;
+
 	nd_device_register(dev);
 
 	return nd_region;
@@ -1125,11 +1132,24 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	if (!nd_region->flush)
+		rc = generic_nvdimm_flush(nd_region);
+	else {
+		if (nd_region->flush(nd_region, bio))
+			rc = -EIO;
+	}
+
+	return rc;
+}
 /**
  * nvdimm_flush - flush any posted write queues between the cpu and pmem media
  * @nd_region: blk or interleaved pmem region
  */
-void nvdimm_flush(struct nd_region *nd_region)
+int generic_nvdimm_flush(struct nd_region *nd_region)
 {
 	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
 	int i, idx;
@@ -1153,6 +1173,8 @@ void nvdimm_flush(struct nd_region *nd_region)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
 	wmb();
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(nvdimm_flush);
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index feb342d026f2..a5f369ec3726 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -65,6 +65,9 @@ enum {
 	 */
 	ND_REGION_PERSIST_MEMCTRL = 2,
 
+	/* Platform provides asynchronous flush mechanism */
+	ND_REGION_ASYNC = 3,
+
 	/* mark newly adjusted resources as requiring a label update */
 	DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -121,6 +124,7 @@ struct nd_mapping_desc {
 	int position;
 };
 
+struct nd_region;
 struct nd_region_desc {
 	struct resource *res;
 	struct nd_mapping_desc *mapping;
@@ -133,6 +137,7 @@ struct nd_region_desc {
 	int target_node;
 	unsigned long flags;
 	struct device_node *of_node;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 };
 
 struct device;
@@ -260,7 +265,8 @@ unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
-void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio);
+int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 1/6] libnvdimm: nd_region flush callback support
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/acpi/nfit/core.c     |  4 ++--
 drivers/nvdimm/claim.c       |  6 ++++--
 drivers/nvdimm/nd.h          |  1 +
 drivers/nvdimm/pmem.c        | 13 ++++++++-----
 drivers/nvdimm/region_devs.c | 26 ++++++++++++++++++++++++--
 include/linux/libnvdimm.h    |  8 +++++++-
 6 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 5a389a4f4f65..08dde76cf459 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	nvdimm_flush(nfit_blk->nd_region);
+	nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		nvdimm_flush(nfit_blk->nd_region);
+		nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
 	sector_t sector = offset >> 9;
-	int rc = 0;
+	int rc = 0, ret = 0;
 
 	if (unlikely(!size))
 		return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	memcpy_flushcache(nsio->addr + offset, buf, size);
-	nvdimm_flush(to_nd_region(ndns->dev.parent));
+	ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+	if (ret)
+		rc = ret;
 
 	return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
 	struct badblocks bb;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 	struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..f719245da170 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+	int ret = 0;
 	blk_status_t rc = 0;
 	bool do_acct;
 	unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct nd_region *nd_region = to_region(pmem);
 
 	if (bio->bi_opf & REQ_PREFLUSH)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio->bi_opf & REQ_FUA)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
+
+	if (ret)
+		bio->bi_status = errno_to_blk_status(ret);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
 	}
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	pmem->dax_dev = dax_dev;
-
 	gendev = disk_to_dev(disk);
 	gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
 		sysfs_put(pmem->bb_state);
 		pmem->bb_state = NULL;
 	}
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 
 	return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..e5b59708865e 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -295,7 +295,9 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att
 		return rc;
 	if (!flush)
 		return -EINVAL;
-	nvdimm_flush(nd_region);
+	rc = nvdimm_flush(nd_region, NULL);
+	if (rc)
+		return rc;
 
 	return len;
 }
@@ -1085,6 +1087,11 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	dev->of_node = ndr_desc->of_node;
 	nd_region->ndr_size = resource_size(ndr_desc->res);
 	nd_region->ndr_start = ndr_desc->res->start;
+	if (ndr_desc->flush)
+		nd_region->flush = ndr_desc->flush;
+	else
+		nd_region->flush = NULL;
+
 	nd_device_register(dev);
 
 	return nd_region;
@@ -1125,11 +1132,24 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	if (!nd_region->flush)
+		rc = generic_nvdimm_flush(nd_region);
+	else {
+		if (nd_region->flush(nd_region, bio))
+			rc = -EIO;
+	}
+
+	return rc;
+}
 /**
  * nvdimm_flush - flush any posted write queues between the cpu and pmem media
  * @nd_region: blk or interleaved pmem region
  */
-void nvdimm_flush(struct nd_region *nd_region)
+int generic_nvdimm_flush(struct nd_region *nd_region)
 {
 	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
 	int i, idx;
@@ -1153,6 +1173,8 @@ void nvdimm_flush(struct nd_region *nd_region)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
 	wmb();
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(nvdimm_flush);
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index feb342d026f2..a5f369ec3726 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -65,6 +65,9 @@ enum {
 	 */
 	ND_REGION_PERSIST_MEMCTRL = 2,
 
+	/* Platform provides asynchronous flush mechanism */
+	ND_REGION_ASYNC = 3,
+
 	/* mark newly adjusted resources as requiring a label update */
 	DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -121,6 +124,7 @@ struct nd_mapping_desc {
 	int position;
 };
 
+struct nd_region;
 struct nd_region_desc {
 	struct resource *res;
 	struct nd_mapping_desc *mapping;
@@ -133,6 +137,7 @@ struct nd_region_desc {
 	int target_node;
 	unsigned long flags;
 	struct device_node *of_node;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 };
 
 struct device;
@@ -260,7 +265,8 @@ unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
-void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio);
+int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 1/6] libnvdimm: nd_region flush callback support
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, jasowang, david, lcapitulino, adilger.kernel,
	zwisler, aarcange, dave.jiang, darrick.wong, vishal.l.verma,
	david, willy, hch, jmoyer, nilal, lenb, kilobyte, riel,
	yuval.shaia, stefanha, pbonzini, dan.j.williams, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/acpi/nfit/core.c     |  4 ++--
 drivers/nvdimm/claim.c       |  6 ++++--
 drivers/nvdimm/nd.h          |  1 +
 drivers/nvdimm/pmem.c        | 13 ++++++++-----
 drivers/nvdimm/region_devs.c | 26 ++++++++++++++++++++++++--
 include/linux/libnvdimm.h    |  8 +++++++-
 6 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 5a389a4f4f65..08dde76cf459 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	nvdimm_flush(nfit_blk->nd_region);
+	nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		nvdimm_flush(nfit_blk->nd_region);
+		nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
 	sector_t sector = offset >> 9;
-	int rc = 0;
+	int rc = 0, ret = 0;
 
 	if (unlikely(!size))
 		return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	memcpy_flushcache(nsio->addr + offset, buf, size);
-	nvdimm_flush(to_nd_region(ndns->dev.parent));
+	ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+	if (ret)
+		rc = ret;
 
 	return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
 	struct badblocks bb;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 	struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..f719245da170 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+	int ret = 0;
 	blk_status_t rc = 0;
 	bool do_acct;
 	unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct nd_region *nd_region = to_region(pmem);
 
 	if (bio->bi_opf & REQ_PREFLUSH)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio->bi_opf & REQ_FUA)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
+
+	if (ret)
+		bio->bi_status = errno_to_blk_status(ret);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
 	}
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	pmem->dax_dev = dax_dev;
-
 	gendev = disk_to_dev(disk);
 	gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
 		sysfs_put(pmem->bb_state);
 		pmem->bb_state = NULL;
 	}
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 
 	return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..e5b59708865e 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -295,7 +295,9 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att
 		return rc;
 	if (!flush)
 		return -EINVAL;
-	nvdimm_flush(nd_region);
+	rc = nvdimm_flush(nd_region, NULL);
+	if (rc)
+		return rc;
 
 	return len;
 }
@@ -1085,6 +1087,11 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	dev->of_node = ndr_desc->of_node;
 	nd_region->ndr_size = resource_size(ndr_desc->res);
 	nd_region->ndr_start = ndr_desc->res->start;
+	if (ndr_desc->flush)
+		nd_region->flush = ndr_desc->flush;
+	else
+		nd_region->flush = NULL;
+
 	nd_device_register(dev);
 
 	return nd_region;
@@ -1125,11 +1132,24 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	if (!nd_region->flush)
+		rc = generic_nvdimm_flush(nd_region);
+	else {
+		if (nd_region->flush(nd_region, bio))
+			rc = -EIO;
+	}
+
+	return rc;
+}
 /**
  * nvdimm_flush - flush any posted write queues between the cpu and pmem media
  * @nd_region: blk or interleaved pmem region
  */
-void nvdimm_flush(struct nd_region *nd_region)
+int generic_nvdimm_flush(struct nd_region *nd_region)
 {
 	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
 	int i, idx;
@@ -1153,6 +1173,8 @@ void nvdimm_flush(struct nd_region *nd_region)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
 	wmb();
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(nvdimm_flush);
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index feb342d026f2..a5f369ec3726 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -65,6 +65,9 @@ enum {
 	 */
 	ND_REGION_PERSIST_MEMCTRL = 2,
 
+	/* Platform provides asynchronous flush mechanism */
+	ND_REGION_ASYNC = 3,
+
 	/* mark newly adjusted resources as requiring a label update */
 	DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -121,6 +124,7 @@ struct nd_mapping_desc {
 	int position;
 };
 
+struct nd_region;
 struct nd_region_desc {
 	struct resource *res;
 	struct nd_mapping_desc *mapping;
@@ -133,6 +137,7 @@ struct nd_region_desc {
 	int target_node;
 	unsigned long flags;
 	struct device_node *of_node;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 };
 
 struct device;
@@ -260,7 +265,8 @@ unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
-void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio);
+int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 1/6] libnvdimm: nd_region flush callback support
  2019-04-26  5:00 ` Pankaj Gupta
                   ` (3 preceding siblings ...)
  (?)
@ 2019-04-26  5:00 ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, dave.jiang, darrick.wong, vishal.l.verma, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, tytso, xiaoguangrong.eric, cohuck, rjw,
	imammedo

This patch adds functionality to perform flush from guest
to host over VIRTIO. We are registering a callback based
on 'nd_region' type. virtio_pmem driver requires this special
flush function. For rest of the region types we are registering
existing flush function. Report error returned by host fsync
failure to userspace.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/acpi/nfit/core.c     |  4 ++--
 drivers/nvdimm/claim.c       |  6 ++++--
 drivers/nvdimm/nd.h          |  1 +
 drivers/nvdimm/pmem.c        | 13 ++++++++-----
 drivers/nvdimm/region_devs.c | 26 ++++++++++++++++++++++++--
 include/linux/libnvdimm.h    |  8 +++++++-
 6 files changed, 46 insertions(+), 12 deletions(-)

diff --git a/drivers/acpi/nfit/core.c b/drivers/acpi/nfit/core.c
index 5a389a4f4f65..08dde76cf459 100644
--- a/drivers/acpi/nfit/core.c
+++ b/drivers/acpi/nfit/core.c
@@ -2434,7 +2434,7 @@ static void write_blk_ctl(struct nfit_blk *nfit_blk, unsigned int bw,
 		offset = to_interleave_offset(offset, mmio);
 
 	writeq(cmd, mmio->addr.base + offset);
-	nvdimm_flush(nfit_blk->nd_region);
+	nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	if (nfit_blk->dimm_flags & NFIT_BLK_DCR_LATCH)
 		readq(mmio->addr.base + offset);
@@ -2483,7 +2483,7 @@ static int acpi_nfit_blk_single_io(struct nfit_blk *nfit_blk,
 	}
 
 	if (rw)
-		nvdimm_flush(nfit_blk->nd_region);
+		nvdimm_flush(nfit_blk->nd_region, NULL);
 
 	rc = read_blk_stat(nfit_blk, lane) ? -EIO : 0;
 	return rc;
diff --git a/drivers/nvdimm/claim.c b/drivers/nvdimm/claim.c
index fb667bf469c7..13510bae1e6f 100644
--- a/drivers/nvdimm/claim.c
+++ b/drivers/nvdimm/claim.c
@@ -263,7 +263,7 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	struct nd_namespace_io *nsio = to_nd_namespace_io(&ndns->dev);
 	unsigned int sz_align = ALIGN(size + (offset & (512 - 1)), 512);
 	sector_t sector = offset >> 9;
-	int rc = 0;
+	int rc = 0, ret = 0;
 
 	if (unlikely(!size))
 		return 0;
@@ -301,7 +301,9 @@ static int nsio_rw_bytes(struct nd_namespace_common *ndns,
 	}
 
 	memcpy_flushcache(nsio->addr + offset, buf, size);
-	nvdimm_flush(to_nd_region(ndns->dev.parent));
+	ret = nvdimm_flush(to_nd_region(ndns->dev.parent), NULL);
+	if (ret)
+		rc = ret;
 
 	return rc;
 }
diff --git a/drivers/nvdimm/nd.h b/drivers/nvdimm/nd.h
index a5ac3b240293..0c74d2428bd7 100644
--- a/drivers/nvdimm/nd.h
+++ b/drivers/nvdimm/nd.h
@@ -159,6 +159,7 @@ struct nd_region {
 	struct badblocks bb;
 	struct nd_interleave_set *nd_set;
 	struct nd_percpu_lane __percpu *lane;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 	struct nd_mapping mapping[0];
 };
 
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index bc2f700feef8..f719245da170 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -192,6 +192,7 @@ static blk_status_t pmem_do_bvec(struct pmem_device *pmem, struct page *page,
 
 static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 {
+	int ret = 0;
 	blk_status_t rc = 0;
 	bool do_acct;
 	unsigned long start;
@@ -201,7 +202,7 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 	struct nd_region *nd_region = to_region(pmem);
 
 	if (bio->bi_opf & REQ_PREFLUSH)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
 
 	do_acct = nd_iostat_start(bio, &start);
 	bio_for_each_segment(bvec, bio, iter) {
@@ -216,7 +217,10 @@ static blk_qc_t pmem_make_request(struct request_queue *q, struct bio *bio)
 		nd_iostat_end(bio, start);
 
 	if (bio->bi_opf & REQ_FUA)
-		nvdimm_flush(nd_region);
+		ret = nvdimm_flush(nd_region, bio);
+
+	if (ret)
+		bio->bi_status = errno_to_blk_status(ret);
 
 	bio_endio(bio);
 	return BLK_QC_T_NONE;
@@ -469,7 +473,6 @@ static int pmem_attach_disk(struct device *dev,
 	}
 	dax_write_cache(dax_dev, nvdimm_has_cache(nd_region));
 	pmem->dax_dev = dax_dev;
-
 	gendev = disk_to_dev(disk);
 	gendev->groups = pmem_attribute_groups;
 
@@ -527,14 +530,14 @@ static int nd_pmem_remove(struct device *dev)
 		sysfs_put(pmem->bb_state);
 		pmem->bb_state = NULL;
 	}
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 
 	return 0;
 }
 
 static void nd_pmem_shutdown(struct device *dev)
 {
-	nvdimm_flush(to_nd_region(dev->parent));
+	nvdimm_flush(to_nd_region(dev->parent), NULL);
 }
 
 static void nd_pmem_notify(struct device *dev, enum nvdimm_event event)
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index b4ef7d9ff22e..e5b59708865e 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -295,7 +295,9 @@ static ssize_t deep_flush_store(struct device *dev, struct device_attribute *att
 		return rc;
 	if (!flush)
 		return -EINVAL;
-	nvdimm_flush(nd_region);
+	rc = nvdimm_flush(nd_region, NULL);
+	if (rc)
+		return rc;
 
 	return len;
 }
@@ -1085,6 +1087,11 @@ static struct nd_region *nd_region_create(struct nvdimm_bus *nvdimm_bus,
 	dev->of_node = ndr_desc->of_node;
 	nd_region->ndr_size = resource_size(ndr_desc->res);
 	nd_region->ndr_start = ndr_desc->res->start;
+	if (ndr_desc->flush)
+		nd_region->flush = ndr_desc->flush;
+	else
+		nd_region->flush = NULL;
+
 	nd_device_register(dev);
 
 	return nd_region;
@@ -1125,11 +1132,24 @@ struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus,
 }
 EXPORT_SYMBOL_GPL(nvdimm_volatile_region_create);
 
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	if (!nd_region->flush)
+		rc = generic_nvdimm_flush(nd_region);
+	else {
+		if (nd_region->flush(nd_region, bio))
+			rc = -EIO;
+	}
+
+	return rc;
+}
 /**
  * nvdimm_flush - flush any posted write queues between the cpu and pmem media
  * @nd_region: blk or interleaved pmem region
  */
-void nvdimm_flush(struct nd_region *nd_region)
+int generic_nvdimm_flush(struct nd_region *nd_region)
 {
 	struct nd_region_data *ndrd = dev_get_drvdata(&nd_region->dev);
 	int i, idx;
@@ -1153,6 +1173,8 @@ void nvdimm_flush(struct nd_region *nd_region)
 		if (ndrd_get_flush_wpq(ndrd, i, 0))
 			writeq(1, ndrd_get_flush_wpq(ndrd, i, idx));
 	wmb();
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(nvdimm_flush);
 
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index feb342d026f2..a5f369ec3726 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -65,6 +65,9 @@ enum {
 	 */
 	ND_REGION_PERSIST_MEMCTRL = 2,
 
+	/* Platform provides asynchronous flush mechanism */
+	ND_REGION_ASYNC = 3,
+
 	/* mark newly adjusted resources as requiring a label update */
 	DPA_RESOURCE_ADJUSTED = 1 << 0,
 };
@@ -121,6 +124,7 @@ struct nd_mapping_desc {
 	int position;
 };
 
+struct nd_region;
 struct nd_region_desc {
 	struct resource *res;
 	struct nd_mapping_desc *mapping;
@@ -133,6 +137,7 @@ struct nd_region_desc {
 	int target_node;
 	unsigned long flags;
 	struct device_node *of_node;
+	int (*flush)(struct nd_region *nd_region, struct bio *bio);
 };
 
 struct device;
@@ -260,7 +265,8 @@ unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr);
 unsigned int nd_region_acquire_lane(struct nd_region *nd_region);
 void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane);
 u64 nd_fletcher64(void *addr, size_t len, bool le);
-void nvdimm_flush(struct nd_region *nd_region);
+int nvdimm_flush(struct nd_region *nd_region, struct bio *bio);
+int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-04-26  5:00 ` Pankaj Gupta
  (?)
  (?)
@ 2019-04-26  5:00   ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, darrick.wong, david, willy, hch, nilal, lenb, kilobyte,
	riel, yuval.shaia, stefanha, pbonzini, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |  10 +++
 drivers/virtio/Makefile          |   1 +
 drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
 include/linux/virtio_pmem.h      |  60 ++++++++++++++++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 +++
 7 files changed, 314 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000000000000..66b582f751a3
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include <linux/virtio_pmem.h>
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+	unsigned int len;
+	unsigned long flags;
+	struct virtio_pmem_request *req, *req_buf;
+	struct virtio_pmem *vpmem = vq->vdev->priv;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+		req->done = true;
+		wake_up(&req->host_acked);
+
+		if (!list_empty(&vpmem->req_list)) {
+			req_buf = list_first_entry(&vpmem->req_list,
+					struct virtio_pmem_request, list);
+			list_del(&vpmem->req_list);
+			req_buf->wq_buf_avail = true;
+			wake_up(&req_buf->wq_buf);
+		}
+	}
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+	int err;
+	unsigned long flags;
+	struct scatterlist *sgs[2], sg, ret;
+	struct virtio_device *vdev = nd_region->provider_data;
+	struct virtio_pmem *vpmem = vdev->priv;
+	struct virtio_pmem_request *req;
+
+	might_sleep();
+	req = kmalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->done = req->wq_buf_avail = false;
+	strcpy(req->name, "FLUSH");
+	init_waitqueue_head(&req->host_acked);
+	init_waitqueue_head(&req->wq_buf);
+	sg_init_one(&sg, req->name, strlen(req->name));
+	sgs[0] = &sg;
+	sg_init_one(&ret, &req->ret, sizeof(req->ret));
+	sgs[1] = &ret;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+	if (err) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+		list_add_tail(&vpmem->req_list, &req->list);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+		/* When host has read buffer, this completes via host_ack */
+		wait_event(req->wq_buf, req->wq_buf_avail);
+		spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	}
+	err = virtqueue_kick(vpmem->req_vq);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	if (!err) {
+		err = -EIO;
+		goto ret;
+	}
+	/* When host has read buffer, this completes via host_ack */
+	wait_event(req->host_acked, req->done);
+	err = req->ret;
+ret:
+	kfree(req);
+	return err;
+};
+
+ /* The asynchronous flush callback function */
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	/* Create child bio for asynchronous flush and chain with
+	 * parent bio. Otherwise directly call nd_region flush.
+	 */
+	if (bio && bio->bi_iter.bi_sector != -1) {
+		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
+
+		if (!child)
+			return -ENOMEM;
+		bio_copy_dev(child, bio);
+		child->bi_opf = REQ_PREFLUSH;
+		child->bi_iter.bi_sector = -1;
+		bio_chain(child, bio);
+		submit_bio(child);
+	} else {
+		if (virtio_pmem_flush(nd_region))
+			rc = -EIO;
+	}
+
+	return rc;
+};
+EXPORT_SYMBOL_GPL(async_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 35897649c24f..9f634a2ed638 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_PMEM
+	tristate "Support for virtio pmem driver"
+	depends on VIRTIO
+	depends on LIBNVDIMM
+	help
+	This driver provides support for virtio based flushing interface
+	for persistent memory range.
+
+	If unsure, say M.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5dcf46..143ce91eabe9 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
new file mode 100644
index 000000000000..309788628e41
--- /dev/null
+++ b/drivers/virtio/pmem.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and registers the virtual pmem device
+ * with libnvdimm core.
+ */
+#include <linux/virtio_pmem.h>
+#include <../../drivers/nvdimm/nd.h>
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+ /* Initialize virt queue */
+static int init_vq(struct virtio_pmem *vpmem)
+{
+	/* single vq */
+	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
+				host_ack, "flush_queue");
+	if (IS_ERR(vpmem->req_vq))
+		return PTR_ERR(vpmem->req_vq);
+
+	spin_lock_init(&vpmem->pmem_lock);
+	INIT_LIST_HEAD(&vpmem->req_list);
+
+	return 0;
+};
+
+static int virtio_pmem_probe(struct virtio_device *vdev)
+{
+	int err = 0;
+	struct resource res;
+	struct virtio_pmem *vpmem;
+	struct nd_region_desc ndr_desc = {};
+	int nid = dev_to_node(&vdev->dev);
+	struct nd_region *nd_region;
+
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
+	if (!vpmem) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	vpmem->vdev = vdev;
+	vdev->priv = vpmem;
+	err = init_vq(vpmem);
+	if (err)
+		goto out_err;
+
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			start, &vpmem->start);
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			size, &vpmem->size);
+
+	res.start = vpmem->start;
+	res.end   = vpmem->start + vpmem->size-1;
+	vpmem->nd_desc.provider_name = "virtio-pmem";
+	vpmem->nd_desc.module = THIS_MODULE;
+
+	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
+						&vpmem->nd_desc);
+	if (!vpmem->nvdimm_bus)
+		goto out_vq;
+
+	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
+
+	ndr_desc.res = &res;
+	ndr_desc.numa_node = nid;
+	ndr_desc.flush = async_pmem_flush;
+	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
+	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
+
+	if (!nd_region)
+		goto out_nd;
+	nd_region->provider_data =  dev_to_virtio
+					(nd_region->dev.parent->parent);
+	return 0;
+out_nd:
+	err = -ENXIO;
+	nvdimm_bus_unregister(vpmem->nvdimm_bus);
+out_vq:
+	vdev->config->del_vqs(vdev);
+out_err:
+	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
+	return err;
+}
+
+static void virtio_pmem_remove(struct virtio_device *vdev)
+{
+	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
+
+	nvdimm_bus_unregister(nvdimm_bus);
+	vdev->config->del_vqs(vdev);
+	vdev->config->reset(vdev);
+}
+
+static struct virtio_driver virtio_pmem_driver = {
+	.driver.name		= KBUILD_MODNAME,
+	.driver.owner		= THIS_MODULE,
+	.id_table		= id_table,
+	.probe			= virtio_pmem_probe,
+	.remove			= virtio_pmem_remove,
+};
+
+module_virtio_driver(virtio_pmem_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio pmem driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
new file mode 100644
index 000000000000..ab1da877575d
--- /dev/null
+++ b/include/linux/virtio_pmem.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * virtio_pmem.h: virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ **/
+
+#ifndef _LINUX_VIRTIO_PMEM_H
+#define _LINUX_VIRTIO_PMEM_H
+
+#include <linux/virtio_ids.h>
+#include <linux/module.h>
+#include <linux/virtio_config.h>
+#include <uapi/linux/virtio_pmem.h>
+#include <linux/libnvdimm.h>
+#include <linux/spinlock.h>
+
+struct virtio_pmem_request {
+	/* Host return status corresponding to flush request */
+	int ret;
+
+	/* command name*/
+	char name[16];
+
+	/* Wait queue to process deferred work after ack from host */
+	wait_queue_head_t host_acked;
+	bool done;
+
+	/* Wait queue to process deferred work after virt queue buffer avail */
+	wait_queue_head_t wq_buf;
+	bool wq_buf_avail;
+	struct list_head list;
+};
+
+struct virtio_pmem {
+	struct virtio_device *vdev;
+
+	/* Virtio pmem request queue */
+	struct virtqueue *req_vq;
+
+	/* nvdimm bus registers virtio pmem device */
+	struct nvdimm_bus *nvdimm_bus;
+	struct nvdimm_bus_descriptor nd_desc;
+
+	/* List to store deferred work if virtqueue is full */
+	struct list_head req_list;
+
+	/* Synchronize virtqueue data */
+	spinlock_t pmem_lock;
+
+	/* Memory region information */
+	uint64_t start;
+	uint64_t size;
+};
+
+void host_ack(struct virtqueue *vq);
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 6d5c3b2d4f4d..32b2f94d1f58 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_PMEM         27 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
new file mode 100644
index 000000000000..fa3f7d52717a
--- /dev/null
+++ b/include/uapi/linux/virtio_pmem.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
+#define _UAPI_LINUX_VIRTIO_PMEM_H
+
+struct virtio_pmem_config {
+	__le64 start;
+	__le64 size;
+};
+#endif
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |  10 +++
 drivers/virtio/Makefile          |   1 +
 drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
 include/linux/virtio_pmem.h      |  60 ++++++++++++++++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 +++
 7 files changed, 314 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000000000000..66b582f751a3
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include <linux/virtio_pmem.h>
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+	unsigned int len;
+	unsigned long flags;
+	struct virtio_pmem_request *req, *req_buf;
+	struct virtio_pmem *vpmem = vq->vdev->priv;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+		req->done = true;
+		wake_up(&req->host_acked);
+
+		if (!list_empty(&vpmem->req_list)) {
+			req_buf = list_first_entry(&vpmem->req_list,
+					struct virtio_pmem_request, list);
+			list_del(&vpmem->req_list);
+			req_buf->wq_buf_avail = true;
+			wake_up(&req_buf->wq_buf);
+		}
+	}
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+	int err;
+	unsigned long flags;
+	struct scatterlist *sgs[2], sg, ret;
+	struct virtio_device *vdev = nd_region->provider_data;
+	struct virtio_pmem *vpmem = vdev->priv;
+	struct virtio_pmem_request *req;
+
+	might_sleep();
+	req = kmalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->done = req->wq_buf_avail = false;
+	strcpy(req->name, "FLUSH");
+	init_waitqueue_head(&req->host_acked);
+	init_waitqueue_head(&req->wq_buf);
+	sg_init_one(&sg, req->name, strlen(req->name));
+	sgs[0] = &sg;
+	sg_init_one(&ret, &req->ret, sizeof(req->ret));
+	sgs[1] = &ret;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+	if (err) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+		list_add_tail(&vpmem->req_list, &req->list);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+		/* When host has read buffer, this completes via host_ack */
+		wait_event(req->wq_buf, req->wq_buf_avail);
+		spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	}
+	err = virtqueue_kick(vpmem->req_vq);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	if (!err) {
+		err = -EIO;
+		goto ret;
+	}
+	/* When host has read buffer, this completes via host_ack */
+	wait_event(req->host_acked, req->done);
+	err = req->ret;
+ret:
+	kfree(req);
+	return err;
+};
+
+ /* The asynchronous flush callback function */
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	/* Create child bio for asynchronous flush and chain with
+	 * parent bio. Otherwise directly call nd_region flush.
+	 */
+	if (bio && bio->bi_iter.bi_sector != -1) {
+		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
+
+		if (!child)
+			return -ENOMEM;
+		bio_copy_dev(child, bio);
+		child->bi_opf = REQ_PREFLUSH;
+		child->bi_iter.bi_sector = -1;
+		bio_chain(child, bio);
+		submit_bio(child);
+	} else {
+		if (virtio_pmem_flush(nd_region))
+			rc = -EIO;
+	}
+
+	return rc;
+};
+EXPORT_SYMBOL_GPL(async_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 35897649c24f..9f634a2ed638 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_PMEM
+	tristate "Support for virtio pmem driver"
+	depends on VIRTIO
+	depends on LIBNVDIMM
+	help
+	This driver provides support for virtio based flushing interface
+	for persistent memory range.
+
+	If unsure, say M.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5dcf46..143ce91eabe9 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
new file mode 100644
index 000000000000..309788628e41
--- /dev/null
+++ b/drivers/virtio/pmem.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and registers the virtual pmem device
+ * with libnvdimm core.
+ */
+#include <linux/virtio_pmem.h>
+#include <../../drivers/nvdimm/nd.h>
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+ /* Initialize virt queue */
+static int init_vq(struct virtio_pmem *vpmem)
+{
+	/* single vq */
+	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
+				host_ack, "flush_queue");
+	if (IS_ERR(vpmem->req_vq))
+		return PTR_ERR(vpmem->req_vq);
+
+	spin_lock_init(&vpmem->pmem_lock);
+	INIT_LIST_HEAD(&vpmem->req_list);
+
+	return 0;
+};
+
+static int virtio_pmem_probe(struct virtio_device *vdev)
+{
+	int err = 0;
+	struct resource res;
+	struct virtio_pmem *vpmem;
+	struct nd_region_desc ndr_desc = {};
+	int nid = dev_to_node(&vdev->dev);
+	struct nd_region *nd_region;
+
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
+	if (!vpmem) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	vpmem->vdev = vdev;
+	vdev->priv = vpmem;
+	err = init_vq(vpmem);
+	if (err)
+		goto out_err;
+
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			start, &vpmem->start);
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			size, &vpmem->size);
+
+	res.start = vpmem->start;
+	res.end   = vpmem->start + vpmem->size-1;
+	vpmem->nd_desc.provider_name = "virtio-pmem";
+	vpmem->nd_desc.module = THIS_MODULE;
+
+	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
+						&vpmem->nd_desc);
+	if (!vpmem->nvdimm_bus)
+		goto out_vq;
+
+	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
+
+	ndr_desc.res = &res;
+	ndr_desc.numa_node = nid;
+	ndr_desc.flush = async_pmem_flush;
+	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
+	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
+
+	if (!nd_region)
+		goto out_nd;
+	nd_region->provider_data =  dev_to_virtio
+					(nd_region->dev.parent->parent);
+	return 0;
+out_nd:
+	err = -ENXIO;
+	nvdimm_bus_unregister(vpmem->nvdimm_bus);
+out_vq:
+	vdev->config->del_vqs(vdev);
+out_err:
+	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
+	return err;
+}
+
+static void virtio_pmem_remove(struct virtio_device *vdev)
+{
+	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
+
+	nvdimm_bus_unregister(nvdimm_bus);
+	vdev->config->del_vqs(vdev);
+	vdev->config->reset(vdev);
+}
+
+static struct virtio_driver virtio_pmem_driver = {
+	.driver.name		= KBUILD_MODNAME,
+	.driver.owner		= THIS_MODULE,
+	.id_table		= id_table,
+	.probe			= virtio_pmem_probe,
+	.remove			= virtio_pmem_remove,
+};
+
+module_virtio_driver(virtio_pmem_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio pmem driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
new file mode 100644
index 000000000000..ab1da877575d
--- /dev/null
+++ b/include/linux/virtio_pmem.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * virtio_pmem.h: virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ **/
+
+#ifndef _LINUX_VIRTIO_PMEM_H
+#define _LINUX_VIRTIO_PMEM_H
+
+#include <linux/virtio_ids.h>
+#include <linux/module.h>
+#include <linux/virtio_config.h>
+#include <uapi/linux/virtio_pmem.h>
+#include <linux/libnvdimm.h>
+#include <linux/spinlock.h>
+
+struct virtio_pmem_request {
+	/* Host return status corresponding to flush request */
+	int ret;
+
+	/* command name*/
+	char name[16];
+
+	/* Wait queue to process deferred work after ack from host */
+	wait_queue_head_t host_acked;
+	bool done;
+
+	/* Wait queue to process deferred work after virt queue buffer avail */
+	wait_queue_head_t wq_buf;
+	bool wq_buf_avail;
+	struct list_head list;
+};
+
+struct virtio_pmem {
+	struct virtio_device *vdev;
+
+	/* Virtio pmem request queue */
+	struct virtqueue *req_vq;
+
+	/* nvdimm bus registers virtio pmem device */
+	struct nvdimm_bus *nvdimm_bus;
+	struct nvdimm_bus_descriptor nd_desc;
+
+	/* List to store deferred work if virtqueue is full */
+	struct list_head req_list;
+
+	/* Synchronize virtqueue data */
+	spinlock_t pmem_lock;
+
+	/* Memory region information */
+	uint64_t start;
+	uint64_t size;
+};
+
+void host_ack(struct virtqueue *vq);
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 6d5c3b2d4f4d..32b2f94d1f58 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_PMEM         27 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
new file mode 100644
index 000000000000..fa3f7d52717a
--- /dev/null
+++ b/include/uapi/linux/virtio_pmem.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
+#define _UAPI_LINUX_VIRTIO_PMEM_H
+
+struct virtio_pmem_config {
+	__le64 start;
+	__le64 size;
+};
+#endif
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |  10 +++
 drivers/virtio/Makefile          |   1 +
 drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
 include/linux/virtio_pmem.h      |  60 ++++++++++++++++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 +++
 7 files changed, 314 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000000000000..66b582f751a3
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include <linux/virtio_pmem.h>
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+	unsigned int len;
+	unsigned long flags;
+	struct virtio_pmem_request *req, *req_buf;
+	struct virtio_pmem *vpmem = vq->vdev->priv;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+		req->done = true;
+		wake_up(&req->host_acked);
+
+		if (!list_empty(&vpmem->req_list)) {
+			req_buf = list_first_entry(&vpmem->req_list,
+					struct virtio_pmem_request, list);
+			list_del(&vpmem->req_list);
+			req_buf->wq_buf_avail = true;
+			wake_up(&req_buf->wq_buf);
+		}
+	}
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+	int err;
+	unsigned long flags;
+	struct scatterlist *sgs[2], sg, ret;
+	struct virtio_device *vdev = nd_region->provider_data;
+	struct virtio_pmem *vpmem = vdev->priv;
+	struct virtio_pmem_request *req;
+
+	might_sleep();
+	req = kmalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->done = req->wq_buf_avail = false;
+	strcpy(req->name, "FLUSH");
+	init_waitqueue_head(&req->host_acked);
+	init_waitqueue_head(&req->wq_buf);
+	sg_init_one(&sg, req->name, strlen(req->name));
+	sgs[0] = &sg;
+	sg_init_one(&ret, &req->ret, sizeof(req->ret));
+	sgs[1] = &ret;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+	if (err) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+		list_add_tail(&vpmem->req_list, &req->list);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+		/* When host has read buffer, this completes via host_ack */
+		wait_event(req->wq_buf, req->wq_buf_avail);
+		spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	}
+	err = virtqueue_kick(vpmem->req_vq);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	if (!err) {
+		err = -EIO;
+		goto ret;
+	}
+	/* When host has read buffer, this completes via host_ack */
+	wait_event(req->host_acked, req->done);
+	err = req->ret;
+ret:
+	kfree(req);
+	return err;
+};
+
+ /* The asynchronous flush callback function */
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	/* Create child bio for asynchronous flush and chain with
+	 * parent bio. Otherwise directly call nd_region flush.
+	 */
+	if (bio && bio->bi_iter.bi_sector != -1) {
+		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
+
+		if (!child)
+			return -ENOMEM;
+		bio_copy_dev(child, bio);
+		child->bi_opf = REQ_PREFLUSH;
+		child->bi_iter.bi_sector = -1;
+		bio_chain(child, bio);
+		submit_bio(child);
+	} else {
+		if (virtio_pmem_flush(nd_region))
+			rc = -EIO;
+	}
+
+	return rc;
+};
+EXPORT_SYMBOL_GPL(async_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 35897649c24f..9f634a2ed638 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_PMEM
+	tristate "Support for virtio pmem driver"
+	depends on VIRTIO
+	depends on LIBNVDIMM
+	help
+	This driver provides support for virtio based flushing interface
+	for persistent memory range.
+
+	If unsure, say M.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5dcf46..143ce91eabe9 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
new file mode 100644
index 000000000000..309788628e41
--- /dev/null
+++ b/drivers/virtio/pmem.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and registers the virtual pmem device
+ * with libnvdimm core.
+ */
+#include <linux/virtio_pmem.h>
+#include <../../drivers/nvdimm/nd.h>
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+ /* Initialize virt queue */
+static int init_vq(struct virtio_pmem *vpmem)
+{
+	/* single vq */
+	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
+				host_ack, "flush_queue");
+	if (IS_ERR(vpmem->req_vq))
+		return PTR_ERR(vpmem->req_vq);
+
+	spin_lock_init(&vpmem->pmem_lock);
+	INIT_LIST_HEAD(&vpmem->req_list);
+
+	return 0;
+};
+
+static int virtio_pmem_probe(struct virtio_device *vdev)
+{
+	int err = 0;
+	struct resource res;
+	struct virtio_pmem *vpmem;
+	struct nd_region_desc ndr_desc = {};
+	int nid = dev_to_node(&vdev->dev);
+	struct nd_region *nd_region;
+
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
+	if (!vpmem) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	vpmem->vdev = vdev;
+	vdev->priv = vpmem;
+	err = init_vq(vpmem);
+	if (err)
+		goto out_err;
+
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			start, &vpmem->start);
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			size, &vpmem->size);
+
+	res.start = vpmem->start;
+	res.end   = vpmem->start + vpmem->size-1;
+	vpmem->nd_desc.provider_name = "virtio-pmem";
+	vpmem->nd_desc.module = THIS_MODULE;
+
+	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
+						&vpmem->nd_desc);
+	if (!vpmem->nvdimm_bus)
+		goto out_vq;
+
+	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
+
+	ndr_desc.res = &res;
+	ndr_desc.numa_node = nid;
+	ndr_desc.flush = async_pmem_flush;
+	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
+	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
+
+	if (!nd_region)
+		goto out_nd;
+	nd_region->provider_data =  dev_to_virtio
+					(nd_region->dev.parent->parent);
+	return 0;
+out_nd:
+	err = -ENXIO;
+	nvdimm_bus_unregister(vpmem->nvdimm_bus);
+out_vq:
+	vdev->config->del_vqs(vdev);
+out_err:
+	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
+	return err;
+}
+
+static void virtio_pmem_remove(struct virtio_device *vdev)
+{
+	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
+
+	nvdimm_bus_unregister(nvdimm_bus);
+	vdev->config->del_vqs(vdev);
+	vdev->config->reset(vdev);
+}
+
+static struct virtio_driver virtio_pmem_driver = {
+	.driver.name		= KBUILD_MODNAME,
+	.driver.owner		= THIS_MODULE,
+	.id_table		= id_table,
+	.probe			= virtio_pmem_probe,
+	.remove			= virtio_pmem_remove,
+};
+
+module_virtio_driver(virtio_pmem_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio pmem driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
new file mode 100644
index 000000000000..ab1da877575d
--- /dev/null
+++ b/include/linux/virtio_pmem.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * virtio_pmem.h: virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ **/
+
+#ifndef _LINUX_VIRTIO_PMEM_H
+#define _LINUX_VIRTIO_PMEM_H
+
+#include <linux/virtio_ids.h>
+#include <linux/module.h>
+#include <linux/virtio_config.h>
+#include <uapi/linux/virtio_pmem.h>
+#include <linux/libnvdimm.h>
+#include <linux/spinlock.h>
+
+struct virtio_pmem_request {
+	/* Host return status corresponding to flush request */
+	int ret;
+
+	/* command name*/
+	char name[16];
+
+	/* Wait queue to process deferred work after ack from host */
+	wait_queue_head_t host_acked;
+	bool done;
+
+	/* Wait queue to process deferred work after virt queue buffer avail */
+	wait_queue_head_t wq_buf;
+	bool wq_buf_avail;
+	struct list_head list;
+};
+
+struct virtio_pmem {
+	struct virtio_device *vdev;
+
+	/* Virtio pmem request queue */
+	struct virtqueue *req_vq;
+
+	/* nvdimm bus registers virtio pmem device */
+	struct nvdimm_bus *nvdimm_bus;
+	struct nvdimm_bus_descriptor nd_desc;
+
+	/* List to store deferred work if virtqueue is full */
+	struct list_head req_list;
+
+	/* Synchronize virtqueue data */
+	spinlock_t pmem_lock;
+
+	/* Memory region information */
+	uint64_t start;
+	uint64_t size;
+};
+
+void host_ack(struct virtqueue *vq);
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 6d5c3b2d4f4d..32b2f94d1f58 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_PMEM         27 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
new file mode 100644
index 000000000000..fa3f7d52717a
--- /dev/null
+++ b/include/uapi/linux/virtio_pmem.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
+#define _UAPI_LINUX_VIRTIO_PMEM_H
+
+struct virtio_pmem_config {
+	__le64 start;
+	__le64 size;
+};
+#endif
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, jasowang, david, lcapitulino, adilger.kernel,
	zwisler, aarcange, dave.jiang, darrick.wong, vishal.l.verma,
	david, willy, hch, jmoyer, nilal, lenb, kilobyte, riel,
	yuval.shaia, stefanha, pbonzini, dan.j.williams, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |  10 +++
 drivers/virtio/Makefile          |   1 +
 drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
 include/linux/virtio_pmem.h      |  60 ++++++++++++++++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 +++
 7 files changed, 314 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000000000000..66b582f751a3
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include <linux/virtio_pmem.h>
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+	unsigned int len;
+	unsigned long flags;
+	struct virtio_pmem_request *req, *req_buf;
+	struct virtio_pmem *vpmem = vq->vdev->priv;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+		req->done = true;
+		wake_up(&req->host_acked);
+
+		if (!list_empty(&vpmem->req_list)) {
+			req_buf = list_first_entry(&vpmem->req_list,
+					struct virtio_pmem_request, list);
+			list_del(&vpmem->req_list);
+			req_buf->wq_buf_avail = true;
+			wake_up(&req_buf->wq_buf);
+		}
+	}
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+	int err;
+	unsigned long flags;
+	struct scatterlist *sgs[2], sg, ret;
+	struct virtio_device *vdev = nd_region->provider_data;
+	struct virtio_pmem *vpmem = vdev->priv;
+	struct virtio_pmem_request *req;
+
+	might_sleep();
+	req = kmalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->done = req->wq_buf_avail = false;
+	strcpy(req->name, "FLUSH");
+	init_waitqueue_head(&req->host_acked);
+	init_waitqueue_head(&req->wq_buf);
+	sg_init_one(&sg, req->name, strlen(req->name));
+	sgs[0] = &sg;
+	sg_init_one(&ret, &req->ret, sizeof(req->ret));
+	sgs[1] = &ret;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+	if (err) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+		list_add_tail(&vpmem->req_list, &req->list);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+		/* When host has read buffer, this completes via host_ack */
+		wait_event(req->wq_buf, req->wq_buf_avail);
+		spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	}
+	err = virtqueue_kick(vpmem->req_vq);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	if (!err) {
+		err = -EIO;
+		goto ret;
+	}
+	/* When host has read buffer, this completes via host_ack */
+	wait_event(req->host_acked, req->done);
+	err = req->ret;
+ret:
+	kfree(req);
+	return err;
+};
+
+ /* The asynchronous flush callback function */
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	/* Create child bio for asynchronous flush and chain with
+	 * parent bio. Otherwise directly call nd_region flush.
+	 */
+	if (bio && bio->bi_iter.bi_sector != -1) {
+		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
+
+		if (!child)
+			return -ENOMEM;
+		bio_copy_dev(child, bio);
+		child->bi_opf = REQ_PREFLUSH;
+		child->bi_iter.bi_sector = -1;
+		bio_chain(child, bio);
+		submit_bio(child);
+	} else {
+		if (virtio_pmem_flush(nd_region))
+			rc = -EIO;
+	}
+
+	return rc;
+};
+EXPORT_SYMBOL_GPL(async_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 35897649c24f..9f634a2ed638 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_PMEM
+	tristate "Support for virtio pmem driver"
+	depends on VIRTIO
+	depends on LIBNVDIMM
+	help
+	This driver provides support for virtio based flushing interface
+	for persistent memory range.
+
+	If unsure, say M.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5dcf46..143ce91eabe9 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
new file mode 100644
index 000000000000..309788628e41
--- /dev/null
+++ b/drivers/virtio/pmem.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and registers the virtual pmem device
+ * with libnvdimm core.
+ */
+#include <linux/virtio_pmem.h>
+#include <../../drivers/nvdimm/nd.h>
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+ /* Initialize virt queue */
+static int init_vq(struct virtio_pmem *vpmem)
+{
+	/* single vq */
+	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
+				host_ack, "flush_queue");
+	if (IS_ERR(vpmem->req_vq))
+		return PTR_ERR(vpmem->req_vq);
+
+	spin_lock_init(&vpmem->pmem_lock);
+	INIT_LIST_HEAD(&vpmem->req_list);
+
+	return 0;
+};
+
+static int virtio_pmem_probe(struct virtio_device *vdev)
+{
+	int err = 0;
+	struct resource res;
+	struct virtio_pmem *vpmem;
+	struct nd_region_desc ndr_desc = {};
+	int nid = dev_to_node(&vdev->dev);
+	struct nd_region *nd_region;
+
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
+	if (!vpmem) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	vpmem->vdev = vdev;
+	vdev->priv = vpmem;
+	err = init_vq(vpmem);
+	if (err)
+		goto out_err;
+
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			start, &vpmem->start);
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			size, &vpmem->size);
+
+	res.start = vpmem->start;
+	res.end   = vpmem->start + vpmem->size-1;
+	vpmem->nd_desc.provider_name = "virtio-pmem";
+	vpmem->nd_desc.module = THIS_MODULE;
+
+	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
+						&vpmem->nd_desc);
+	if (!vpmem->nvdimm_bus)
+		goto out_vq;
+
+	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
+
+	ndr_desc.res = &res;
+	ndr_desc.numa_node = nid;
+	ndr_desc.flush = async_pmem_flush;
+	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
+	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
+
+	if (!nd_region)
+		goto out_nd;
+	nd_region->provider_data =  dev_to_virtio
+					(nd_region->dev.parent->parent);
+	return 0;
+out_nd:
+	err = -ENXIO;
+	nvdimm_bus_unregister(vpmem->nvdimm_bus);
+out_vq:
+	vdev->config->del_vqs(vdev);
+out_err:
+	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
+	return err;
+}
+
+static void virtio_pmem_remove(struct virtio_device *vdev)
+{
+	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
+
+	nvdimm_bus_unregister(nvdimm_bus);
+	vdev->config->del_vqs(vdev);
+	vdev->config->reset(vdev);
+}
+
+static struct virtio_driver virtio_pmem_driver = {
+	.driver.name		= KBUILD_MODNAME,
+	.driver.owner		= THIS_MODULE,
+	.id_table		= id_table,
+	.probe			= virtio_pmem_probe,
+	.remove			= virtio_pmem_remove,
+};
+
+module_virtio_driver(virtio_pmem_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio pmem driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
new file mode 100644
index 000000000000..ab1da877575d
--- /dev/null
+++ b/include/linux/virtio_pmem.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * virtio_pmem.h: virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ **/
+
+#ifndef _LINUX_VIRTIO_PMEM_H
+#define _LINUX_VIRTIO_PMEM_H
+
+#include <linux/virtio_ids.h>
+#include <linux/module.h>
+#include <linux/virtio_config.h>
+#include <uapi/linux/virtio_pmem.h>
+#include <linux/libnvdimm.h>
+#include <linux/spinlock.h>
+
+struct virtio_pmem_request {
+	/* Host return status corresponding to flush request */
+	int ret;
+
+	/* command name*/
+	char name[16];
+
+	/* Wait queue to process deferred work after ack from host */
+	wait_queue_head_t host_acked;
+	bool done;
+
+	/* Wait queue to process deferred work after virt queue buffer avail */
+	wait_queue_head_t wq_buf;
+	bool wq_buf_avail;
+	struct list_head list;
+};
+
+struct virtio_pmem {
+	struct virtio_device *vdev;
+
+	/* Virtio pmem request queue */
+	struct virtqueue *req_vq;
+
+	/* nvdimm bus registers virtio pmem device */
+	struct nvdimm_bus *nvdimm_bus;
+	struct nvdimm_bus_descriptor nd_desc;
+
+	/* List to store deferred work if virtqueue is full */
+	struct list_head req_list;
+
+	/* Synchronize virtqueue data */
+	spinlock_t pmem_lock;
+
+	/* Memory region information */
+	uint64_t start;
+	uint64_t size;
+};
+
+void host_ack(struct virtqueue *vq);
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 6d5c3b2d4f4d..32b2f94d1f58 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_PMEM         27 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
new file mode 100644
index 000000000000..fa3f7d52717a
--- /dev/null
+++ b/include/uapi/linux/virtio_pmem.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
+#define _UAPI_LINUX_VIRTIO_PMEM_H
+
+struct virtio_pmem_config {
+	__le64 start;
+	__le64 size;
+};
+#endif
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-04-26  5:00 ` Pankaj Gupta
                   ` (4 preceding siblings ...)
  (?)
@ 2019-04-26  5:00 ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, dave.jiang, darrick.wong, vishal.l.verma, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, tytso, xiaoguangrong.eric, cohuck, rjw,
	imammedo

This patch adds virtio-pmem driver for KVM guest.

Guest reads the persistent memory range information from
Qemu over VIRTIO and registers it on nvdimm_bus. It also
creates a nd_region object with the persistent memory
range information so that existing 'nvdimm/pmem' driver
can reserve this into system memory map. This way
'virtio-pmem' driver uses existing functionality of pmem
driver to register persistent memory compatible for DAX
capable filesystems.

This also provides function to perform guest flush over
VIRTIO from 'pmem' driver when userspace performs flush
on DAX memory range.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
 drivers/virtio/Kconfig           |  10 +++
 drivers/virtio/Makefile          |   1 +
 drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
 include/linux/virtio_pmem.h      |  60 ++++++++++++++++
 include/uapi/linux/virtio_ids.h  |   1 +
 include/uapi/linux/virtio_pmem.h |  10 +++
 7 files changed, 314 insertions(+)
 create mode 100644 drivers/nvdimm/virtio_pmem.c
 create mode 100644 drivers/virtio/pmem.c
 create mode 100644 include/linux/virtio_pmem.h
 create mode 100644 include/uapi/linux/virtio_pmem.h

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
new file mode 100644
index 000000000000..66b582f751a3
--- /dev/null
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ */
+#include <linux/virtio_pmem.h>
+#include "nd.h"
+
+ /* The interrupt handler */
+void host_ack(struct virtqueue *vq)
+{
+	unsigned int len;
+	unsigned long flags;
+	struct virtio_pmem_request *req, *req_buf;
+	struct virtio_pmem *vpmem = vq->vdev->priv;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
+		req->done = true;
+		wake_up(&req->host_acked);
+
+		if (!list_empty(&vpmem->req_list)) {
+			req_buf = list_first_entry(&vpmem->req_list,
+					struct virtio_pmem_request, list);
+			list_del(&vpmem->req_list);
+			req_buf->wq_buf_avail = true;
+			wake_up(&req_buf->wq_buf);
+		}
+	}
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+}
+EXPORT_SYMBOL_GPL(host_ack);
+
+ /* The request submission function */
+int virtio_pmem_flush(struct nd_region *nd_region)
+{
+	int err;
+	unsigned long flags;
+	struct scatterlist *sgs[2], sg, ret;
+	struct virtio_device *vdev = nd_region->provider_data;
+	struct virtio_pmem *vpmem = vdev->priv;
+	struct virtio_pmem_request *req;
+
+	might_sleep();
+	req = kmalloc(sizeof(*req), GFP_KERNEL);
+	if (!req)
+		return -ENOMEM;
+
+	req->done = req->wq_buf_avail = false;
+	strcpy(req->name, "FLUSH");
+	init_waitqueue_head(&req->host_acked);
+	init_waitqueue_head(&req->wq_buf);
+	sg_init_one(&sg, req->name, strlen(req->name));
+	sgs[0] = &sg;
+	sg_init_one(&ret, &req->ret, sizeof(req->ret));
+	sgs[1] = &ret;
+
+	spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
+	if (err) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+
+		list_add_tail(&vpmem->req_list, &req->list);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+		/* When host has read buffer, this completes via host_ack */
+		wait_event(req->wq_buf, req->wq_buf_avail);
+		spin_lock_irqsave(&vpmem->pmem_lock, flags);
+	}
+	err = virtqueue_kick(vpmem->req_vq);
+	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+
+	if (!err) {
+		err = -EIO;
+		goto ret;
+	}
+	/* When host has read buffer, this completes via host_ack */
+	wait_event(req->host_acked, req->done);
+	err = req->ret;
+ret:
+	kfree(req);
+	return err;
+};
+
+ /* The asynchronous flush callback function */
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
+{
+	int rc = 0;
+
+	/* Create child bio for asynchronous flush and chain with
+	 * parent bio. Otherwise directly call nd_region flush.
+	 */
+	if (bio && bio->bi_iter.bi_sector != -1) {
+		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
+
+		if (!child)
+			return -ENOMEM;
+		bio_copy_dev(child, bio);
+		child->bi_opf = REQ_PREFLUSH;
+		child->bi_iter.bi_sector = -1;
+		bio_chain(child, bio);
+		submit_bio(child);
+	} else {
+		if (virtio_pmem_flush(nd_region))
+			rc = -EIO;
+	}
+
+	return rc;
+};
+EXPORT_SYMBOL_GPL(async_pmem_flush);
+MODULE_LICENSE("GPL");
diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 35897649c24f..9f634a2ed638 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
 
 	  If unsure, say Y.
 
+config VIRTIO_PMEM
+	tristate "Support for virtio pmem driver"
+	depends on VIRTIO
+	depends on LIBNVDIMM
+	help
+	This driver provides support for virtio based flushing interface
+	for persistent memory range.
+
+	If unsure, say M.
+
 config VIRTIO_BALLOON
 	tristate "Virtio balloon driver"
 	depends on VIRTIO
diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
index 3a2b5c5dcf46..143ce91eabe9 100644
--- a/drivers/virtio/Makefile
+++ b/drivers/virtio/Makefile
@@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
+obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
new file mode 100644
index 000000000000..309788628e41
--- /dev/null
+++ b/drivers/virtio/pmem.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * virtio_pmem.c: Virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and registers the virtual pmem device
+ * with libnvdimm core.
+ */
+#include <linux/virtio_pmem.h>
+#include <../../drivers/nvdimm/nd.h>
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
+	{ 0 },
+};
+
+ /* Initialize virt queue */
+static int init_vq(struct virtio_pmem *vpmem)
+{
+	/* single vq */
+	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
+				host_ack, "flush_queue");
+	if (IS_ERR(vpmem->req_vq))
+		return PTR_ERR(vpmem->req_vq);
+
+	spin_lock_init(&vpmem->pmem_lock);
+	INIT_LIST_HEAD(&vpmem->req_list);
+
+	return 0;
+};
+
+static int virtio_pmem_probe(struct virtio_device *vdev)
+{
+	int err = 0;
+	struct resource res;
+	struct virtio_pmem *vpmem;
+	struct nd_region_desc ndr_desc = {};
+	int nid = dev_to_node(&vdev->dev);
+	struct nd_region *nd_region;
+
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
+	if (!vpmem) {
+		err = -ENOMEM;
+		goto out_err;
+	}
+
+	vpmem->vdev = vdev;
+	vdev->priv = vpmem;
+	err = init_vq(vpmem);
+	if (err)
+		goto out_err;
+
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			start, &vpmem->start);
+	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
+			size, &vpmem->size);
+
+	res.start = vpmem->start;
+	res.end   = vpmem->start + vpmem->size-1;
+	vpmem->nd_desc.provider_name = "virtio-pmem";
+	vpmem->nd_desc.module = THIS_MODULE;
+
+	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
+						&vpmem->nd_desc);
+	if (!vpmem->nvdimm_bus)
+		goto out_vq;
+
+	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
+
+	ndr_desc.res = &res;
+	ndr_desc.numa_node = nid;
+	ndr_desc.flush = async_pmem_flush;
+	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
+	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
+	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
+
+	if (!nd_region)
+		goto out_nd;
+	nd_region->provider_data =  dev_to_virtio
+					(nd_region->dev.parent->parent);
+	return 0;
+out_nd:
+	err = -ENXIO;
+	nvdimm_bus_unregister(vpmem->nvdimm_bus);
+out_vq:
+	vdev->config->del_vqs(vdev);
+out_err:
+	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
+	return err;
+}
+
+static void virtio_pmem_remove(struct virtio_device *vdev)
+{
+	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
+
+	nvdimm_bus_unregister(nvdimm_bus);
+	vdev->config->del_vqs(vdev);
+	vdev->config->reset(vdev);
+}
+
+static struct virtio_driver virtio_pmem_driver = {
+	.driver.name		= KBUILD_MODNAME,
+	.driver.owner		= THIS_MODULE,
+	.id_table		= id_table,
+	.probe			= virtio_pmem_probe,
+	.remove			= virtio_pmem_remove,
+};
+
+module_virtio_driver(virtio_pmem_driver);
+MODULE_DEVICE_TABLE(virtio, id_table);
+MODULE_DESCRIPTION("Virtio pmem driver");
+MODULE_LICENSE("GPL");
diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
new file mode 100644
index 000000000000..ab1da877575d
--- /dev/null
+++ b/include/linux/virtio_pmem.h
@@ -0,0 +1,60 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * virtio_pmem.h: virtio pmem Driver
+ *
+ * Discovers persistent memory range information
+ * from host and provides a virtio based flushing
+ * interface.
+ **/
+
+#ifndef _LINUX_VIRTIO_PMEM_H
+#define _LINUX_VIRTIO_PMEM_H
+
+#include <linux/virtio_ids.h>
+#include <linux/module.h>
+#include <linux/virtio_config.h>
+#include <uapi/linux/virtio_pmem.h>
+#include <linux/libnvdimm.h>
+#include <linux/spinlock.h>
+
+struct virtio_pmem_request {
+	/* Host return status corresponding to flush request */
+	int ret;
+
+	/* command name*/
+	char name[16];
+
+	/* Wait queue to process deferred work after ack from host */
+	wait_queue_head_t host_acked;
+	bool done;
+
+	/* Wait queue to process deferred work after virt queue buffer avail */
+	wait_queue_head_t wq_buf;
+	bool wq_buf_avail;
+	struct list_head list;
+};
+
+struct virtio_pmem {
+	struct virtio_device *vdev;
+
+	/* Virtio pmem request queue */
+	struct virtqueue *req_vq;
+
+	/* nvdimm bus registers virtio pmem device */
+	struct nvdimm_bus *nvdimm_bus;
+	struct nvdimm_bus_descriptor nd_desc;
+
+	/* List to store deferred work if virtqueue is full */
+	struct list_head req_list;
+
+	/* Synchronize virtqueue data */
+	spinlock_t pmem_lock;
+
+	/* Memory region information */
+	uint64_t start;
+	uint64_t size;
+};
+
+void host_ack(struct virtqueue *vq);
+int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
+#endif
diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
index 6d5c3b2d4f4d..32b2f94d1f58 100644
--- a/include/uapi/linux/virtio_ids.h
+++ b/include/uapi/linux/virtio_ids.h
@@ -43,5 +43,6 @@
 #define VIRTIO_ID_INPUT        18 /* virtio input */
 #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
 #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
+#define VIRTIO_ID_PMEM         27 /* virtio pmem */
 
 #endif /* _LINUX_VIRTIO_IDS_H */
diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
new file mode 100644
index 000000000000..fa3f7d52717a
--- /dev/null
+++ b/include/uapi/linux/virtio_pmem.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
+#define _UAPI_LINUX_VIRTIO_PMEM_H
+
+struct virtio_pmem_config {
+	__le64 start;
+	__le64 size;
+};
+#endif
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
  2019-04-26  5:00 ` Pankaj Gupta
  (?)
  (?)
@ 2019-04-26  5:00   ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, darrick.wong, david, willy, hch, nilal, lenb, kilobyte,
	riel, yuval.shaia, stefanha, pbonzini, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/dax/bus.c            |  2 +-
 drivers/dax/super.c          | 13 ++++++++++++-
 drivers/md/dm.c              |  3 ++-
 drivers/nvdimm/pmem.c        |  3 ++-
 drivers/nvdimm/region_devs.c |  7 +++++++
 include/linux/dax.h          |  8 ++++++--
 include/linux/libnvdimm.h    |  1 +
 7 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
 	 * No 'host' or dax_operations since there is no access to this
 	 * device outside of mmap of the resulting character device.
 	 */
-	dax_dev = alloc_dax(dev_dax, NULL, NULL);
+	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
 	if (!dax_dev)
 		goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0a339b85133e..bd6509308d05 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
 	DAXDEV_ALIVE,
 	/* gate whether dax_flush() calls the low level flush routine */
 	DAXDEV_WRITE_CACHE,
+	/* flag to check if device supports synchronous flush */
+	DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,12 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
 	lockdep_assert_held(&dax_srcu);
@@ -511,7 +519,7 @@ static void dax_add_host(struct dax_device *dax_dev, const char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	struct dax_device *dax_dev;
 	const char *host;
@@ -534,6 +542,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
 	dax_add_host(dax_dev, host);
 	dax_dev->ops = ops;
 	dax_dev->private = private;
+	if (sync)
+		set_bit(DAXDEV_SYNC, &dax_dev->flags);
+
 	return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 68d24056d0b1..dd5266fb5471 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1965,7 +1965,8 @@ static struct mapped_device *alloc_dev(int minor)
 	sprintf(md->disk->disk_name, "dm-%d", minor);
 
 	if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
+		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops,
+							DAXDEV_F_SYNC);
 		if (!dax_dev)
 			goto bad;
 	}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f719245da170..34fa20381c05 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -466,7 +466,8 @@ static int pmem_attach_disk(struct device *dev,
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
 	disk->bb = &pmem->bb;
 
-	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
+	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops,
+					is_nvdimm_sync(nd_region));
 	if (!dax_dev) {
 		put_disk(disk);
 		return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e5b59708865e..427cf28380c6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1219,6 +1219,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+	return is_nd_pmem(&nd_region->dev) &&
+		!test_bit(ND_REGION_ASYNC, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
 	struct nd_region *nd_region;
 	resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..c97fc0cc7167 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@
 #include <linux/radix-tree.h>
 #include <asm/pgtable.h>
 
+/* Flag for synchronous flush */
+#define DAXDEV_F_SYNC true
+
 typedef unsigned long dax_entry_t;
 
 struct iomap_ops;
@@ -32,18 +35,19 @@ extern struct attribute_group dax_attribute_group;
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *dax_get_by_host(const char *host);
 struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops);
+		const struct dax_operations *ops, bool sync);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+bool dax_synchronous(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
 	return NULL;
 }
 static inline struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	/*
 	 * Callers should check IS_ENABLED(CONFIG_DAX) to know if this
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index a5f369ec3726..8ae6b65d67e2 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -270,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+bool is_nvdimm_sync(struct nd_region *nd_region);
 
 static inline int nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
 		unsigned int buf_len, int *cmd_rc)
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/dax/bus.c            |  2 +-
 drivers/dax/super.c          | 13 ++++++++++++-
 drivers/md/dm.c              |  3 ++-
 drivers/nvdimm/pmem.c        |  3 ++-
 drivers/nvdimm/region_devs.c |  7 +++++++
 include/linux/dax.h          |  8 ++++++--
 include/linux/libnvdimm.h    |  1 +
 7 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
 	 * No 'host' or dax_operations since there is no access to this
 	 * device outside of mmap of the resulting character device.
 	 */
-	dax_dev = alloc_dax(dev_dax, NULL, NULL);
+	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
 	if (!dax_dev)
 		goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0a339b85133e..bd6509308d05 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
 	DAXDEV_ALIVE,
 	/* gate whether dax_flush() calls the low level flush routine */
 	DAXDEV_WRITE_CACHE,
+	/* flag to check if device supports synchronous flush */
+	DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,12 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
 	lockdep_assert_held(&dax_srcu);
@@ -511,7 +519,7 @@ static void dax_add_host(struct dax_device *dax_dev, const char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	struct dax_device *dax_dev;
 	const char *host;
@@ -534,6 +542,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
 	dax_add_host(dax_dev, host);
 	dax_dev->ops = ops;
 	dax_dev->private = private;
+	if (sync)
+		set_bit(DAXDEV_SYNC, &dax_dev->flags);
+
 	return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 68d24056d0b1..dd5266fb5471 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1965,7 +1965,8 @@ static struct mapped_device *alloc_dev(int minor)
 	sprintf(md->disk->disk_name, "dm-%d", minor);
 
 	if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
+		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops,
+							DAXDEV_F_SYNC);
 		if (!dax_dev)
 			goto bad;
 	}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f719245da170..34fa20381c05 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -466,7 +466,8 @@ static int pmem_attach_disk(struct device *dev,
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
 	disk->bb = &pmem->bb;
 
-	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
+	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops,
+					is_nvdimm_sync(nd_region));
 	if (!dax_dev) {
 		put_disk(disk);
 		return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e5b59708865e..427cf28380c6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1219,6 +1219,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+	return is_nd_pmem(&nd_region->dev) &&
+		!test_bit(ND_REGION_ASYNC, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
 	struct nd_region *nd_region;
 	resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..c97fc0cc7167 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@
 #include <linux/radix-tree.h>
 #include <asm/pgtable.h>
 
+/* Flag for synchronous flush */
+#define DAXDEV_F_SYNC true
+
 typedef unsigned long dax_entry_t;
 
 struct iomap_ops;
@@ -32,18 +35,19 @@ extern struct attribute_group dax_attribute_group;
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *dax_get_by_host(const char *host);
 struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops);
+		const struct dax_operations *ops, bool sync);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+bool dax_synchronous(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
 	return NULL;
 }
 static inline struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	/*
 	 * Callers should check IS_ENABLED(CONFIG_DAX) to know if this
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index a5f369ec3726..8ae6b65d67e2 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -270,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+bool is_nvdimm_sync(struct nd_region *nd_region);
 
 static inline int nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
 		unsigned int buf_len, int *cmd_rc)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/dax/bus.c            |  2 +-
 drivers/dax/super.c          | 13 ++++++++++++-
 drivers/md/dm.c              |  3 ++-
 drivers/nvdimm/pmem.c        |  3 ++-
 drivers/nvdimm/region_devs.c |  7 +++++++
 include/linux/dax.h          |  8 ++++++--
 include/linux/libnvdimm.h    |  1 +
 7 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
 	 * No 'host' or dax_operations since there is no access to this
 	 * device outside of mmap of the resulting character device.
 	 */
-	dax_dev = alloc_dax(dev_dax, NULL, NULL);
+	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
 	if (!dax_dev)
 		goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0a339b85133e..bd6509308d05 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
 	DAXDEV_ALIVE,
 	/* gate whether dax_flush() calls the low level flush routine */
 	DAXDEV_WRITE_CACHE,
+	/* flag to check if device supports synchronous flush */
+	DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,12 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
 	lockdep_assert_held(&dax_srcu);
@@ -511,7 +519,7 @@ static void dax_add_host(struct dax_device *dax_dev, const char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	struct dax_device *dax_dev;
 	const char *host;
@@ -534,6 +542,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
 	dax_add_host(dax_dev, host);
 	dax_dev->ops = ops;
 	dax_dev->private = private;
+	if (sync)
+		set_bit(DAXDEV_SYNC, &dax_dev->flags);
+
 	return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 68d24056d0b1..dd5266fb5471 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1965,7 +1965,8 @@ static struct mapped_device *alloc_dev(int minor)
 	sprintf(md->disk->disk_name, "dm-%d", minor);
 
 	if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
+		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops,
+							DAXDEV_F_SYNC);
 		if (!dax_dev)
 			goto bad;
 	}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f719245da170..34fa20381c05 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -466,7 +466,8 @@ static int pmem_attach_disk(struct device *dev,
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
 	disk->bb = &pmem->bb;
 
-	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
+	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops,
+					is_nvdimm_sync(nd_region));
 	if (!dax_dev) {
 		put_disk(disk);
 		return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e5b59708865e..427cf28380c6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1219,6 +1219,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+	return is_nd_pmem(&nd_region->dev) &&
+		!test_bit(ND_REGION_ASYNC, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
 	struct nd_region *nd_region;
 	resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..c97fc0cc7167 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@
 #include <linux/radix-tree.h>
 #include <asm/pgtable.h>
 
+/* Flag for synchronous flush */
+#define DAXDEV_F_SYNC true
+
 typedef unsigned long dax_entry_t;
 
 struct iomap_ops;
@@ -32,18 +35,19 @@ extern struct attribute_group dax_attribute_group;
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *dax_get_by_host(const char *host);
 struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops);
+		const struct dax_operations *ops, bool sync);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+bool dax_synchronous(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
 	return NULL;
 }
 static inline struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	/*
 	 * Callers should check IS_ENABLED(CONFIG_DAX) to know if this
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index a5f369ec3726..8ae6b65d67e2 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -270,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+bool is_nvdimm_sync(struct nd_region *nd_region);
 
 static inline int nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
 		unsigned int buf_len, int *cmd_rc)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, jasowang, david, lcapitulino, adilger.kernel,
	zwisler, aarcange, dave.jiang, darrick.wong, vishal.l.verma,
	david, willy, hch, jmoyer, nilal, lenb, kilobyte, riel,
	yuval.shaia, stefanha, pbonzini, dan.j.williams, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/dax/bus.c            |  2 +-
 drivers/dax/super.c          | 13 ++++++++++++-
 drivers/md/dm.c              |  3 ++-
 drivers/nvdimm/pmem.c        |  3 ++-
 drivers/nvdimm/region_devs.c |  7 +++++++
 include/linux/dax.h          |  8 ++++++--
 include/linux/libnvdimm.h    |  1 +
 7 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
 	 * No 'host' or dax_operations since there is no access to this
 	 * device outside of mmap of the resulting character device.
 	 */
-	dax_dev = alloc_dax(dev_dax, NULL, NULL);
+	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
 	if (!dax_dev)
 		goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0a339b85133e..bd6509308d05 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
 	DAXDEV_ALIVE,
 	/* gate whether dax_flush() calls the low level flush routine */
 	DAXDEV_WRITE_CACHE,
+	/* flag to check if device supports synchronous flush */
+	DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,12 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
 	lockdep_assert_held(&dax_srcu);
@@ -511,7 +519,7 @@ static void dax_add_host(struct dax_device *dax_dev, const char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	struct dax_device *dax_dev;
 	const char *host;
@@ -534,6 +542,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
 	dax_add_host(dax_dev, host);
 	dax_dev->ops = ops;
 	dax_dev->private = private;
+	if (sync)
+		set_bit(DAXDEV_SYNC, &dax_dev->flags);
+
 	return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 68d24056d0b1..dd5266fb5471 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1965,7 +1965,8 @@ static struct mapped_device *alloc_dev(int minor)
 	sprintf(md->disk->disk_name, "dm-%d", minor);
 
 	if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
+		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops,
+							DAXDEV_F_SYNC);
 		if (!dax_dev)
 			goto bad;
 	}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f719245da170..34fa20381c05 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -466,7 +466,8 @@ static int pmem_attach_disk(struct device *dev,
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
 	disk->bb = &pmem->bb;
 
-	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
+	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops,
+					is_nvdimm_sync(nd_region));
 	if (!dax_dev) {
 		put_disk(disk);
 		return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e5b59708865e..427cf28380c6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1219,6 +1219,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+	return is_nd_pmem(&nd_region->dev) &&
+		!test_bit(ND_REGION_ASYNC, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
 	struct nd_region *nd_region;
 	resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..c97fc0cc7167 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@
 #include <linux/radix-tree.h>
 #include <asm/pgtable.h>
 
+/* Flag for synchronous flush */
+#define DAXDEV_F_SYNC true
+
 typedef unsigned long dax_entry_t;
 
 struct iomap_ops;
@@ -32,18 +35,19 @@ extern struct attribute_group dax_attribute_group;
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *dax_get_by_host(const char *host);
 struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops);
+		const struct dax_operations *ops, bool sync);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+bool dax_synchronous(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
 	return NULL;
 }
 static inline struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	/*
 	 * Callers should check IS_ENABLED(CONFIG_DAX) to know if this
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index a5f369ec3726..8ae6b65d67e2 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -270,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+bool is_nvdimm_sync(struct nd_region *nd_region);
 
 static inline int nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
 		unsigned int buf_len, int *cmd_rc)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
  2019-04-26  5:00 ` Pankaj Gupta
                   ` (7 preceding siblings ...)
  (?)
@ 2019-04-26  5:00 ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, dave.jiang, darrick.wong, vishal.l.verma, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, tytso, xiaoguangrong.eric, cohuck, rjw,
	imammedo

This patch adds 'DAXDEV_SYNC' flag which is set
for nd_region doing synchronous flush. This later
is used to disable MAP_SYNC functionality for
ext4 & xfs filesystem for devices don't support
synchronous flush.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 drivers/dax/bus.c            |  2 +-
 drivers/dax/super.c          | 13 ++++++++++++-
 drivers/md/dm.c              |  3 ++-
 drivers/nvdimm/pmem.c        |  3 ++-
 drivers/nvdimm/region_devs.c |  7 +++++++
 include/linux/dax.h          |  8 ++++++--
 include/linux/libnvdimm.h    |  1 +
 7 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/drivers/dax/bus.c b/drivers/dax/bus.c
index 2109cfe80219..5f184e751c82 100644
--- a/drivers/dax/bus.c
+++ b/drivers/dax/bus.c
@@ -388,7 +388,7 @@ struct dev_dax *__devm_create_dev_dax(struct dax_region *dax_region, int id,
 	 * No 'host' or dax_operations since there is no access to this
 	 * device outside of mmap of the resulting character device.
 	 */
-	dax_dev = alloc_dax(dev_dax, NULL, NULL);
+	dax_dev = alloc_dax(dev_dax, NULL, NULL, DAXDEV_F_SYNC);
 	if (!dax_dev)
 		goto err;
 
diff --git a/drivers/dax/super.c b/drivers/dax/super.c
index 0a339b85133e..bd6509308d05 100644
--- a/drivers/dax/super.c
+++ b/drivers/dax/super.c
@@ -186,6 +186,8 @@ enum dax_device_flags {
 	DAXDEV_ALIVE,
 	/* gate whether dax_flush() calls the low level flush routine */
 	DAXDEV_WRITE_CACHE,
+	/* flag to check if device supports synchronous flush */
+	DAXDEV_SYNC,
 };
 
 /**
@@ -354,6 +356,12 @@ bool dax_write_cache_enabled(struct dax_device *dax_dev)
 }
 EXPORT_SYMBOL_GPL(dax_write_cache_enabled);
 
+bool dax_synchronous(struct dax_device *dax_dev)
+{
+	return test_bit(DAXDEV_SYNC, &dax_dev->flags);
+}
+EXPORT_SYMBOL_GPL(dax_synchronous);
+
 bool dax_alive(struct dax_device *dax_dev)
 {
 	lockdep_assert_held(&dax_srcu);
@@ -511,7 +519,7 @@ static void dax_add_host(struct dax_device *dax_dev, const char *host)
 }
 
 struct dax_device *alloc_dax(void *private, const char *__host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	struct dax_device *dax_dev;
 	const char *host;
@@ -534,6 +542,9 @@ struct dax_device *alloc_dax(void *private, const char *__host,
 	dax_add_host(dax_dev, host);
 	dax_dev->ops = ops;
 	dax_dev->private = private;
+	if (sync)
+		set_bit(DAXDEV_SYNC, &dax_dev->flags);
+
 	return dax_dev;
 
  err_dev:
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 68d24056d0b1..dd5266fb5471 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1965,7 +1965,8 @@ static struct mapped_device *alloc_dev(int minor)
 	sprintf(md->disk->disk_name, "dm-%d", minor);
 
 	if (IS_ENABLED(CONFIG_DAX_DRIVER)) {
-		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops);
+		dax_dev = alloc_dax(md, md->disk->disk_name, &dm_dax_ops,
+							DAXDEV_F_SYNC);
 		if (!dax_dev)
 			goto bad;
 	}
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index f719245da170..34fa20381c05 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -466,7 +466,8 @@ static int pmem_attach_disk(struct device *dev,
 	nvdimm_badblocks_populate(nd_region, &pmem->bb, &bb_res);
 	disk->bb = &pmem->bb;
 
-	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops);
+	dax_dev = alloc_dax(pmem, disk->disk_name, &pmem_dax_ops,
+					is_nvdimm_sync(nd_region));
 	if (!dax_dev) {
 		put_disk(disk);
 		return -ENOMEM;
diff --git a/drivers/nvdimm/region_devs.c b/drivers/nvdimm/region_devs.c
index e5b59708865e..427cf28380c6 100644
--- a/drivers/nvdimm/region_devs.c
+++ b/drivers/nvdimm/region_devs.c
@@ -1219,6 +1219,13 @@ int nvdimm_has_cache(struct nd_region *nd_region)
 }
 EXPORT_SYMBOL_GPL(nvdimm_has_cache);
 
+bool is_nvdimm_sync(struct nd_region *nd_region)
+{
+	return is_nd_pmem(&nd_region->dev) &&
+		!test_bit(ND_REGION_ASYNC, &nd_region->flags);
+}
+EXPORT_SYMBOL_GPL(is_nvdimm_sync);
+
 struct conflict_context {
 	struct nd_region *nd_region;
 	resource_size_t start, size;
diff --git a/include/linux/dax.h b/include/linux/dax.h
index 0dd316a74a29..c97fc0cc7167 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -7,6 +7,9 @@
 #include <linux/radix-tree.h>
 #include <asm/pgtable.h>
 
+/* Flag for synchronous flush */
+#define DAXDEV_F_SYNC true
+
 typedef unsigned long dax_entry_t;
 
 struct iomap_ops;
@@ -32,18 +35,19 @@ extern struct attribute_group dax_attribute_group;
 #if IS_ENABLED(CONFIG_DAX)
 struct dax_device *dax_get_by_host(const char *host);
 struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops);
+		const struct dax_operations *ops, bool sync);
 void put_dax(struct dax_device *dax_dev);
 void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
+bool dax_synchronous(struct dax_device *dax_dev);
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
 	return NULL;
 }
 static inline struct dax_device *alloc_dax(void *private, const char *host,
-		const struct dax_operations *ops)
+		const struct dax_operations *ops, bool sync)
 {
 	/*
 	 * Callers should check IS_ENABLED(CONFIG_DAX) to know if this
diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
index a5f369ec3726..8ae6b65d67e2 100644
--- a/include/linux/libnvdimm.h
+++ b/include/linux/libnvdimm.h
@@ -270,6 +270,7 @@ int generic_nvdimm_flush(struct nd_region *nd_region);
 int nvdimm_has_flush(struct nd_region *nd_region);
 int nvdimm_has_cache(struct nd_region *nd_region);
 int nvdimm_in_overwrite(struct nvdimm *nvdimm);
+bool is_nvdimm_sync(struct nd_region *nd_region);
 
 static inline int nvdimm_ctl(struct nvdimm *nvdimm, unsigned int cmd, void *buf,
 		unsigned int buf_len, int *cmd_rc)
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 4/6]  dax: check synchronous mapping is supported
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, darrick.wong, david, willy, hch, nilal, lenb, kilobyte,
	riel, yuval.shaia, stefanha, pbonzini, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC. 

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 include/linux/dax.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index c97fc0cc7167..41b4a5db6305 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -41,6 +41,18 @@ void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	if (!(vma->vm_flags & VM_SYNC))
+		return true;
+	if (!IS_DAX(file_inode(vma->vm_file)))
+		return false;
+	return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -68,6 +80,11 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev)
 {
 	return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	return !(vma->flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 4/6]  dax: check synchronous mapping is supported
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kvm-u79uwXL29TY76Z2rM5mHXA, linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA,
	linux-xfs-u79uwXL29TY76Z2rM5mHXA
  Cc: jack-AlSwsSmVLrQ, mst-H+wXaHxf7aLQT0dZR+AlfA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, david-FqsqvQoI3Ljby3iVrkZq2A,
	lcapitulino-H+wXaHxf7aLQT0dZR+AlfA,
	adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q,
	zwisler-DgEjT+Ai2ygdnm+yROfE0A, aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	darrick.wong-QHcLZuEGTsvQT0dZR+AlfA,
	david-H+wXaHxf7aLQT0dZR+AlfA, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	hch-wEGCiKHe2LqWVfeAwA7xHQ, nilal-H+wXaHxf7aLQT0dZR+AlfA,
	lenb-DgEjT+Ai2ygdnm+yROfE0A, kilobyte-b9QjgO8OEXPVItvQsEIGlw,
	riel-ebMLmSuQjDVBDgjK7y7TUQ, yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA,
	stefanha-H+wXaHxf7aLQT0dZR+AlfA, pbonzini-H+wXaHxf7aLQT0dZR+AlfA,
	kwolf-H+wXaHxf7aLQT0dZR+AlfA, tytso-3s7WtUTddSA,
	xiaoguangrong.eric-Re5JQEeQqe8AvxtiuMwx3w,
	cohuck-H+wXaHxf7aLQT0dZR+AlfA, rjw-LthD3rsA81gm4RdzfppkhA,
	imammedo-H+wXaHxf7aLQT0dZR+AlfA

This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC. 

Suggested-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
Signed-off-by: Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Reviewed-by: Jan Kara <jack-AlSwsSmVLrQ@public.gmane.org>
---
 include/linux/dax.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index c97fc0cc7167..41b4a5db6305 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -41,6 +41,18 @@ void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	if (!(vma->vm_flags & VM_SYNC))
+		return true;
+	if (!IS_DAX(file_inode(vma->vm_file)))
+		return false;
+	return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -68,6 +80,11 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev)
 {
 	return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	return !(vma->flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 4/6]  dax: check synchronous mapping is supported
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC. 

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 include/linux/dax.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index c97fc0cc7167..41b4a5db6305 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -41,6 +41,18 @@ void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	if (!(vma->vm_flags & VM_SYNC))
+		return true;
+	if (!IS_DAX(file_inode(vma->vm_file)))
+		return false;
+	return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -68,6 +80,11 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev)
 {
 	return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	return !(vma->flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC. 

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 include/linux/dax.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index c97fc0cc7167..41b4a5db6305 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -41,6 +41,18 @@ void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	if (!(vma->vm_flags & VM_SYNC))
+		return true;
+	if (!IS_DAX(file_inode(vma->vm_file)))
+		return false;
+	return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -68,6 +80,11 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev)
 {
 	return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	return !(vma->flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, jasowang, david, lcapitulino, adilger.kernel,
	zwisler, aarcange, dave.jiang, darrick.wong, vishal.l.verma,
	david, willy, hch, jmoyer, nilal, lenb, kilobyte, riel,
	yuval.shaia, stefanha, pbonzini, dan.j.williams, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC. 

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 include/linux/dax.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index c97fc0cc7167..41b4a5db6305 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -41,6 +41,18 @@ void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	if (!(vma->vm_flags & VM_SYNC))
+		return true;
+	if (!IS_DAX(file_inode(vma->vm_file)))
+		return false;
+	return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -68,6 +80,11 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev)
 {
 	return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	return !(vma->flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 4/6]  dax: check synchronous mapping is supported
  2019-04-26  5:00 ` Pankaj Gupta
                   ` (8 preceding siblings ...)
  (?)
@ 2019-04-26  5:00 ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, dave.jiang, darrick.wong, vishal.l.verma, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, tytso, xiaoguangrong.eric, cohuck, rjw,
	imammedo

This patch introduces 'daxdev_mapping_supported' helper
which checks if 'MAP_SYNC' is supported with filesystem
mapping. It also checks if corresponding dax_device is
synchronous. Virtio pmem device is asynchronous and
does not not support VM_SYNC. 

Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 include/linux/dax.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/dax.h b/include/linux/dax.h
index c97fc0cc7167..41b4a5db6305 100644
--- a/include/linux/dax.h
+++ b/include/linux/dax.h
@@ -41,6 +41,18 @@ void kill_dax(struct dax_device *dax_dev);
 void dax_write_cache(struct dax_device *dax_dev, bool wc);
 bool dax_write_cache_enabled(struct dax_device *dax_dev);
 bool dax_synchronous(struct dax_device *dax_dev);
+/*
+ * Check if given mapping is supported by the file / underlying device.
+ */
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	if (!(vma->vm_flags & VM_SYNC))
+		return true;
+	if (!IS_DAX(file_inode(vma->vm_file)))
+		return false;
+	return dax_synchronous(dax_dev);
+}
 #else
 static inline struct dax_device *dax_get_by_host(const char *host)
 {
@@ -68,6 +80,11 @@ static inline bool dax_write_cache_enabled(struct dax_device *dax_dev)
 {
 	return false;
 }
+static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
+				struct dax_device *dax_dev)
+{
+	return !(vma->flags & VM_SYNC);
+}
 #endif
 
 struct writeback_control;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 5/6] ext4: disable map_sync for async flush
  2019-04-26  5:00 ` Pankaj Gupta
  (?)
  (?)
@ 2019-04-26  5:00   ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, darrick.wong, david, willy, hch, nilal, lenb, kilobyte,
	riel, yuval.shaia, stefanha, pbonzini, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file->f_mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct dax_device *dax_dev = sbi->s_daxdev;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+	if (unlikely(ext4_forced_shutdown(sbi)))
 		return -EIO;
 
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(file);
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 5/6] ext4: disable map_sync for async flush
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file->f_mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct dax_device *dax_dev = sbi->s_daxdev;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+	if (unlikely(ext4_forced_shutdown(sbi)))
 		return -EIO;
 
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(file);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 5/6] ext4: disable map_sync for async flush
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file->f_mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct dax_device *dax_dev = sbi->s_daxdev;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+	if (unlikely(ext4_forced_shutdown(sbi)))
 		return -EIO;
 
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(file);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 5/6] ext4: disable map_sync for async flush
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, jasowang, david, lcapitulino, adilger.kernel,
	zwisler, aarcange, dave.jiang, darrick.wong, vishal.l.verma,
	david, willy, hch, jmoyer, nilal, lenb, kilobyte, riel,
	yuval.shaia, stefanha, pbonzini, dan.j.williams, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file->f_mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct dax_device *dax_dev = sbi->s_daxdev;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+	if (unlikely(ext4_forced_shutdown(sbi)))
 		return -EIO;
 
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(file);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 5/6] ext4: disable map_sync for async flush
  2019-04-26  5:00 ` Pankaj Gupta
                   ` (11 preceding siblings ...)
  (?)
@ 2019-04-26  5:00 ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, dave.jiang, darrick.wong, vishal.l.verma, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, tytso, xiaoguangrong.eric, cohuck, rjw,
	imammedo

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and ext4.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/file.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 98ec11f69cd4..dee549339e13 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -360,15 +360,17 @@ static const struct vm_operations_struct ext4_file_vm_ops = {
 static int ext4_file_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct inode *inode = file->f_mapping->host;
+	struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
+	struct dax_device *dax_dev = sbi->s_daxdev;
 
-	if (unlikely(ext4_forced_shutdown(EXT4_SB(inode->i_sb))))
+	if (unlikely(ext4_forced_shutdown(sbi)))
 		return -EIO;
 
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(file)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(file);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 6/6] xfs: disable map_sync for async flush
  2019-04-26  5:00 ` Pankaj Gupta
  (?)
  (?)
@ 2019-04-26  5:00   ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, darrick.wong, david, willy, hch, nilal, lenb, kilobyte,
	riel, yuval.shaia, stefanha, pbonzini, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 fs/xfs/xfs_file.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
 	struct file	*filp,
 	struct vm_area_struct *vma)
 {
+	struct dax_device 	*dax_dev;
+
+	dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(filp);
-- 
2.20.1

_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 fs/xfs/xfs_file.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
 	struct file	*filp,
 	struct vm_area_struct *vma)
 {
+	struct dax_device 	*dax_dev;
+
+	dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(filp);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: dan.j.williams, zwisler, vishal.l.verma, dave.jiang, mst,
	jasowang, willy, rjw, hch, lenb, jack, tytso, adilger.kernel,
	darrick.wong, lcapitulino, kwolf, imammedo, jmoyer, nilal, riel,
	stefanha, aarcange, david, david, cohuck, xiaoguangrong.eric,
	pbonzini, kilobyte, yuval.shaia, pagupta

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 fs/xfs/xfs_file.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
 	struct file	*filp,
 	struct vm_area_struct *vma)
 {
+	struct dax_device 	*dax_dev;
+
+	dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(filp);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-04-26  5:00   ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, jasowang, david, lcapitulino, adilger.kernel,
	zwisler, aarcange, dave.jiang, darrick.wong, vishal.l.verma,
	david, willy, hch, jmoyer, nilal, lenb, kilobyte, riel,
	yuval.shaia, stefanha, pbonzini, dan.j.williams, kwolf, tytso,
	xiaoguangrong.eric, cohuck, rjw, imammedo

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 fs/xfs/xfs_file.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
 	struct file	*filp,
 	struct vm_area_struct *vma)
 {
+	struct dax_device 	*dax_dev;
+
+	dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(filp);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 107+ messages in thread

* [PATCH v7 6/6] xfs: disable map_sync for async flush
  2019-04-26  5:00 ` Pankaj Gupta
                   ` (13 preceding siblings ...)
  (?)
@ 2019-04-26  5:00 ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-26  5:00 UTC (permalink / raw)
  To: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: pagupta, jack, mst, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, dave.jiang, darrick.wong, vishal.l.verma, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, tytso, xiaoguangrong.eric, cohuck, rjw,
	imammedo

Dont support 'MAP_SYNC' with non-DAX files and DAX files
with asynchronous dax_device. Virtio pmem provides
asynchronous host page cache flush mechanism. We don't
support 'MAP_SYNC' with virtio pmem and xfs.

Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
---
 fs/xfs/xfs_file.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index a7ceae90110e..f17652cca5ff 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -1203,11 +1203,14 @@ xfs_file_mmap(
 	struct file	*filp,
 	struct vm_area_struct *vma)
 {
+	struct dax_device 	*dax_dev;
+
+	dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
 	/*
-	 * We don't support synchronous mappings for non-DAX files. At least
-	 * until someone comes with a sensible use case.
+	 * We don't support synchronous mappings for non-DAX files and
+	 * for DAX files if underneath dax_device is not synchronous.
 	 */
-	if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
+	if (!daxdev_mapping_supported(vma, dax_dev))
 		return -EOPNOTSUPP;
 
 	file_accessed(filp);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-30  5:53       ` Yuval Shaia
  0 siblings, 0 replies; 107+ messages in thread
From: Yuval Shaia @ 2019-04-30  5:53 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck-H+wXaHxf7aLQT0dZR+AlfA, jack-AlSwsSmVLrQ,
	kvm-u79uwXL29TY76Z2rM5mHXA, mst-H+wXaHxf7aLQT0dZR+AlfA,
	jasowang-H+wXaHxf7aLQT0dZR+AlfA, david-FqsqvQoI3Ljby3iVrkZq2A,
	qemu-devel-qX2TKyscuCcdnm+yROfE0A,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	adilger.kernel-m1MBpc4rdrD3fQ9qLvQP4Q,
	zwisler-DgEjT+Ai2ygdnm+yROfE0A, aarcange-H+wXaHxf7aLQT0dZR+AlfA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	david-H+wXaHxf7aLQT0dZR+AlfA, willy-wEGCiKHe2LqWVfeAwA7xHQ,
	hch-wEGCiKHe2LqWVfeAwA7xHQ, linux-acpi-u79uwXL29TY76Z2rM5mHXA,
	linux-ext4-u79uwXL29TY76Z2rM5mHXA, lenb-DgEjT+Ai2ygdnm+yROfE0A,
	kilobyte-b9QjgO8OEXPVItvQsEIGlw, riel-ebMLmSuQjDVBDgjK7y7TUQ,
	stefanha-H+wXaHxf7aLQT0dZR+AlfA, imammedo-H+wXaHxf7aLQT0dZR+AlfA,
	lcapitulino-H+wXaHxf7aLQT0dZR+AlfA, kwolf-H+wXaHxf7aLQT0dZR+AlfA,
	nilal-H+wXaHxf7aLQT0dZR+AlfA, tytso-3s7WtUTddSA,
	xiaoguangrong.eric-Re5JQEeQqe8AvxtiuMwx3w,
	darrick.wong-QHcLZuEGTsvQT0dZR+AlfA, rjw-LthD3rsA81gm4RdzfppkhA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-xfs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	pbonzini-H+wXaHxf7aLQT0dZR+AlfA

On Fri, Apr 26, 2019 at 10:30:35AM +0530, Pankaj Gupta wrote:
> This patch adds virtio-pmem driver for KVM guest.
> 
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
> 
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
> 
> Signed-off-by: Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);
> +			req_buf->wq_buf_avail = true;
> +			wake_up(&req_buf->wq_buf);
> +		}
> +	}
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;
> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	}
> +	err = virtqueue_kick(vpmem->req_vq);
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +	if (!err) {
> +		err = -EIO;
> +		goto ret;
> +	}
> +	/* When host has read buffer, this completes via host_ack */
> +	wait_event(req->host_acked, req->done);
> +	err = req->ret;
> +ret:
> +	kfree(req);
> +	return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +	int rc = 0;
> +
> +	/* Create child bio for asynchronous flush and chain with
> +	 * parent bio. Otherwise directly call nd_region flush.
> +	 */
> +	if (bio && bio->bi_iter.bi_sector != -1) {
> +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +		if (!child)
> +			return -ENOMEM;
> +		bio_copy_dev(child, bio);
> +		child->bi_opf = REQ_PREFLUSH;
> +		child->bi_iter.bi_sector = -1;
> +		bio_chain(child, bio);
> +		submit_bio(child);
> +	} else {
> +		if (virtio_pmem_flush(nd_region))
> +			rc = -EIO;
> +	}
> +
> +	return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>  
>  	  If unsure, say Y.
>  
> +config VIRTIO_PMEM
> +	tristate "Support for virtio pmem driver"
> +	depends on VIRTIO
> +	depends on LIBNVDIMM
> +	help
> +	This driver provides support for virtio based flushing interface
> +	for persistent memory range.
> +
> +	If unsure, say M.
> +
>  config VIRTIO_BALLOON
>  	tristate "Virtio balloon driver"
>  	depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c
> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +	/* single vq */
> +	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +				host_ack, "flush_queue");
> +	if (IS_ERR(vpmem->req_vq))
> +		return PTR_ERR(vpmem->req_vq);
> +
> +	spin_lock_init(&vpmem->pmem_lock);
> +	INIT_LIST_HEAD(&vpmem->req_list);
> +
> +	return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +	int err = 0;
> +	struct resource res;
> +	struct virtio_pmem *vpmem;
> +	struct nd_region_desc ndr_desc = {};
> +	int nid = dev_to_node(&vdev->dev);
> +	struct nd_region *nd_region;
> +
> +	if (!vdev->config->get) {
> +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +	if (!vpmem) {
> +		err = -ENOMEM;
> +		goto out_err;
> +	}
> +
> +	vpmem->vdev = vdev;
> +	vdev->priv = vpmem;
> +	err = init_vq(vpmem);
> +	if (err)
> +		goto out_err;
> +
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			start, &vpmem->start);
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			size, &vpmem->size);
> +
> +	res.start = vpmem->start;
> +	res.end   = vpmem->start + vpmem->size-1;
> +	vpmem->nd_desc.provider_name = "virtio-pmem";
> +	vpmem->nd_desc.module = THIS_MODULE;
> +
> +	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +						&vpmem->nd_desc);
> +	if (!vpmem->nvdimm_bus)
> +		goto out_vq;
> +
> +	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +	ndr_desc.res = &res;
> +	ndr_desc.numa_node = nid;
> +	ndr_desc.flush = async_pmem_flush;
> +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +	if (!nd_region)
> +		goto out_nd;
> +	nd_region->provider_data =  dev_to_virtio

Delete extra space here ----------^^
I think this will let you join the two lines.

> +					(nd_region->dev.parent->parent);
> +	return 0;
> +out_nd:
> +	err = -ENXIO;
> +	nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +	vdev->config->del_vqs(vdev);
> +out_err:
> +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +	return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +	nvdimm_bus_unregister(nvdimm_bus);
> +	vdev->config->del_vqs(vdev);
> +	vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +	.driver.name		= KBUILD_MODNAME,
> +	.driver.owner		= THIS_MODULE,
> +	.id_table		= id_table,
> +	.probe			= virtio_pmem_probe,
> +	.remove			= virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +	/* Host return status corresponding to flush request */
> +	int ret;
> +
> +	/* command name*/
> +	char name[16];
> +
> +	/* Wait queue to process deferred work after ack from host */
> +	wait_queue_head_t host_acked;
> +	bool done;
> +
> +	/* Wait queue to process deferred work after virt queue buffer avail */
> +	wait_queue_head_t wq_buf;
> +	bool wq_buf_avail;
> +	struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +	struct virtio_device *vdev;
> +
> +	/* Virtio pmem request queue */
> +	struct virtqueue *req_vq;
> +
> +	/* nvdimm bus registers virtio pmem device */
> +	struct nvdimm_bus *nvdimm_bus;
> +	struct nvdimm_bus_descriptor nd_desc;
> +
> +	/* List to store deferred work if virtqueue is full */
> +	struct list_head req_list;
> +
> +	/* Synchronize virtqueue data */
> +	spinlock_t pmem_lock;
> +
> +	/* Memory region information */
> +	uint64_t start;
> +	uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +	__le64 start;
> +	__le64 size;
> +};
> +#endif

Suggesting to fix the above minor formatting error.

With this:

Reviewed-by: Yuval Shaia <yuval.shaia-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

> -- 
> 2.20.1
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-30  5:53       ` Yuval Shaia
  0 siblings, 0 replies; 107+ messages in thread
From: Yuval Shaia @ 2019-04-30  5:53 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger.kernel, zwisler, aarcange,
	dave.jiang, darrick.wong, vishal.l.verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, stefanha, pbonzini,
	dan.j.williams, kwolf, tytso, xiaoguangrong.eric, cohuck, rjw,
	imammedo

On Fri, Apr 26, 2019 at 10:30:35AM +0530, Pankaj Gupta wrote:
> This patch adds virtio-pmem driver for KVM guest.
> 
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
> 
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);
> +			req_buf->wq_buf_avail = true;
> +			wake_up(&req_buf->wq_buf);
> +		}
> +	}
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;
> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	}
> +	err = virtqueue_kick(vpmem->req_vq);
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +	if (!err) {
> +		err = -EIO;
> +		goto ret;
> +	}
> +	/* When host has read buffer, this completes via host_ack */
> +	wait_event(req->host_acked, req->done);
> +	err = req->ret;
> +ret:
> +	kfree(req);
> +	return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +	int rc = 0;
> +
> +	/* Create child bio for asynchronous flush and chain with
> +	 * parent bio. Otherwise directly call nd_region flush.
> +	 */
> +	if (bio && bio->bi_iter.bi_sector != -1) {
> +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +		if (!child)
> +			return -ENOMEM;
> +		bio_copy_dev(child, bio);
> +		child->bi_opf = REQ_PREFLUSH;
> +		child->bi_iter.bi_sector = -1;
> +		bio_chain(child, bio);
> +		submit_bio(child);
> +	} else {
> +		if (virtio_pmem_flush(nd_region))
> +			rc = -EIO;
> +	}
> +
> +	return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>  
>  	  If unsure, say Y.
>  
> +config VIRTIO_PMEM
> +	tristate "Support for virtio pmem driver"
> +	depends on VIRTIO
> +	depends on LIBNVDIMM
> +	help
> +	This driver provides support for virtio based flushing interface
> +	for persistent memory range.
> +
> +	If unsure, say M.
> +
>  config VIRTIO_BALLOON
>  	tristate "Virtio balloon driver"
>  	depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c
> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +	/* single vq */
> +	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +				host_ack, "flush_queue");
> +	if (IS_ERR(vpmem->req_vq))
> +		return PTR_ERR(vpmem->req_vq);
> +
> +	spin_lock_init(&vpmem->pmem_lock);
> +	INIT_LIST_HEAD(&vpmem->req_list);
> +
> +	return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +	int err = 0;
> +	struct resource res;
> +	struct virtio_pmem *vpmem;
> +	struct nd_region_desc ndr_desc = {};
> +	int nid = dev_to_node(&vdev->dev);
> +	struct nd_region *nd_region;
> +
> +	if (!vdev->config->get) {
> +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +	if (!vpmem) {
> +		err = -ENOMEM;
> +		goto out_err;
> +	}
> +
> +	vpmem->vdev = vdev;
> +	vdev->priv = vpmem;
> +	err = init_vq(vpmem);
> +	if (err)
> +		goto out_err;
> +
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			start, &vpmem->start);
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			size, &vpmem->size);
> +
> +	res.start = vpmem->start;
> +	res.end   = vpmem->start + vpmem->size-1;
> +	vpmem->nd_desc.provider_name = "virtio-pmem";
> +	vpmem->nd_desc.module = THIS_MODULE;
> +
> +	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +						&vpmem->nd_desc);
> +	if (!vpmem->nvdimm_bus)
> +		goto out_vq;
> +
> +	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +	ndr_desc.res = &res;
> +	ndr_desc.numa_node = nid;
> +	ndr_desc.flush = async_pmem_flush;
> +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +	if (!nd_region)
> +		goto out_nd;
> +	nd_region->provider_data =  dev_to_virtio

Delete extra space here ----------^^
I think this will let you join the two lines.

> +					(nd_region->dev.parent->parent);
> +	return 0;
> +out_nd:
> +	err = -ENXIO;
> +	nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +	vdev->config->del_vqs(vdev);
> +out_err:
> +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +	return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +	nvdimm_bus_unregister(nvdimm_bus);
> +	vdev->config->del_vqs(vdev);
> +	vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +	.driver.name		= KBUILD_MODNAME,
> +	.driver.owner		= THIS_MODULE,
> +	.id_table		= id_table,
> +	.probe			= virtio_pmem_probe,
> +	.remove			= virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +	/* Host return status corresponding to flush request */
> +	int ret;
> +
> +	/* command name*/
> +	char name[16];
> +
> +	/* Wait queue to process deferred work after ack from host */
> +	wait_queue_head_t host_acked;
> +	bool done;
> +
> +	/* Wait queue to process deferred work after virt queue buffer avail */
> +	wait_queue_head_t wq_buf;
> +	bool wq_buf_avail;
> +	struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +	struct virtio_device *vdev;
> +
> +	/* Virtio pmem request queue */
> +	struct virtqueue *req_vq;
> +
> +	/* nvdimm bus registers virtio pmem device */
> +	struct nvdimm_bus *nvdimm_bus;
> +	struct nvdimm_bus_descriptor nd_desc;
> +
> +	/* List to store deferred work if virtqueue is full */
> +	struct list_head req_list;
> +
> +	/* Synchronize virtqueue data */
> +	spinlock_t pmem_lock;
> +
> +	/* Memory region information */
> +	uint64_t start;
> +	uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +	__le64 start;
> +	__le64 size;
> +};
> +#endif

Suggesting to fix the above minor formatting error.

With this:

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

> -- 
> 2.20.1
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-30  5:53       ` Yuval Shaia
  0 siblings, 0 replies; 107+ messages in thread
From: Yuval Shaia @ 2019-04-30  5:53 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger.kernel, zwisler, aarcange, dave.jiang,
	linux-nvdimm, vishal.l.verma, david, willy, hch, linux-acpi,
	jmoyer, linux-ext4, lenb, kilobyte, riel, stefanha, imammedo,
	dan.j.williams, lcapitulino, kwolf, nilal, tytso,
	xiaoguangrong.eric, darrick.wong, rjw, linux-kernel, linux-xfs,
	linux-fsdevel, pbonzini

On Fri, Apr 26, 2019 at 10:30:35AM +0530, Pankaj Gupta wrote:
> This patch adds virtio-pmem driver for KVM guest.
> 
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
> 
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);
> +			req_buf->wq_buf_avail = true;
> +			wake_up(&req_buf->wq_buf);
> +		}
> +	}
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;
> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	}
> +	err = virtqueue_kick(vpmem->req_vq);
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +	if (!err) {
> +		err = -EIO;
> +		goto ret;
> +	}
> +	/* When host has read buffer, this completes via host_ack */
> +	wait_event(req->host_acked, req->done);
> +	err = req->ret;
> +ret:
> +	kfree(req);
> +	return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +	int rc = 0;
> +
> +	/* Create child bio for asynchronous flush and chain with
> +	 * parent bio. Otherwise directly call nd_region flush.
> +	 */
> +	if (bio && bio->bi_iter.bi_sector != -1) {
> +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +		if (!child)
> +			return -ENOMEM;
> +		bio_copy_dev(child, bio);
> +		child->bi_opf = REQ_PREFLUSH;
> +		child->bi_iter.bi_sector = -1;
> +		bio_chain(child, bio);
> +		submit_bio(child);
> +	} else {
> +		if (virtio_pmem_flush(nd_region))
> +			rc = -EIO;
> +	}
> +
> +	return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>  
>  	  If unsure, say Y.
>  
> +config VIRTIO_PMEM
> +	tristate "Support for virtio pmem driver"
> +	depends on VIRTIO
> +	depends on LIBNVDIMM
> +	help
> +	This driver provides support for virtio based flushing interface
> +	for persistent memory range.
> +
> +	If unsure, say M.
> +
>  config VIRTIO_BALLOON
>  	tristate "Virtio balloon driver"
>  	depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c
> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +	/* single vq */
> +	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +				host_ack, "flush_queue");
> +	if (IS_ERR(vpmem->req_vq))
> +		return PTR_ERR(vpmem->req_vq);
> +
> +	spin_lock_init(&vpmem->pmem_lock);
> +	INIT_LIST_HEAD(&vpmem->req_list);
> +
> +	return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +	int err = 0;
> +	struct resource res;
> +	struct virtio_pmem *vpmem;
> +	struct nd_region_desc ndr_desc = {};
> +	int nid = dev_to_node(&vdev->dev);
> +	struct nd_region *nd_region;
> +
> +	if (!vdev->config->get) {
> +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +	if (!vpmem) {
> +		err = -ENOMEM;
> +		goto out_err;
> +	}
> +
> +	vpmem->vdev = vdev;
> +	vdev->priv = vpmem;
> +	err = init_vq(vpmem);
> +	if (err)
> +		goto out_err;
> +
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			start, &vpmem->start);
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			size, &vpmem->size);
> +
> +	res.start = vpmem->start;
> +	res.end   = vpmem->start + vpmem->size-1;
> +	vpmem->nd_desc.provider_name = "virtio-pmem";
> +	vpmem->nd_desc.module = THIS_MODULE;
> +
> +	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +						&vpmem->nd_desc);
> +	if (!vpmem->nvdimm_bus)
> +		goto out_vq;
> +
> +	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +	ndr_desc.res = &res;
> +	ndr_desc.numa_node = nid;
> +	ndr_desc.flush = async_pmem_flush;
> +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +	if (!nd_region)
> +		goto out_nd;
> +	nd_region->provider_data =  dev_to_virtio

Delete extra space here ----------^^
I think this will let you join the two lines.

> +					(nd_region->dev.parent->parent);
> +	return 0;
> +out_nd:
> +	err = -ENXIO;
> +	nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +	vdev->config->del_vqs(vdev);
> +out_err:
> +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +	return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +	nvdimm_bus_unregister(nvdimm_bus);
> +	vdev->config->del_vqs(vdev);
> +	vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +	.driver.name		= KBUILD_MODNAME,
> +	.driver.owner		= THIS_MODULE,
> +	.id_table		= id_table,
> +	.probe			= virtio_pmem_probe,
> +	.remove			= virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +	/* Host return status corresponding to flush request */
> +	int ret;
> +
> +	/* command name*/
> +	char name[16];
> +
> +	/* Wait queue to process deferred work after ack from host */
> +	wait_queue_head_t host_acked;
> +	bool done;
> +
> +	/* Wait queue to process deferred work after virt queue buffer avail */
> +	wait_queue_head_t wq_buf;
> +	bool wq_buf_avail;
> +	struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +	struct virtio_device *vdev;
> +
> +	/* Virtio pmem request queue */
> +	struct virtqueue *req_vq;
> +
> +	/* nvdimm bus registers virtio pmem device */
> +	struct nvdimm_bus *nvdimm_bus;
> +	struct nvdimm_bus_descriptor nd_desc;
> +
> +	/* List to store deferred work if virtqueue is full */
> +	struct list_head req_list;
> +
> +	/* Synchronize virtqueue data */
> +	spinlock_t pmem_lock;
> +
> +	/* Memory region information */
> +	uint64_t start;
> +	uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +	__le64 start;
> +	__le64 size;
> +};
> +#endif

Suggesting to fix the above minor formatting error.

With this:

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

> -- 
> 2.20.1
> 
> 


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-04-26  5:00   ` Pankaj Gupta
                     ` (3 preceding siblings ...)
  (?)
@ 2019-04-30  5:53   ` Yuval Shaia
  -1 siblings, 0 replies; 107+ messages in thread
From: Yuval Shaia @ 2019-04-30  5:53 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, jack, kvm, mst, david, qemu-devel, virtualization,
	adilger.kernel, zwisler, aarcange, dave.jiang, linux-nvdimm,
	vishal.l.verma, willy, hch, linux-acpi, jmoyer, linux-ext4, lenb,
	kilobyte, riel, stefanha, imammedo, dan.j.williams, lcapitulino,
	nilal, tytso, xiaoguangrong.eric, darrick.wong, rjw,
	linux-kernel, linux-xfs, linux-fsdevel, pbonzini

On Fri, Apr 26, 2019 at 10:30:35AM +0530, Pankaj Gupta wrote:
> This patch adds virtio-pmem driver for KVM guest.
> 
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
> 
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
> 
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);
> +			req_buf->wq_buf_avail = true;
> +			wake_up(&req_buf->wq_buf);
> +		}
> +	}
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;
> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	}
> +	err = virtqueue_kick(vpmem->req_vq);
> +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +	if (!err) {
> +		err = -EIO;
> +		goto ret;
> +	}
> +	/* When host has read buffer, this completes via host_ack */
> +	wait_event(req->host_acked, req->done);
> +	err = req->ret;
> +ret:
> +	kfree(req);
> +	return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +	int rc = 0;
> +
> +	/* Create child bio for asynchronous flush and chain with
> +	 * parent bio. Otherwise directly call nd_region flush.
> +	 */
> +	if (bio && bio->bi_iter.bi_sector != -1) {
> +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +		if (!child)
> +			return -ENOMEM;
> +		bio_copy_dev(child, bio);
> +		child->bi_opf = REQ_PREFLUSH;
> +		child->bi_iter.bi_sector = -1;
> +		bio_chain(child, bio);
> +		submit_bio(child);
> +	} else {
> +		if (virtio_pmem_flush(nd_region))
> +			rc = -EIO;
> +	}
> +
> +	return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>  
>  	  If unsure, say Y.
>  
> +config VIRTIO_PMEM
> +	tristate "Support for virtio pmem driver"
> +	depends on VIRTIO
> +	depends on LIBNVDIMM
> +	help
> +	This driver provides support for virtio based flushing interface
> +	for persistent memory range.
> +
> +	If unsure, say M.
> +
>  config VIRTIO_BALLOON
>  	tristate "Virtio balloon driver"
>  	depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c
> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>
> +
> +static struct virtio_device_id id_table[] = {
> +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +	{ 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +	/* single vq */
> +	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +				host_ack, "flush_queue");
> +	if (IS_ERR(vpmem->req_vq))
> +		return PTR_ERR(vpmem->req_vq);
> +
> +	spin_lock_init(&vpmem->pmem_lock);
> +	INIT_LIST_HEAD(&vpmem->req_list);
> +
> +	return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +	int err = 0;
> +	struct resource res;
> +	struct virtio_pmem *vpmem;
> +	struct nd_region_desc ndr_desc = {};
> +	int nid = dev_to_node(&vdev->dev);
> +	struct nd_region *nd_region;
> +
> +	if (!vdev->config->get) {
> +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +			__func__);
> +		return -EINVAL;
> +	}
> +
> +	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +	if (!vpmem) {
> +		err = -ENOMEM;
> +		goto out_err;
> +	}
> +
> +	vpmem->vdev = vdev;
> +	vdev->priv = vpmem;
> +	err = init_vq(vpmem);
> +	if (err)
> +		goto out_err;
> +
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			start, &vpmem->start);
> +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +			size, &vpmem->size);
> +
> +	res.start = vpmem->start;
> +	res.end   = vpmem->start + vpmem->size-1;
> +	vpmem->nd_desc.provider_name = "virtio-pmem";
> +	vpmem->nd_desc.module = THIS_MODULE;
> +
> +	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +						&vpmem->nd_desc);
> +	if (!vpmem->nvdimm_bus)
> +		goto out_vq;
> +
> +	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +	ndr_desc.res = &res;
> +	ndr_desc.numa_node = nid;
> +	ndr_desc.flush = async_pmem_flush;
> +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +	if (!nd_region)
> +		goto out_nd;
> +	nd_region->provider_data =  dev_to_virtio

Delete extra space here ----------^^
I think this will let you join the two lines.

> +					(nd_region->dev.parent->parent);
> +	return 0;
> +out_nd:
> +	err = -ENXIO;
> +	nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +	vdev->config->del_vqs(vdev);
> +out_err:
> +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +	return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +	nvdimm_bus_unregister(nvdimm_bus);
> +	vdev->config->del_vqs(vdev);
> +	vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +	.driver.name		= KBUILD_MODNAME,
> +	.driver.owner		= THIS_MODULE,
> +	.id_table		= id_table,
> +	.probe			= virtio_pmem_probe,
> +	.remove			= virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h
> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +	/* Host return status corresponding to flush request */
> +	int ret;
> +
> +	/* command name*/
> +	char name[16];
> +
> +	/* Wait queue to process deferred work after ack from host */
> +	wait_queue_head_t host_acked;
> +	bool done;
> +
> +	/* Wait queue to process deferred work after virt queue buffer avail */
> +	wait_queue_head_t wq_buf;
> +	bool wq_buf_avail;
> +	struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +	struct virtio_device *vdev;
> +
> +	/* Virtio pmem request queue */
> +	struct virtqueue *req_vq;
> +
> +	/* nvdimm bus registers virtio pmem device */
> +	struct nvdimm_bus *nvdimm_bus;
> +	struct nvdimm_bus_descriptor nd_desc;
> +
> +	/* List to store deferred work if virtqueue is full */
> +	struct list_head req_list;
> +
> +	/* Synchronize virtqueue data */
> +	spinlock_t pmem_lock;
> +
> +	/* Memory region information */
> +	uint64_t start;
> +	uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>  
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +	__le64 start;
> +	__le64 size;
> +};
> +#endif

Suggesting to fix the above minor formatting error.

With this:

Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

> -- 
> 2.20.1
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-30  6:06         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-30  6:06 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, zwisler, aarcange, dave jiang,
	linux-nvdimm, vishal l verma, david, willy, hch, linux-acpi,
	jmoyer, linux-ext4, lenb, kilobyte, riel, stefanha, imammedo,
	dan j williams, lcapitulino, kwolf, nilal, tytso,
	xiaoguangrong eric, darrick wong


> 
> On Fri, Apr 26, 2019 at 10:30:35AM +0530, Pankaj Gupta wrote:
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> >  drivers/virtio/Kconfig           |  10 +++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 314 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index 000000000000..66b582f751a3
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,114 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> > +			req_buf->wq_buf_avail = true;
> > +			wake_up(&req_buf->wq_buf);
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev = nd_region->provider_data;
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct virtio_pmem_request *req;
> > +
> > +	might_sleep();
> > +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&vpmem->req_list, &req->list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	}
> > +	err = virtqueue_kick(vpmem->req_vq);
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +	if (!err) {
> > +		err = -EIO;
> > +		goto ret;
> > +	}
> > +	/* When host has read buffer, this completes via host_ack */
> > +	wait_event(req->host_acked, req->done);
> > +	err = req->ret;
> > +ret:
> > +	kfree(req);
> > +	return err;
> > +};
> > +
> > + /* The asynchronous flush callback function */
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > +{
> > +	int rc = 0;
> > +
> > +	/* Create child bio for asynchronous flush and chain with
> > +	 * parent bio. Otherwise directly call nd_region flush.
> > +	 */
> > +	if (bio && bio->bi_iter.bi_sector != -1) {
> > +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > +
> > +		if (!child)
> > +			return -ENOMEM;
> > +		bio_copy_dev(child, bio);
> > +		child->bi_opf = REQ_PREFLUSH;
> > +		child->bi_iter.bi_sector = -1;
> > +		bio_chain(child, bio);
> > +		submit_bio(child);
> > +	} else {
> > +		if (virtio_pmem_flush(nd_region))
> > +			rc = -EIO;
> > +	}
> > +
> > +	return rc;
> > +};
> > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..9f634a2ed638 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> >  
> >  	  If unsure, say Y.
> >  
> > +config VIRTIO_PMEM
> > +	tristate "Support for virtio pmem driver"
> > +	depends on VIRTIO
> > +	depends on LIBNVDIMM
> > +	help
> > +	This driver provides support for virtio based flushing interface
> > +	for persistent memory range.
> > +
> > +	If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >  	tristate "Virtio balloon driver"
> >  	depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5dcf46..143ce91eabe9 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index 000000000000..309788628e41
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> > @@ -0,0 +1,118 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include <../../drivers/nvdimm/nd.h>
> > +
> > +static struct virtio_device_id id_table[] = {
> > +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +	{ 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +	/* single vq */
> > +	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > +				host_ack, "flush_queue");
> > +	if (IS_ERR(vpmem->req_vq))
> > +		return PTR_ERR(vpmem->req_vq);
> > +
> > +	spin_lock_init(&vpmem->pmem_lock);
> > +	INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +	return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +	int err = 0;
> > +	struct resource res;
> > +	struct virtio_pmem *vpmem;
> > +	struct nd_region_desc ndr_desc = {};
> > +	int nid = dev_to_node(&vdev->dev);
> > +	struct nd_region *nd_region;
> > +
> > +	if (!vdev->config->get) {
> > +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > +			__func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > +	if (!vpmem) {
> > +		err = -ENOMEM;
> > +		goto out_err;
> > +	}
> > +
> > +	vpmem->vdev = vdev;
> > +	vdev->priv = vpmem;
> > +	err = init_vq(vpmem);
> > +	if (err)
> > +		goto out_err;
> > +
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			start, &vpmem->start);
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			size, &vpmem->size);
> > +
> > +	res.start = vpmem->start;
> > +	res.end   = vpmem->start + vpmem->size-1;
> > +	vpmem->nd_desc.provider_name = "virtio-pmem";
> > +	vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +						&vpmem->nd_desc);
> > +	if (!vpmem->nvdimm_bus)
> > +		goto out_vq;
> > +
> > +	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > +
> > +	ndr_desc.res = &res;
> > +	ndr_desc.numa_node = nid;
> > +	ndr_desc.flush = async_pmem_flush;
> > +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> > +
> > +	if (!nd_region)
> > +		goto out_nd;
> > +	nd_region->provider_data =  dev_to_virtio
> 
> Delete extra space here ----------^^

ah. Sure. 

> I think this will let you join the two lines.
> 
> > +					(nd_region->dev.parent->parent);
> > +	return 0;
> > +out_nd:
> > +	err = -ENXIO;
> > +	nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +	vdev->config->del_vqs(vdev);
> > +out_err:
> > +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +	return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +	nvdimm_bus_unregister(nvdimm_bus);
> > +	vdev->config->del_vqs(vdev);
> > +	vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +	.driver.name		= KBUILD_MODNAME,
> > +	.driver.owner		= THIS_MODULE,
> > +	.id_table		= id_table,
> > +	.probe			= virtio_pmem_probe,
> > +	.remove			= virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..ab1da877575d
> > --- /dev/null
> > +++ b/include/linux/virtio_pmem.h
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include <linux/virtio_ids.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct virtio_pmem_request {
> > +	/* Host return status corresponding to flush request */
> > +	int ret;
> > +
> > +	/* command name*/
> > +	char name[16];
> > +
> > +	/* Wait queue to process deferred work after ack from host */
> > +	wait_queue_head_t host_acked;
> > +	bool done;
> > +
> > +	/* Wait queue to process deferred work after virt queue buffer avail */
> > +	wait_queue_head_t wq_buf;
> > +	bool wq_buf_avail;
> > +	struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +	struct virtio_device *vdev;
> > +
> > +	/* Virtio pmem request queue */
> > +	struct virtqueue *req_vq;
> > +
> > +	/* nvdimm bus registers virtio pmem device */
> > +	struct nvdimm_bus *nvdimm_bus;
> > +	struct nvdimm_bus_descriptor nd_desc;
> > +
> > +	/* List to store deferred work if virtqueue is full */
> > +	struct list_head req_list;
> > +
> > +	/* Synchronize virtqueue data */
> > +	spinlock_t pmem_lock;
> > +
> > +	/* Memory region information */
> > +	uint64_t start;
> > +	uint64_t size;
> > +};
> > +
> > +void host_ack(struct virtqueue *vq);
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +#endif
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2d4f4d..32b2f94d1f58 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
> >  
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..fa3f7d52717a
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +	__le64 start;
> > +	__le64 size;
> > +};
> > +#endif
> 
> Suggesting to fix the above minor formatting error.
> 
> With this:
> 
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

Thank You!

Best regards,
Pankaj

> 
> > --
> > 2.20.1
> > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-30  6:06         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-30  6:06 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, zwisler, aarcange, dave jiang,
	linux-nvdimm, vishal l verma, david, willy, hch, linux-acpi,
	jmoyer, linux-ext4, lenb, kilobyte, riel, stefanha, imammedo,
	dan j williams, lcapitulino, kwolf, nilal, tytso,
	xiaoguangrong eric, darrick wong, rjw, linux-kernel, linux-xfs,
	linux-fsdevel, pbonzini


> 
> On Fri, Apr 26, 2019 at 10:30:35AM +0530, Pankaj Gupta wrote:
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> >  drivers/virtio/Kconfig           |  10 +++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 314 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index 000000000000..66b582f751a3
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,114 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> > +			req_buf->wq_buf_avail = true;
> > +			wake_up(&req_buf->wq_buf);
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev = nd_region->provider_data;
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct virtio_pmem_request *req;
> > +
> > +	might_sleep();
> > +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&vpmem->req_list, &req->list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	}
> > +	err = virtqueue_kick(vpmem->req_vq);
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +	if (!err) {
> > +		err = -EIO;
> > +		goto ret;
> > +	}
> > +	/* When host has read buffer, this completes via host_ack */
> > +	wait_event(req->host_acked, req->done);
> > +	err = req->ret;
> > +ret:
> > +	kfree(req);
> > +	return err;
> > +};
> > +
> > + /* The asynchronous flush callback function */
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > +{
> > +	int rc = 0;
> > +
> > +	/* Create child bio for asynchronous flush and chain with
> > +	 * parent bio. Otherwise directly call nd_region flush.
> > +	 */
> > +	if (bio && bio->bi_iter.bi_sector != -1) {
> > +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > +
> > +		if (!child)
> > +			return -ENOMEM;
> > +		bio_copy_dev(child, bio);
> > +		child->bi_opf = REQ_PREFLUSH;
> > +		child->bi_iter.bi_sector = -1;
> > +		bio_chain(child, bio);
> > +		submit_bio(child);
> > +	} else {
> > +		if (virtio_pmem_flush(nd_region))
> > +			rc = -EIO;
> > +	}
> > +
> > +	return rc;
> > +};
> > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..9f634a2ed638 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> >  
> >  	  If unsure, say Y.
> >  
> > +config VIRTIO_PMEM
> > +	tristate "Support for virtio pmem driver"
> > +	depends on VIRTIO
> > +	depends on LIBNVDIMM
> > +	help
> > +	This driver provides support for virtio based flushing interface
> > +	for persistent memory range.
> > +
> > +	If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >  	tristate "Virtio balloon driver"
> >  	depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5dcf46..143ce91eabe9 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index 000000000000..309788628e41
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> > @@ -0,0 +1,118 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include <../../drivers/nvdimm/nd.h>
> > +
> > +static struct virtio_device_id id_table[] = {
> > +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +	{ 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +	/* single vq */
> > +	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > +				host_ack, "flush_queue");
> > +	if (IS_ERR(vpmem->req_vq))
> > +		return PTR_ERR(vpmem->req_vq);
> > +
> > +	spin_lock_init(&vpmem->pmem_lock);
> > +	INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +	return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +	int err = 0;
> > +	struct resource res;
> > +	struct virtio_pmem *vpmem;
> > +	struct nd_region_desc ndr_desc = {};
> > +	int nid = dev_to_node(&vdev->dev);
> > +	struct nd_region *nd_region;
> > +
> > +	if (!vdev->config->get) {
> > +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > +			__func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > +	if (!vpmem) {
> > +		err = -ENOMEM;
> > +		goto out_err;
> > +	}
> > +
> > +	vpmem->vdev = vdev;
> > +	vdev->priv = vpmem;
> > +	err = init_vq(vpmem);
> > +	if (err)
> > +		goto out_err;
> > +
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			start, &vpmem->start);
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			size, &vpmem->size);
> > +
> > +	res.start = vpmem->start;
> > +	res.end   = vpmem->start + vpmem->size-1;
> > +	vpmem->nd_desc.provider_name = "virtio-pmem";
> > +	vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +						&vpmem->nd_desc);
> > +	if (!vpmem->nvdimm_bus)
> > +		goto out_vq;
> > +
> > +	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > +
> > +	ndr_desc.res = &res;
> > +	ndr_desc.numa_node = nid;
> > +	ndr_desc.flush = async_pmem_flush;
> > +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> > +
> > +	if (!nd_region)
> > +		goto out_nd;
> > +	nd_region->provider_data =  dev_to_virtio
> 
> Delete extra space here ----------^^

ah. Sure. 

> I think this will let you join the two lines.
> 
> > +					(nd_region->dev.parent->parent);
> > +	return 0;
> > +out_nd:
> > +	err = -ENXIO;
> > +	nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +	vdev->config->del_vqs(vdev);
> > +out_err:
> > +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +	return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +	nvdimm_bus_unregister(nvdimm_bus);
> > +	vdev->config->del_vqs(vdev);
> > +	vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +	.driver.name		= KBUILD_MODNAME,
> > +	.driver.owner		= THIS_MODULE,
> > +	.id_table		= id_table,
> > +	.probe			= virtio_pmem_probe,
> > +	.remove			= virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..ab1da877575d
> > --- /dev/null
> > +++ b/include/linux/virtio_pmem.h
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include <linux/virtio_ids.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct virtio_pmem_request {
> > +	/* Host return status corresponding to flush request */
> > +	int ret;
> > +
> > +	/* command name*/
> > +	char name[16];
> > +
> > +	/* Wait queue to process deferred work after ack from host */
> > +	wait_queue_head_t host_acked;
> > +	bool done;
> > +
> > +	/* Wait queue to process deferred work after virt queue buffer avail */
> > +	wait_queue_head_t wq_buf;
> > +	bool wq_buf_avail;
> > +	struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +	struct virtio_device *vdev;
> > +
> > +	/* Virtio pmem request queue */
> > +	struct virtqueue *req_vq;
> > +
> > +	/* nvdimm bus registers virtio pmem device */
> > +	struct nvdimm_bus *nvdimm_bus;
> > +	struct nvdimm_bus_descriptor nd_desc;
> > +
> > +	/* List to store deferred work if virtqueue is full */
> > +	struct list_head req_list;
> > +
> > +	/* Synchronize virtqueue data */
> > +	spinlock_t pmem_lock;
> > +
> > +	/* Memory region information */
> > +	uint64_t start;
> > +	uint64_t size;
> > +};
> > +
> > +void host_ack(struct virtqueue *vq);
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +#endif
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2d4f4d..32b2f94d1f58 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
> >  
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..fa3f7d52717a
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +	__le64 start;
> > +	__le64 size;
> > +};
> > +#endif
> 
> Suggesting to fix the above minor formatting error.
> 
> With this:
> 
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

Thank You!

Best regards,
Pankaj

> 
> > --
> > 2.20.1
> > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-04-30  6:06         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-30  6:06 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: jack, kvm, mst, jasowang, david, qemu-devel, virtualization,
	adilger kernel, zwisler, aarcange, dave jiang, linux-nvdimm,
	vishal l verma, david, willy, hch, linux-acpi, jmoyer,
	linux-ext4, lenb, kilobyte, riel, stefanha, pbonzini,
	dan j williams, lcapitulino, kwolf, nilal, tytso,
	xiaoguangrong eric, cohuck, rjw, linux-kernel, linux-xfs,
	linux-fsdevel, imammedo, darrick wong


> 
> On Fri, Apr 26, 2019 at 10:30:35AM +0530, Pankaj Gupta wrote:
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> >  drivers/virtio/Kconfig           |  10 +++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 314 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index 000000000000..66b582f751a3
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,114 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> > +			req_buf->wq_buf_avail = true;
> > +			wake_up(&req_buf->wq_buf);
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev = nd_region->provider_data;
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct virtio_pmem_request *req;
> > +
> > +	might_sleep();
> > +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&vpmem->req_list, &req->list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	}
> > +	err = virtqueue_kick(vpmem->req_vq);
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +	if (!err) {
> > +		err = -EIO;
> > +		goto ret;
> > +	}
> > +	/* When host has read buffer, this completes via host_ack */
> > +	wait_event(req->host_acked, req->done);
> > +	err = req->ret;
> > +ret:
> > +	kfree(req);
> > +	return err;
> > +};
> > +
> > + /* The asynchronous flush callback function */
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > +{
> > +	int rc = 0;
> > +
> > +	/* Create child bio for asynchronous flush and chain with
> > +	 * parent bio. Otherwise directly call nd_region flush.
> > +	 */
> > +	if (bio && bio->bi_iter.bi_sector != -1) {
> > +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > +
> > +		if (!child)
> > +			return -ENOMEM;
> > +		bio_copy_dev(child, bio);
> > +		child->bi_opf = REQ_PREFLUSH;
> > +		child->bi_iter.bi_sector = -1;
> > +		bio_chain(child, bio);
> > +		submit_bio(child);
> > +	} else {
> > +		if (virtio_pmem_flush(nd_region))
> > +			rc = -EIO;
> > +	}
> > +
> > +	return rc;
> > +};
> > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..9f634a2ed638 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> >  
> >  	  If unsure, say Y.
> >  
> > +config VIRTIO_PMEM
> > +	tristate "Support for virtio pmem driver"
> > +	depends on VIRTIO
> > +	depends on LIBNVDIMM
> > +	help
> > +	This driver provides support for virtio based flushing interface
> > +	for persistent memory range.
> > +
> > +	If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >  	tristate "Virtio balloon driver"
> >  	depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5dcf46..143ce91eabe9 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index 000000000000..309788628e41
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> > @@ -0,0 +1,118 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include <../../drivers/nvdimm/nd.h>
> > +
> > +static struct virtio_device_id id_table[] = {
> > +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +	{ 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +	/* single vq */
> > +	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > +				host_ack, "flush_queue");
> > +	if (IS_ERR(vpmem->req_vq))
> > +		return PTR_ERR(vpmem->req_vq);
> > +
> > +	spin_lock_init(&vpmem->pmem_lock);
> > +	INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +	return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +	int err = 0;
> > +	struct resource res;
> > +	struct virtio_pmem *vpmem;
> > +	struct nd_region_desc ndr_desc = {};
> > +	int nid = dev_to_node(&vdev->dev);
> > +	struct nd_region *nd_region;
> > +
> > +	if (!vdev->config->get) {
> > +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > +			__func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > +	if (!vpmem) {
> > +		err = -ENOMEM;
> > +		goto out_err;
> > +	}
> > +
> > +	vpmem->vdev = vdev;
> > +	vdev->priv = vpmem;
> > +	err = init_vq(vpmem);
> > +	if (err)
> > +		goto out_err;
> > +
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			start, &vpmem->start);
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			size, &vpmem->size);
> > +
> > +	res.start = vpmem->start;
> > +	res.end   = vpmem->start + vpmem->size-1;
> > +	vpmem->nd_desc.provider_name = "virtio-pmem";
> > +	vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +						&vpmem->nd_desc);
> > +	if (!vpmem->nvdimm_bus)
> > +		goto out_vq;
> > +
> > +	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > +
> > +	ndr_desc.res = &res;
> > +	ndr_desc.numa_node = nid;
> > +	ndr_desc.flush = async_pmem_flush;
> > +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> > +
> > +	if (!nd_region)
> > +		goto out_nd;
> > +	nd_region->provider_data =  dev_to_virtio
> 
> Delete extra space here ----------^^

ah. Sure. 

> I think this will let you join the two lines.
> 
> > +					(nd_region->dev.parent->parent);
> > +	return 0;
> > +out_nd:
> > +	err = -ENXIO;
> > +	nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +	vdev->config->del_vqs(vdev);
> > +out_err:
> > +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +	return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +	nvdimm_bus_unregister(nvdimm_bus);
> > +	vdev->config->del_vqs(vdev);
> > +	vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +	.driver.name		= KBUILD_MODNAME,
> > +	.driver.owner		= THIS_MODULE,
> > +	.id_table		= id_table,
> > +	.probe			= virtio_pmem_probe,
> > +	.remove			= virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..ab1da877575d
> > --- /dev/null
> > +++ b/include/linux/virtio_pmem.h
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include <linux/virtio_ids.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct virtio_pmem_request {
> > +	/* Host return status corresponding to flush request */
> > +	int ret;
> > +
> > +	/* command name*/
> > +	char name[16];
> > +
> > +	/* Wait queue to process deferred work after ack from host */
> > +	wait_queue_head_t host_acked;
> > +	bool done;
> > +
> > +	/* Wait queue to process deferred work after virt queue buffer avail */
> > +	wait_queue_head_t wq_buf;
> > +	bool wq_buf_avail;
> > +	struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +	struct virtio_device *vdev;
> > +
> > +	/* Virtio pmem request queue */
> > +	struct virtqueue *req_vq;
> > +
> > +	/* nvdimm bus registers virtio pmem device */
> > +	struct nvdimm_bus *nvdimm_bus;
> > +	struct nvdimm_bus_descriptor nd_desc;
> > +
> > +	/* List to store deferred work if virtqueue is full */
> > +	struct list_head req_list;
> > +
> > +	/* Synchronize virtqueue data */
> > +	spinlock_t pmem_lock;
> > +
> > +	/* Memory region information */
> > +	uint64_t start;
> > +	uint64_t size;
> > +};
> > +
> > +void host_ack(struct virtqueue *vq);
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +#endif
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2d4f4d..32b2f94d1f58 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
> >  
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..fa3f7d52717a
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +	__le64 start;
> > +	__le64 size;
> > +};
> > +#endif
> 
> Suggesting to fix the above minor formatting error.
> 
> With this:
> 
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

Thank You!

Best regards,
Pankaj

> 
> > --
> > 2.20.1
> > 
> > 
> 
> 


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-04-30  5:53       ` Yuval Shaia
  (?)
  (?)
@ 2019-04-30  6:06       ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-04-30  6:06 UTC (permalink / raw)
  To: Yuval Shaia
  Cc: jack, kvm, mst, david, qemu-devel, virtualization,
	adilger kernel, zwisler, aarcange, dave jiang, linux-nvdimm,
	vishal l verma, willy, hch, linux-acpi, jmoyer, linux-ext4, lenb,
	kilobyte, riel, stefanha, pbonzini, dan j williams, lcapitulino,
	nilal, tytso, xiaoguangrong eric, cohuck, rjw, linux-kernel,
	linux-xfs, linux-fsdevel, imammedo, darrick wong


> 
> On Fri, Apr 26, 2019 at 10:30:35AM +0530, Pankaj Gupta wrote:
> > This patch adds virtio-pmem driver for KVM guest.
> > 
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> > 
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> > 
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> >  drivers/virtio/Kconfig           |  10 +++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 314 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > 
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index 000000000000..66b582f751a3
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,114 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> > +			req_buf->wq_buf_avail = true;
> > +			wake_up(&req_buf->wq_buf);
> > +		}
> > +	}
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev = nd_region->provider_data;
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct virtio_pmem_request *req;
> > +
> > +	might_sleep();
> > +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&vpmem->req_list, &req->list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	}
> > +	err = virtqueue_kick(vpmem->req_vq);
> > +	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +	if (!err) {
> > +		err = -EIO;
> > +		goto ret;
> > +	}
> > +	/* When host has read buffer, this completes via host_ack */
> > +	wait_event(req->host_acked, req->done);
> > +	err = req->ret;
> > +ret:
> > +	kfree(req);
> > +	return err;
> > +};
> > +
> > + /* The asynchronous flush callback function */
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > +{
> > +	int rc = 0;
> > +
> > +	/* Create child bio for asynchronous flush and chain with
> > +	 * parent bio. Otherwise directly call nd_region flush.
> > +	 */
> > +	if (bio && bio->bi_iter.bi_sector != -1) {
> > +		struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > +
> > +		if (!child)
> > +			return -ENOMEM;
> > +		bio_copy_dev(child, bio);
> > +		child->bi_opf = REQ_PREFLUSH;
> > +		child->bi_iter.bi_sector = -1;
> > +		bio_chain(child, bio);
> > +		submit_bio(child);
> > +	} else {
> > +		if (virtio_pmem_flush(nd_region))
> > +			rc = -EIO;
> > +	}
> > +
> > +	return rc;
> > +};
> > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..9f634a2ed638 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> >  
> >  	  If unsure, say Y.
> >  
> > +config VIRTIO_PMEM
> > +	tristate "Support for virtio pmem driver"
> > +	depends on VIRTIO
> > +	depends on LIBNVDIMM
> > +	help
> > +	This driver provides support for virtio based flushing interface
> > +	for persistent memory range.
> > +
> > +	If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >  	tristate "Virtio balloon driver"
> >  	depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5dcf46..143ce91eabe9 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index 000000000000..309788628e41
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> > @@ -0,0 +1,118 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include <../../drivers/nvdimm/nd.h>
> > +
> > +static struct virtio_device_id id_table[] = {
> > +	{ VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +	{ 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +	/* single vq */
> > +	vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > +				host_ack, "flush_queue");
> > +	if (IS_ERR(vpmem->req_vq))
> > +		return PTR_ERR(vpmem->req_vq);
> > +
> > +	spin_lock_init(&vpmem->pmem_lock);
> > +	INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +	return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +	int err = 0;
> > +	struct resource res;
> > +	struct virtio_pmem *vpmem;
> > +	struct nd_region_desc ndr_desc = {};
> > +	int nid = dev_to_node(&vdev->dev);
> > +	struct nd_region *nd_region;
> > +
> > +	if (!vdev->config->get) {
> > +		dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > +			__func__);
> > +		return -EINVAL;
> > +	}
> > +
> > +	vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > +	if (!vpmem) {
> > +		err = -ENOMEM;
> > +		goto out_err;
> > +	}
> > +
> > +	vpmem->vdev = vdev;
> > +	vdev->priv = vpmem;
> > +	err = init_vq(vpmem);
> > +	if (err)
> > +		goto out_err;
> > +
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			start, &vpmem->start);
> > +	virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +			size, &vpmem->size);
> > +
> > +	res.start = vpmem->start;
> > +	res.end   = vpmem->start + vpmem->size-1;
> > +	vpmem->nd_desc.provider_name = "virtio-pmem";
> > +	vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +	vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +						&vpmem->nd_desc);
> > +	if (!vpmem->nvdimm_bus)
> > +		goto out_vq;
> > +
> > +	dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > +
> > +	ndr_desc.res = &res;
> > +	ndr_desc.numa_node = nid;
> > +	ndr_desc.flush = async_pmem_flush;
> > +	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +	set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +	nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> > +
> > +	if (!nd_region)
> > +		goto out_nd;
> > +	nd_region->provider_data =  dev_to_virtio
> 
> Delete extra space here ----------^^

ah. Sure. 

> I think this will let you join the two lines.
> 
> > +					(nd_region->dev.parent->parent);
> > +	return 0;
> > +out_nd:
> > +	err = -ENXIO;
> > +	nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +	vdev->config->del_vqs(vdev);
> > +out_err:
> > +	dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +	return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +	struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +	nvdimm_bus_unregister(nvdimm_bus);
> > +	vdev->config->del_vqs(vdev);
> > +	vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +	.driver.name		= KBUILD_MODNAME,
> > +	.driver.owner		= THIS_MODULE,
> > +	.id_table		= id_table,
> > +	.probe			= virtio_pmem_probe,
> > +	.remove			= virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..ab1da877575d
> > --- /dev/null
> > +++ b/include/linux/virtio_pmem.h
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include <linux/virtio_ids.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct virtio_pmem_request {
> > +	/* Host return status corresponding to flush request */
> > +	int ret;
> > +
> > +	/* command name*/
> > +	char name[16];
> > +
> > +	/* Wait queue to process deferred work after ack from host */
> > +	wait_queue_head_t host_acked;
> > +	bool done;
> > +
> > +	/* Wait queue to process deferred work after virt queue buffer avail */
> > +	wait_queue_head_t wq_buf;
> > +	bool wq_buf_avail;
> > +	struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +	struct virtio_device *vdev;
> > +
> > +	/* Virtio pmem request queue */
> > +	struct virtqueue *req_vq;
> > +
> > +	/* nvdimm bus registers virtio pmem device */
> > +	struct nvdimm_bus *nvdimm_bus;
> > +	struct nvdimm_bus_descriptor nd_desc;
> > +
> > +	/* List to store deferred work if virtqueue is full */
> > +	struct list_head req_list;
> > +
> > +	/* Synchronize virtqueue data */
> > +	spinlock_t pmem_lock;
> > +
> > +	/* Memory region information */
> > +	uint64_t start;
> > +	uint64_t size;
> > +};
> > +
> > +void host_ack(struct virtqueue *vq);
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +#endif
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2d4f4d..32b2f94d1f58 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
> >  
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..fa3f7d52717a
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +	__le64 start;
> > +	__le64 size;
> > +};
> > +#endif
> 
> Suggesting to fix the above minor formatting error.
> 
> With this:
> 
> Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>

Thank You!

Best regards,
Pankaj

> 
> > --
> > 2.20.1
> > 
> > 
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-04-26  5:00   ` Pankaj Gupta
  (?)
  (?)
@ 2019-05-07 15:35     ` Dan Williams
  -1 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:35 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, linux-nvdimm, David Hildenbrand,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, linux-ext4,
	Len Brown, kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov

Hi Pankaj,

Some minor file placement comments below.

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds virtio-pmem driver for KVM guest.
>
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
>
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +       unsigned int len;
> +       unsigned long flags;
> +       struct virtio_pmem_request *req, *req_buf;
> +       struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +               req->done = true;
> +               wake_up(&req->host_acked);
> +
> +               if (!list_empty(&vpmem->req_list)) {
> +                       req_buf = list_first_entry(&vpmem->req_list,
> +                                       struct virtio_pmem_request, list);
> +                       list_del(&vpmem->req_list);
> +                       req_buf->wq_buf_avail = true;
> +                       wake_up(&req_buf->wq_buf);
> +               }
> +       }
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +       int err;
> +       unsigned long flags;
> +       struct scatterlist *sgs[2], sg, ret;
> +       struct virtio_device *vdev = nd_region->provider_data;
> +       struct virtio_pmem *vpmem = vdev->priv;
> +       struct virtio_pmem_request *req;
> +
> +       might_sleep();
> +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> +       if (!req)
> +               return -ENOMEM;
> +
> +       req->done = req->wq_buf_avail = false;
> +       strcpy(req->name, "FLUSH");
> +       init_waitqueue_head(&req->host_acked);
> +       init_waitqueue_head(&req->wq_buf);
> +       sg_init_one(&sg, req->name, strlen(req->name));
> +       sgs[0] = &sg;
> +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +       sgs[1] = &ret;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +       if (err) {
> +               dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +               list_add_tail(&vpmem->req_list, &req->list);
> +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +               /* When host has read buffer, this completes via host_ack */
> +               wait_event(req->wq_buf, req->wq_buf_avail);
> +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       }
> +       err = virtqueue_kick(vpmem->req_vq);
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +       if (!err) {
> +               err = -EIO;
> +               goto ret;
> +       }
> +       /* When host has read buffer, this completes via host_ack */
> +       wait_event(req->host_acked, req->done);
> +       err = req->ret;
> +ret:
> +       kfree(req);
> +       return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +       int rc = 0;
> +
> +       /* Create child bio for asynchronous flush and chain with
> +        * parent bio. Otherwise directly call nd_region flush.
> +        */
> +       if (bio && bio->bi_iter.bi_sector != -1) {
> +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +               if (!child)
> +                       return -ENOMEM;
> +               bio_copy_dev(child, bio);
> +               child->bi_opf = REQ_PREFLUSH;
> +               child->bi_iter.bi_sector = -1;
> +               bio_chain(child, bio);
> +               submit_bio(child);
> +       } else {
> +               if (virtio_pmem_flush(nd_region))
> +                       rc = -EIO;
> +       }
> +
> +       return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>
>           If unsure, say Y.
>
> +config VIRTIO_PMEM
> +       tristate "Support for virtio pmem driver"
> +       depends on VIRTIO
> +       depends on LIBNVDIMM
> +       help
> +       This driver provides support for virtio based flushing interface
> +       for persistent memory range.
> +
> +       If unsure, say M.
> +
>  config VIRTIO_BALLOON
>         tristate "Virtio balloon driver"
>         depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c

It's not clear to me why this driver is located in drivers/virtio/

> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>

...especially because it seems to require nvdimm internals.

However I don't see why that header is included.

In any event lets move this to drivers/nvdimm/virtio.c to live
alongside the other generic bus provider drivers/nvdimm/e820.c.

> +
> +static struct virtio_device_id id_table[] = {
> +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +       { 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +       /* single vq */
> +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +                               host_ack, "flush_queue");
> +       if (IS_ERR(vpmem->req_vq))
> +               return PTR_ERR(vpmem->req_vq);
> +
> +       spin_lock_init(&vpmem->pmem_lock);
> +       INIT_LIST_HEAD(&vpmem->req_list);
> +
> +       return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +       int err = 0;
> +       struct resource res;
> +       struct virtio_pmem *vpmem;
> +       struct nd_region_desc ndr_desc = {};
> +       int nid = dev_to_node(&vdev->dev);
> +       struct nd_region *nd_region;
> +
> +       if (!vdev->config->get) {
> +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +                       __func__);
> +               return -EINVAL;
> +       }
> +
> +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +       if (!vpmem) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
> +
> +       vpmem->vdev = vdev;
> +       vdev->priv = vpmem;
> +       err = init_vq(vpmem);
> +       if (err)
> +               goto out_err;
> +
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       start, &vpmem->start);
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       size, &vpmem->size);
> +
> +       res.start = vpmem->start;
> +       res.end   = vpmem->start + vpmem->size-1;
> +       vpmem->nd_desc.provider_name = "virtio-pmem";
> +       vpmem->nd_desc.module = THIS_MODULE;
> +
> +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +                                               &vpmem->nd_desc);
> +       if (!vpmem->nvdimm_bus)
> +               goto out_vq;
> +
> +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +       ndr_desc.res = &res;
> +       ndr_desc.numa_node = nid;
> +       ndr_desc.flush = async_pmem_flush;
> +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +       if (!nd_region)
> +               goto out_nd;
> +       nd_region->provider_data =  dev_to_virtio
> +                                       (nd_region->dev.parent->parent);
> +       return 0;
> +out_nd:
> +       err = -ENXIO;
> +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +       vdev->config->del_vqs(vdev);
> +out_err:
> +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +       return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +       nvdimm_bus_unregister(nvdimm_bus);
> +       vdev->config->del_vqs(vdev);
> +       vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +       .driver.name            = KBUILD_MODNAME,
> +       .driver.owner           = THIS_MODULE,
> +       .id_table               = id_table,
> +       .probe                  = virtio_pmem_probe,
> +       .remove                 = virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h

Why is this a global header?

Seems it can move to drivers/nvdimm/virtio.h.

Also, I'd like to get a virtio ack from Michael (mst@redhat.com)
before taking this through the nvdimm tree.

> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +       /* Host return status corresponding to flush request */
> +       int ret;
> +
> +       /* command name*/
> +       char name[16];
> +
> +       /* Wait queue to process deferred work after ack from host */
> +       wait_queue_head_t host_acked;
> +       bool done;
> +
> +       /* Wait queue to process deferred work after virt queue buffer avail */
> +       wait_queue_head_t wq_buf;
> +       bool wq_buf_avail;
> +       struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +       struct virtio_device *vdev;
> +
> +       /* Virtio pmem request queue */
> +       struct virtqueue *req_vq;
> +
> +       /* nvdimm bus registers virtio pmem device */
> +       struct nvdimm_bus *nvdimm_bus;
> +       struct nvdimm_bus_descriptor nd_desc;
> +
> +       /* List to store deferred work if virtqueue is full */
> +       struct list_head req_list;
> +
> +       /* Synchronize virtqueue data */
> +       spinlock_t pmem_lock;
> +
> +       /* Memory region information */
> +       uint64_t start;
> +       uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +       __le64 start;
> +       __le64 size;
> +};
> +#endif
> --
> 2.20.1
>
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-07 15:35     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:35 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, Linux Kernel Mailing List, virtualization,
	KVM list, linux-fsdevel, Linux ACPI, Qemu Developers, linux-ext4,
	linux-xfs, Ross Zwisler, Vishal L Verma, Dave Jiang,
	Michael S. Tsirkin, Jason Wang, Matthew Wilcox,
	Rafael J. Wysocki, Christoph Hellwig, Len Brown, Jan Kara,
	Theodore

Hi Pankaj,

Some minor file placement comments below.

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds virtio-pmem driver for KVM guest.
>
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
>
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +       unsigned int len;
> +       unsigned long flags;
> +       struct virtio_pmem_request *req, *req_buf;
> +       struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +               req->done = true;
> +               wake_up(&req->host_acked);
> +
> +               if (!list_empty(&vpmem->req_list)) {
> +                       req_buf = list_first_entry(&vpmem->req_list,
> +                                       struct virtio_pmem_request, list);
> +                       list_del(&vpmem->req_list);
> +                       req_buf->wq_buf_avail = true;
> +                       wake_up(&req_buf->wq_buf);
> +               }
> +       }
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +       int err;
> +       unsigned long flags;
> +       struct scatterlist *sgs[2], sg, ret;
> +       struct virtio_device *vdev = nd_region->provider_data;
> +       struct virtio_pmem *vpmem = vdev->priv;
> +       struct virtio_pmem_request *req;
> +
> +       might_sleep();
> +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> +       if (!req)
> +               return -ENOMEM;
> +
> +       req->done = req->wq_buf_avail = false;
> +       strcpy(req->name, "FLUSH");
> +       init_waitqueue_head(&req->host_acked);
> +       init_waitqueue_head(&req->wq_buf);
> +       sg_init_one(&sg, req->name, strlen(req->name));
> +       sgs[0] = &sg;
> +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +       sgs[1] = &ret;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +       if (err) {
> +               dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +               list_add_tail(&vpmem->req_list, &req->list);
> +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +               /* When host has read buffer, this completes via host_ack */
> +               wait_event(req->wq_buf, req->wq_buf_avail);
> +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       }
> +       err = virtqueue_kick(vpmem->req_vq);
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +       if (!err) {
> +               err = -EIO;
> +               goto ret;
> +       }
> +       /* When host has read buffer, this completes via host_ack */
> +       wait_event(req->host_acked, req->done);
> +       err = req->ret;
> +ret:
> +       kfree(req);
> +       return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +       int rc = 0;
> +
> +       /* Create child bio for asynchronous flush and chain with
> +        * parent bio. Otherwise directly call nd_region flush.
> +        */
> +       if (bio && bio->bi_iter.bi_sector != -1) {
> +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +               if (!child)
> +                       return -ENOMEM;
> +               bio_copy_dev(child, bio);
> +               child->bi_opf = REQ_PREFLUSH;
> +               child->bi_iter.bi_sector = -1;
> +               bio_chain(child, bio);
> +               submit_bio(child);
> +       } else {
> +               if (virtio_pmem_flush(nd_region))
> +                       rc = -EIO;
> +       }
> +
> +       return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>
>           If unsure, say Y.
>
> +config VIRTIO_PMEM
> +       tristate "Support for virtio pmem driver"
> +       depends on VIRTIO
> +       depends on LIBNVDIMM
> +       help
> +       This driver provides support for virtio based flushing interface
> +       for persistent memory range.
> +
> +       If unsure, say M.
> +
>  config VIRTIO_BALLOON
>         tristate "Virtio balloon driver"
>         depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c

It's not clear to me why this driver is located in drivers/virtio/

> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>

...especially because it seems to require nvdimm internals.

However I don't see why that header is included.

In any event lets move this to drivers/nvdimm/virtio.c to live
alongside the other generic bus provider drivers/nvdimm/e820.c.

> +
> +static struct virtio_device_id id_table[] = {
> +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +       { 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +       /* single vq */
> +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +                               host_ack, "flush_queue");
> +       if (IS_ERR(vpmem->req_vq))
> +               return PTR_ERR(vpmem->req_vq);
> +
> +       spin_lock_init(&vpmem->pmem_lock);
> +       INIT_LIST_HEAD(&vpmem->req_list);
> +
> +       return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +       int err = 0;
> +       struct resource res;
> +       struct virtio_pmem *vpmem;
> +       struct nd_region_desc ndr_desc = {};
> +       int nid = dev_to_node(&vdev->dev);
> +       struct nd_region *nd_region;
> +
> +       if (!vdev->config->get) {
> +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +                       __func__);
> +               return -EINVAL;
> +       }
> +
> +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +       if (!vpmem) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
> +
> +       vpmem->vdev = vdev;
> +       vdev->priv = vpmem;
> +       err = init_vq(vpmem);
> +       if (err)
> +               goto out_err;
> +
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       start, &vpmem->start);
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       size, &vpmem->size);
> +
> +       res.start = vpmem->start;
> +       res.end   = vpmem->start + vpmem->size-1;
> +       vpmem->nd_desc.provider_name = "virtio-pmem";
> +       vpmem->nd_desc.module = THIS_MODULE;
> +
> +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +                                               &vpmem->nd_desc);
> +       if (!vpmem->nvdimm_bus)
> +               goto out_vq;
> +
> +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +       ndr_desc.res = &res;
> +       ndr_desc.numa_node = nid;
> +       ndr_desc.flush = async_pmem_flush;
> +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +       if (!nd_region)
> +               goto out_nd;
> +       nd_region->provider_data =  dev_to_virtio
> +                                       (nd_region->dev.parent->parent);
> +       return 0;
> +out_nd:
> +       err = -ENXIO;
> +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +       vdev->config->del_vqs(vdev);
> +out_err:
> +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +       return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +       nvdimm_bus_unregister(nvdimm_bus);
> +       vdev->config->del_vqs(vdev);
> +       vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +       .driver.name            = KBUILD_MODNAME,
> +       .driver.owner           = THIS_MODULE,
> +       .id_table               = id_table,
> +       .probe                  = virtio_pmem_probe,
> +       .remove                 = virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h

Why is this a global header?

Seems it can move to drivers/nvdimm/virtio.h.

Also, I'd like to get a virtio ack from Michael (mst@redhat.com)
before taking this through the nvdimm tree.

> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +       /* Host return status corresponding to flush request */
> +       int ret;
> +
> +       /* command name*/
> +       char name[16];
> +
> +       /* Wait queue to process deferred work after ack from host */
> +       wait_queue_head_t host_acked;
> +       bool done;
> +
> +       /* Wait queue to process deferred work after virt queue buffer avail */
> +       wait_queue_head_t wq_buf;
> +       bool wq_buf_avail;
> +       struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +       struct virtio_device *vdev;
> +
> +       /* Virtio pmem request queue */
> +       struct virtqueue *req_vq;
> +
> +       /* nvdimm bus registers virtio pmem device */
> +       struct nvdimm_bus *nvdimm_bus;
> +       struct nvdimm_bus_descriptor nd_desc;
> +
> +       /* List to store deferred work if virtqueue is full */
> +       struct list_head req_list;
> +
> +       /* Synchronize virtqueue data */
> +       spinlock_t pmem_lock;
> +
> +       /* Memory region information */
> +       uint64_t start;
> +       uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +       __le64 start;
> +       __le64 size;
> +};
> +#endif
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-07 15:35     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:35 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, Linux Kernel Mailing List, virtualization,
	KVM list, linux-fsdevel, Linux ACPI, Qemu Developers, linux-ext4,
	linux-xfs, Ross Zwisler, Vishal L Verma, Dave Jiang,
	Michael S. Tsirkin, Jason Wang, Matthew Wilcox,
	Rafael J. Wysocki, Christoph Hellwig, Len Brown, Jan Kara,
	Theodore Ts'o, Andreas Dilger, Darrick J. Wong, lcapitulino,
	Kevin Wolf, Igor Mammedov, jmoyer, Nitesh Narayan Lal,
	Rik van Riel, Stefan Hajnoczi, Andrea Arcangeli,
	David Hildenbrand, david, cohuck, Xiao Guangrong, Paolo Bonzini,
	kilobyte, yuval shaia

Hi Pankaj,

Some minor file placement comments below.

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds virtio-pmem driver for KVM guest.
>
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
>
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +       unsigned int len;
> +       unsigned long flags;
> +       struct virtio_pmem_request *req, *req_buf;
> +       struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +               req->done = true;
> +               wake_up(&req->host_acked);
> +
> +               if (!list_empty(&vpmem->req_list)) {
> +                       req_buf = list_first_entry(&vpmem->req_list,
> +                                       struct virtio_pmem_request, list);
> +                       list_del(&vpmem->req_list);
> +                       req_buf->wq_buf_avail = true;
> +                       wake_up(&req_buf->wq_buf);
> +               }
> +       }
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +       int err;
> +       unsigned long flags;
> +       struct scatterlist *sgs[2], sg, ret;
> +       struct virtio_device *vdev = nd_region->provider_data;
> +       struct virtio_pmem *vpmem = vdev->priv;
> +       struct virtio_pmem_request *req;
> +
> +       might_sleep();
> +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> +       if (!req)
> +               return -ENOMEM;
> +
> +       req->done = req->wq_buf_avail = false;
> +       strcpy(req->name, "FLUSH");
> +       init_waitqueue_head(&req->host_acked);
> +       init_waitqueue_head(&req->wq_buf);
> +       sg_init_one(&sg, req->name, strlen(req->name));
> +       sgs[0] = &sg;
> +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +       sgs[1] = &ret;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +       if (err) {
> +               dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +               list_add_tail(&vpmem->req_list, &req->list);
> +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +               /* When host has read buffer, this completes via host_ack */
> +               wait_event(req->wq_buf, req->wq_buf_avail);
> +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       }
> +       err = virtqueue_kick(vpmem->req_vq);
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +       if (!err) {
> +               err = -EIO;
> +               goto ret;
> +       }
> +       /* When host has read buffer, this completes via host_ack */
> +       wait_event(req->host_acked, req->done);
> +       err = req->ret;
> +ret:
> +       kfree(req);
> +       return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +       int rc = 0;
> +
> +       /* Create child bio for asynchronous flush and chain with
> +        * parent bio. Otherwise directly call nd_region flush.
> +        */
> +       if (bio && bio->bi_iter.bi_sector != -1) {
> +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +               if (!child)
> +                       return -ENOMEM;
> +               bio_copy_dev(child, bio);
> +               child->bi_opf = REQ_PREFLUSH;
> +               child->bi_iter.bi_sector = -1;
> +               bio_chain(child, bio);
> +               submit_bio(child);
> +       } else {
> +               if (virtio_pmem_flush(nd_region))
> +                       rc = -EIO;
> +       }
> +
> +       return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>
>           If unsure, say Y.
>
> +config VIRTIO_PMEM
> +       tristate "Support for virtio pmem driver"
> +       depends on VIRTIO
> +       depends on LIBNVDIMM
> +       help
> +       This driver provides support for virtio based flushing interface
> +       for persistent memory range.
> +
> +       If unsure, say M.
> +
>  config VIRTIO_BALLOON
>         tristate "Virtio balloon driver"
>         depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c

It's not clear to me why this driver is located in drivers/virtio/

> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>

...especially because it seems to require nvdimm internals.

However I don't see why that header is included.

In any event lets move this to drivers/nvdimm/virtio.c to live
alongside the other generic bus provider drivers/nvdimm/e820.c.

> +
> +static struct virtio_device_id id_table[] = {
> +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +       { 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +       /* single vq */
> +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +                               host_ack, "flush_queue");
> +       if (IS_ERR(vpmem->req_vq))
> +               return PTR_ERR(vpmem->req_vq);
> +
> +       spin_lock_init(&vpmem->pmem_lock);
> +       INIT_LIST_HEAD(&vpmem->req_list);
> +
> +       return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +       int err = 0;
> +       struct resource res;
> +       struct virtio_pmem *vpmem;
> +       struct nd_region_desc ndr_desc = {};
> +       int nid = dev_to_node(&vdev->dev);
> +       struct nd_region *nd_region;
> +
> +       if (!vdev->config->get) {
> +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +                       __func__);
> +               return -EINVAL;
> +       }
> +
> +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +       if (!vpmem) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
> +
> +       vpmem->vdev = vdev;
> +       vdev->priv = vpmem;
> +       err = init_vq(vpmem);
> +       if (err)
> +               goto out_err;
> +
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       start, &vpmem->start);
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       size, &vpmem->size);
> +
> +       res.start = vpmem->start;
> +       res.end   = vpmem->start + vpmem->size-1;
> +       vpmem->nd_desc.provider_name = "virtio-pmem";
> +       vpmem->nd_desc.module = THIS_MODULE;
> +
> +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +                                               &vpmem->nd_desc);
> +       if (!vpmem->nvdimm_bus)
> +               goto out_vq;
> +
> +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +       ndr_desc.res = &res;
> +       ndr_desc.numa_node = nid;
> +       ndr_desc.flush = async_pmem_flush;
> +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +       if (!nd_region)
> +               goto out_nd;
> +       nd_region->provider_data =  dev_to_virtio
> +                                       (nd_region->dev.parent->parent);
> +       return 0;
> +out_nd:
> +       err = -ENXIO;
> +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +       vdev->config->del_vqs(vdev);
> +out_err:
> +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +       return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +       nvdimm_bus_unregister(nvdimm_bus);
> +       vdev->config->del_vqs(vdev);
> +       vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +       .driver.name            = KBUILD_MODNAME,
> +       .driver.owner           = THIS_MODULE,
> +       .id_table               = id_table,
> +       .probe                  = virtio_pmem_probe,
> +       .remove                 = virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h

Why is this a global header?

Seems it can move to drivers/nvdimm/virtio.h.

Also, I'd like to get a virtio ack from Michael (mst@redhat.com)
before taking this through the nvdimm tree.

> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +       /* Host return status corresponding to flush request */
> +       int ret;
> +
> +       /* command name*/
> +       char name[16];
> +
> +       /* Wait queue to process deferred work after ack from host */
> +       wait_queue_head_t host_acked;
> +       bool done;
> +
> +       /* Wait queue to process deferred work after virt queue buffer avail */
> +       wait_queue_head_t wq_buf;
> +       bool wq_buf_avail;
> +       struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +       struct virtio_device *vdev;
> +
> +       /* Virtio pmem request queue */
> +       struct virtqueue *req_vq;
> +
> +       /* nvdimm bus registers virtio pmem device */
> +       struct nvdimm_bus *nvdimm_bus;
> +       struct nvdimm_bus_descriptor nd_desc;
> +
> +       /* List to store deferred work if virtqueue is full */
> +       struct list_head req_list;
> +
> +       /* Synchronize virtqueue data */
> +       spinlock_t pmem_lock;
> +
> +       /* Memory region information */
> +       uint64_t start;
> +       uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +       __le64 start;
> +       __le64 size;
> +};
> +#endif
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-07 15:35     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:35 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, Dave Jiang, linux-nvdimm,
	Vishal L Verma, David Hildenbrand, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov

Hi Pankaj,

Some minor file placement comments below.

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds virtio-pmem driver for KVM guest.
>
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
>
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +       unsigned int len;
> +       unsigned long flags;
> +       struct virtio_pmem_request *req, *req_buf;
> +       struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +               req->done = true;
> +               wake_up(&req->host_acked);
> +
> +               if (!list_empty(&vpmem->req_list)) {
> +                       req_buf = list_first_entry(&vpmem->req_list,
> +                                       struct virtio_pmem_request, list);
> +                       list_del(&vpmem->req_list);
> +                       req_buf->wq_buf_avail = true;
> +                       wake_up(&req_buf->wq_buf);
> +               }
> +       }
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +       int err;
> +       unsigned long flags;
> +       struct scatterlist *sgs[2], sg, ret;
> +       struct virtio_device *vdev = nd_region->provider_data;
> +       struct virtio_pmem *vpmem = vdev->priv;
> +       struct virtio_pmem_request *req;
> +
> +       might_sleep();
> +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> +       if (!req)
> +               return -ENOMEM;
> +
> +       req->done = req->wq_buf_avail = false;
> +       strcpy(req->name, "FLUSH");
> +       init_waitqueue_head(&req->host_acked);
> +       init_waitqueue_head(&req->wq_buf);
> +       sg_init_one(&sg, req->name, strlen(req->name));
> +       sgs[0] = &sg;
> +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +       sgs[1] = &ret;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +       if (err) {
> +               dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +               list_add_tail(&vpmem->req_list, &req->list);
> +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +               /* When host has read buffer, this completes via host_ack */
> +               wait_event(req->wq_buf, req->wq_buf_avail);
> +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       }
> +       err = virtqueue_kick(vpmem->req_vq);
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +       if (!err) {
> +               err = -EIO;
> +               goto ret;
> +       }
> +       /* When host has read buffer, this completes via host_ack */
> +       wait_event(req->host_acked, req->done);
> +       err = req->ret;
> +ret:
> +       kfree(req);
> +       return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +       int rc = 0;
> +
> +       /* Create child bio for asynchronous flush and chain with
> +        * parent bio. Otherwise directly call nd_region flush.
> +        */
> +       if (bio && bio->bi_iter.bi_sector != -1) {
> +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +               if (!child)
> +                       return -ENOMEM;
> +               bio_copy_dev(child, bio);
> +               child->bi_opf = REQ_PREFLUSH;
> +               child->bi_iter.bi_sector = -1;
> +               bio_chain(child, bio);
> +               submit_bio(child);
> +       } else {
> +               if (virtio_pmem_flush(nd_region))
> +                       rc = -EIO;
> +       }
> +
> +       return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>
>           If unsure, say Y.
>
> +config VIRTIO_PMEM
> +       tristate "Support for virtio pmem driver"
> +       depends on VIRTIO
> +       depends on LIBNVDIMM
> +       help
> +       This driver provides support for virtio based flushing interface
> +       for persistent memory range.
> +
> +       If unsure, say M.
> +
>  config VIRTIO_BALLOON
>         tristate "Virtio balloon driver"
>         depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c

It's not clear to me why this driver is located in drivers/virtio/

> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>

...especially because it seems to require nvdimm internals.

However I don't see why that header is included.

In any event lets move this to drivers/nvdimm/virtio.c to live
alongside the other generic bus provider drivers/nvdimm/e820.c.

> +
> +static struct virtio_device_id id_table[] = {
> +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +       { 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +       /* single vq */
> +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +                               host_ack, "flush_queue");
> +       if (IS_ERR(vpmem->req_vq))
> +               return PTR_ERR(vpmem->req_vq);
> +
> +       spin_lock_init(&vpmem->pmem_lock);
> +       INIT_LIST_HEAD(&vpmem->req_list);
> +
> +       return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +       int err = 0;
> +       struct resource res;
> +       struct virtio_pmem *vpmem;
> +       struct nd_region_desc ndr_desc = {};
> +       int nid = dev_to_node(&vdev->dev);
> +       struct nd_region *nd_region;
> +
> +       if (!vdev->config->get) {
> +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +                       __func__);
> +               return -EINVAL;
> +       }
> +
> +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +       if (!vpmem) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
> +
> +       vpmem->vdev = vdev;
> +       vdev->priv = vpmem;
> +       err = init_vq(vpmem);
> +       if (err)
> +               goto out_err;
> +
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       start, &vpmem->start);
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       size, &vpmem->size);
> +
> +       res.start = vpmem->start;
> +       res.end   = vpmem->start + vpmem->size-1;
> +       vpmem->nd_desc.provider_name = "virtio-pmem";
> +       vpmem->nd_desc.module = THIS_MODULE;
> +
> +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +                                               &vpmem->nd_desc);
> +       if (!vpmem->nvdimm_bus)
> +               goto out_vq;
> +
> +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +       ndr_desc.res = &res;
> +       ndr_desc.numa_node = nid;
> +       ndr_desc.flush = async_pmem_flush;
> +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +       if (!nd_region)
> +               goto out_nd;
> +       nd_region->provider_data =  dev_to_virtio
> +                                       (nd_region->dev.parent->parent);
> +       return 0;
> +out_nd:
> +       err = -ENXIO;
> +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +       vdev->config->del_vqs(vdev);
> +out_err:
> +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +       return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +       nvdimm_bus_unregister(nvdimm_bus);
> +       vdev->config->del_vqs(vdev);
> +       vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +       .driver.name            = KBUILD_MODNAME,
> +       .driver.owner           = THIS_MODULE,
> +       .id_table               = id_table,
> +       .probe                  = virtio_pmem_probe,
> +       .remove                 = virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h

Why is this a global header?

Seems it can move to drivers/nvdimm/virtio.h.

Also, I'd like to get a virtio ack from Michael (mst@redhat.com)
before taking this through the nvdimm tree.

> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +       /* Host return status corresponding to flush request */
> +       int ret;
> +
> +       /* command name*/
> +       char name[16];
> +
> +       /* Wait queue to process deferred work after ack from host */
> +       wait_queue_head_t host_acked;
> +       bool done;
> +
> +       /* Wait queue to process deferred work after virt queue buffer avail */
> +       wait_queue_head_t wq_buf;
> +       bool wq_buf_avail;
> +       struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +       struct virtio_device *vdev;
> +
> +       /* Virtio pmem request queue */
> +       struct virtqueue *req_vq;
> +
> +       /* nvdimm bus registers virtio pmem device */
> +       struct nvdimm_bus *nvdimm_bus;
> +       struct nvdimm_bus_descriptor nd_desc;
> +
> +       /* List to store deferred work if virtqueue is full */
> +       struct list_head req_list;
> +
> +       /* Synchronize virtqueue data */
> +       spinlock_t pmem_lock;
> +
> +       /* Memory region information */
> +       uint64_t start;
> +       uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +       __le64 start;
> +       __le64 size;
> +};
> +#endif
> --
> 2.20.1
>


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-04-26  5:00   ` Pankaj Gupta
                     ` (4 preceding siblings ...)
  (?)
@ 2019-05-07 15:35   ` Dan Williams
  -1 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:35 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, jmoyer,
	linux-ext4, Len Brown, kilobyte, Rik van Riel, yuval shaia,
	Stefan Hajnoczi, Paolo Bonzini, lcapitulino, Nites

Hi Pankaj,

Some minor file placement comments below.

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds virtio-pmem driver for KVM guest.
>
> Guest reads the persistent memory range information from
> Qemu over VIRTIO and registers it on nvdimm_bus. It also
> creates a nd_region object with the persistent memory
> range information so that existing 'nvdimm/pmem' driver
> can reserve this into system memory map. This way
> 'virtio-pmem' driver uses existing functionality of pmem
> driver to register persistent memory compatible for DAX
> capable filesystems.
>
> This also provides function to perform guest flush over
> VIRTIO from 'pmem' driver when userspace performs flush
> on DAX memory range.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
>  drivers/virtio/Kconfig           |  10 +++
>  drivers/virtio/Makefile          |   1 +
>  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
>  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
>  include/uapi/linux/virtio_ids.h  |   1 +
>  include/uapi/linux/virtio_pmem.h |  10 +++
>  7 files changed, 314 insertions(+)
>  create mode 100644 drivers/nvdimm/virtio_pmem.c
>  create mode 100644 drivers/virtio/pmem.c
>  create mode 100644 include/linux/virtio_pmem.h
>  create mode 100644 include/uapi/linux/virtio_pmem.h
>
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> new file mode 100644
> index 000000000000..66b582f751a3
> --- /dev/null
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -0,0 +1,114 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + */
> +#include <linux/virtio_pmem.h>
> +#include "nd.h"
> +
> + /* The interrupt handler */
> +void host_ack(struct virtqueue *vq)
> +{
> +       unsigned int len;
> +       unsigned long flags;
> +       struct virtio_pmem_request *req, *req_buf;
> +       struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +               req->done = true;
> +               wake_up(&req->host_acked);
> +
> +               if (!list_empty(&vpmem->req_list)) {
> +                       req_buf = list_first_entry(&vpmem->req_list,
> +                                       struct virtio_pmem_request, list);
> +                       list_del(&vpmem->req_list);
> +                       req_buf->wq_buf_avail = true;
> +                       wake_up(&req_buf->wq_buf);
> +               }
> +       }
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +}
> +EXPORT_SYMBOL_GPL(host_ack);
> +
> + /* The request submission function */
> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +       int err;
> +       unsigned long flags;
> +       struct scatterlist *sgs[2], sg, ret;
> +       struct virtio_device *vdev = nd_region->provider_data;
> +       struct virtio_pmem *vpmem = vdev->priv;
> +       struct virtio_pmem_request *req;
> +
> +       might_sleep();
> +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> +       if (!req)
> +               return -ENOMEM;
> +
> +       req->done = req->wq_buf_avail = false;
> +       strcpy(req->name, "FLUSH");
> +       init_waitqueue_head(&req->host_acked);
> +       init_waitqueue_head(&req->wq_buf);
> +       sg_init_one(&sg, req->name, strlen(req->name));
> +       sgs[0] = &sg;
> +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +       sgs[1] = &ret;
> +
> +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +       if (err) {
> +               dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +               list_add_tail(&vpmem->req_list, &req->list);
> +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +               /* When host has read buffer, this completes via host_ack */
> +               wait_event(req->wq_buf, req->wq_buf_avail);
> +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +       }
> +       err = virtqueue_kick(vpmem->req_vq);
> +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +       if (!err) {
> +               err = -EIO;
> +               goto ret;
> +       }
> +       /* When host has read buffer, this completes via host_ack */
> +       wait_event(req->host_acked, req->done);
> +       err = req->ret;
> +ret:
> +       kfree(req);
> +       return err;
> +};
> +
> + /* The asynchronous flush callback function */
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> +{
> +       int rc = 0;
> +
> +       /* Create child bio for asynchronous flush and chain with
> +        * parent bio. Otherwise directly call nd_region flush.
> +        */
> +       if (bio && bio->bi_iter.bi_sector != -1) {
> +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> +
> +               if (!child)
> +                       return -ENOMEM;
> +               bio_copy_dev(child, bio);
> +               child->bi_opf = REQ_PREFLUSH;
> +               child->bi_iter.bi_sector = -1;
> +               bio_chain(child, bio);
> +               submit_bio(child);
> +       } else {
> +               if (virtio_pmem_flush(nd_region))
> +                       rc = -EIO;
> +       }
> +
> +       return rc;
> +};
> +EXPORT_SYMBOL_GPL(async_pmem_flush);
> +MODULE_LICENSE("GPL");
> diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> index 35897649c24f..9f634a2ed638 100644
> --- a/drivers/virtio/Kconfig
> +++ b/drivers/virtio/Kconfig
> @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
>
>           If unsure, say Y.
>
> +config VIRTIO_PMEM
> +       tristate "Support for virtio pmem driver"
> +       depends on VIRTIO
> +       depends on LIBNVDIMM
> +       help
> +       This driver provides support for virtio based flushing interface
> +       for persistent memory range.
> +
> +       If unsure, say M.
> +
>  config VIRTIO_BALLOON
>         tristate "Virtio balloon driver"
>         depends on VIRTIO
> diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> index 3a2b5c5dcf46..143ce91eabe9 100644
> --- a/drivers/virtio/Makefile
> +++ b/drivers/virtio/Makefile
> @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
>  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
>  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
>  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> new file mode 100644
> index 000000000000..309788628e41
> --- /dev/null
> +++ b/drivers/virtio/pmem.c

It's not clear to me why this driver is located in drivers/virtio/

> @@ -0,0 +1,118 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * virtio_pmem.c: Virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and registers the virtual pmem device
> + * with libnvdimm core.
> + */
> +#include <linux/virtio_pmem.h>
> +#include <../../drivers/nvdimm/nd.h>

...especially because it seems to require nvdimm internals.

However I don't see why that header is included.

In any event lets move this to drivers/nvdimm/virtio.c to live
alongside the other generic bus provider drivers/nvdimm/e820.c.

> +
> +static struct virtio_device_id id_table[] = {
> +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> +       { 0 },
> +};
> +
> + /* Initialize virt queue */
> +static int init_vq(struct virtio_pmem *vpmem)
> +{
> +       /* single vq */
> +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> +                               host_ack, "flush_queue");
> +       if (IS_ERR(vpmem->req_vq))
> +               return PTR_ERR(vpmem->req_vq);
> +
> +       spin_lock_init(&vpmem->pmem_lock);
> +       INIT_LIST_HEAD(&vpmem->req_list);
> +
> +       return 0;
> +};
> +
> +static int virtio_pmem_probe(struct virtio_device *vdev)
> +{
> +       int err = 0;
> +       struct resource res;
> +       struct virtio_pmem *vpmem;
> +       struct nd_region_desc ndr_desc = {};
> +       int nid = dev_to_node(&vdev->dev);
> +       struct nd_region *nd_region;
> +
> +       if (!vdev->config->get) {
> +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> +                       __func__);
> +               return -EINVAL;
> +       }
> +
> +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> +       if (!vpmem) {
> +               err = -ENOMEM;
> +               goto out_err;
> +       }
> +
> +       vpmem->vdev = vdev;
> +       vdev->priv = vpmem;
> +       err = init_vq(vpmem);
> +       if (err)
> +               goto out_err;
> +
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       start, &vpmem->start);
> +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> +                       size, &vpmem->size);
> +
> +       res.start = vpmem->start;
> +       res.end   = vpmem->start + vpmem->size-1;
> +       vpmem->nd_desc.provider_name = "virtio-pmem";
> +       vpmem->nd_desc.module = THIS_MODULE;
> +
> +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> +                                               &vpmem->nd_desc);
> +       if (!vpmem->nvdimm_bus)
> +               goto out_vq;
> +
> +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> +
> +       ndr_desc.res = &res;
> +       ndr_desc.numa_node = nid;
> +       ndr_desc.flush = async_pmem_flush;
> +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus, &ndr_desc);
> +
> +       if (!nd_region)
> +               goto out_nd;
> +       nd_region->provider_data =  dev_to_virtio
> +                                       (nd_region->dev.parent->parent);
> +       return 0;
> +out_nd:
> +       err = -ENXIO;
> +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> +out_vq:
> +       vdev->config->del_vqs(vdev);
> +out_err:
> +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> +       return err;
> +}
> +
> +static void virtio_pmem_remove(struct virtio_device *vdev)
> +{
> +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> +
> +       nvdimm_bus_unregister(nvdimm_bus);
> +       vdev->config->del_vqs(vdev);
> +       vdev->config->reset(vdev);
> +}
> +
> +static struct virtio_driver virtio_pmem_driver = {
> +       .driver.name            = KBUILD_MODNAME,
> +       .driver.owner           = THIS_MODULE,
> +       .id_table               = id_table,
> +       .probe                  = virtio_pmem_probe,
> +       .remove                 = virtio_pmem_remove,
> +};
> +
> +module_virtio_driver(virtio_pmem_driver);
> +MODULE_DEVICE_TABLE(virtio, id_table);
> +MODULE_DESCRIPTION("Virtio pmem driver");
> +MODULE_LICENSE("GPL");
> diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..ab1da877575d
> --- /dev/null
> +++ b/include/linux/virtio_pmem.h

Why is this a global header?

Seems it can move to drivers/nvdimm/virtio.h.

Also, I'd like to get a virtio ack from Michael (mst@redhat.com)
before taking this through the nvdimm tree.

> @@ -0,0 +1,60 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * virtio_pmem.h: virtio pmem Driver
> + *
> + * Discovers persistent memory range information
> + * from host and provides a virtio based flushing
> + * interface.
> + **/
> +
> +#ifndef _LINUX_VIRTIO_PMEM_H
> +#define _LINUX_VIRTIO_PMEM_H
> +
> +#include <linux/virtio_ids.h>
> +#include <linux/module.h>
> +#include <linux/virtio_config.h>
> +#include <uapi/linux/virtio_pmem.h>
> +#include <linux/libnvdimm.h>
> +#include <linux/spinlock.h>
> +
> +struct virtio_pmem_request {
> +       /* Host return status corresponding to flush request */
> +       int ret;
> +
> +       /* command name*/
> +       char name[16];
> +
> +       /* Wait queue to process deferred work after ack from host */
> +       wait_queue_head_t host_acked;
> +       bool done;
> +
> +       /* Wait queue to process deferred work after virt queue buffer avail */
> +       wait_queue_head_t wq_buf;
> +       bool wq_buf_avail;
> +       struct list_head list;
> +};
> +
> +struct virtio_pmem {
> +       struct virtio_device *vdev;
> +
> +       /* Virtio pmem request queue */
> +       struct virtqueue *req_vq;
> +
> +       /* nvdimm bus registers virtio pmem device */
> +       struct nvdimm_bus *nvdimm_bus;
> +       struct nvdimm_bus_descriptor nd_desc;
> +
> +       /* List to store deferred work if virtqueue is full */
> +       struct list_head req_list;
> +
> +       /* Synchronize virtqueue data */
> +       spinlock_t pmem_lock;
> +
> +       /* Memory region information */
> +       uint64_t start;
> +       uint64_t size;
> +};
> +
> +void host_ack(struct virtqueue *vq);
> +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> +#endif
> diff --git a/include/uapi/linux/virtio_ids.h b/include/uapi/linux/virtio_ids.h
> index 6d5c3b2d4f4d..32b2f94d1f58 100644
> --- a/include/uapi/linux/virtio_ids.h
> +++ b/include/uapi/linux/virtio_ids.h
> @@ -43,5 +43,6 @@
>  #define VIRTIO_ID_INPUT        18 /* virtio input */
>  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
>  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
>
>  #endif /* _LINUX_VIRTIO_IDS_H */
> diff --git a/include/uapi/linux/virtio_pmem.h b/include/uapi/linux/virtio_pmem.h
> new file mode 100644
> index 000000000000..fa3f7d52717a
> --- /dev/null
> +++ b/include/uapi/linux/virtio_pmem.h
> @@ -0,0 +1,10 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> +#define _UAPI_LINUX_VIRTIO_PMEM_H
> +
> +struct virtio_pmem_config {
> +       __le64 start;
> +       __le64 size;
> +};
> +#endif
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-07 15:37     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:37 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, linux-nvdimm, David Hildenbrand,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, linux-ext4,
	Len Brown, kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov

On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> Dont support 'MAP_SYNC' with non-DAX files and DAX files
> with asynchronous dax_device. Virtio pmem provides
> asynchronous host page cache flush mechanism. We don't
> support 'MAP_SYNC' with virtio pmem and xfs.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  fs/xfs/xfs_file.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Darrick, does this look ok to take through the nvdimm tree?

>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a7ceae90110e..f17652cca5ff 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1203,11 +1203,14 @@ xfs_file_mmap(
>         struct file     *filp,
>         struct vm_area_struct *vma)
>  {
> +       struct dax_device       *dax_dev;
> +
> +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
>         /*
> -        * We don't support synchronous mappings for non-DAX files. At least
> -        * until someone comes with a sensible use case.
> +        * We don't support synchronous mappings for non-DAX files and
> +        * for DAX files if underneath dax_device is not synchronous.
>          */
> -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> +       if (!daxdev_mapping_supported(vma, dax_dev))
>                 return -EOPNOTSUPP;
>
>         file_accessed(filp);
> --
> 2.20.1
>
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-07 15:37     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:37 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck-H+wXaHxf7aLQT0dZR+AlfA, Jan Kara, KVM list,
	Michael S. Tsirkin, Jason Wang, david, Qemu Developers,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Andreas Dilger, Ross Zwisler, Andrea Arcangeli, linux-nvdimm,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	linux-ext4, Len Brown, kilobyte-b9QjgO8OEXPVItvQsEIGlw,
	Rik van Riel, yuval shaia, Stefan Hajnoczi, Paolo Bonzini,
	lcapitulino-H+wXaHxf7aLQT0dZR+AlfA, Kevin Wolf, Nitesh

On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
>
> Dont support 'MAP_SYNC' with non-DAX files and DAX files
> with asynchronous dax_device. Virtio pmem provides
> asynchronous host page cache flush mechanism. We don't
> support 'MAP_SYNC' with virtio pmem and xfs.
>
> Signed-off-by: Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  fs/xfs/xfs_file.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Darrick, does this look ok to take through the nvdimm tree?

>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a7ceae90110e..f17652cca5ff 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1203,11 +1203,14 @@ xfs_file_mmap(
>         struct file     *filp,
>         struct vm_area_struct *vma)
>  {
> +       struct dax_device       *dax_dev;
> +
> +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
>         /*
> -        * We don't support synchronous mappings for non-DAX files. At least
> -        * until someone comes with a sensible use case.
> +        * We don't support synchronous mappings for non-DAX files and
> +        * for DAX files if underneath dax_device is not synchronous.
>          */
> -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> +       if (!daxdev_mapping_supported(vma, dax_dev))
>                 return -EOPNOTSUPP;
>
>         file_accessed(filp);
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-07 15:37     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:37 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, Linux Kernel Mailing List, virtualization,
	KVM list, linux-fsdevel, Linux ACPI, Qemu Developers, linux-ext4,
	linux-xfs, Ross Zwisler, Vishal L Verma, Dave Jiang,
	Michael S. Tsirkin, Jason Wang, Matthew Wilcox,
	Rafael J. Wysocki, Christoph Hellwig, Len Brown, Jan Kara,
	Theodore Ts'o, Andreas Dilger, Darrick J. Wong, lcapitulino,
	Kevin Wolf, Igor Mammedov, jmoyer, Nitesh Narayan Lal,
	Rik van Riel, Stefan Hajnoczi, Andrea Arcangeli,
	David Hildenbrand, david, cohuck, Xiao Guangrong, Paolo Bonzini,
	kilobyte, yuval shaia

On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> Dont support 'MAP_SYNC' with non-DAX files and DAX files
> with asynchronous dax_device. Virtio pmem provides
> asynchronous host page cache flush mechanism. We don't
> support 'MAP_SYNC' with virtio pmem and xfs.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  fs/xfs/xfs_file.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Darrick, does this look ok to take through the nvdimm tree?

>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a7ceae90110e..f17652cca5ff 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1203,11 +1203,14 @@ xfs_file_mmap(
>         struct file     *filp,
>         struct vm_area_struct *vma)
>  {
> +       struct dax_device       *dax_dev;
> +
> +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
>         /*
> -        * We don't support synchronous mappings for non-DAX files. At least
> -        * until someone comes with a sensible use case.
> +        * We don't support synchronous mappings for non-DAX files and
> +        * for DAX files if underneath dax_device is not synchronous.
>          */
> -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> +       if (!daxdev_mapping_supported(vma, dax_dev))
>                 return -EOPNOTSUPP;
>
>         file_accessed(filp);
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-07 15:37     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:37 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, Dave Jiang, linux-nvdimm,
	Vishal L Verma, David Hildenbrand, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov

On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> Dont support 'MAP_SYNC' with non-DAX files and DAX files
> with asynchronous dax_device. Virtio pmem provides
> asynchronous host page cache flush mechanism. We don't
> support 'MAP_SYNC' with virtio pmem and xfs.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  fs/xfs/xfs_file.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Darrick, does this look ok to take through the nvdimm tree?

>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a7ceae90110e..f17652cca5ff 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1203,11 +1203,14 @@ xfs_file_mmap(
>         struct file     *filp,
>         struct vm_area_struct *vma)
>  {
> +       struct dax_device       *dax_dev;
> +
> +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
>         /*
> -        * We don't support synchronous mappings for non-DAX files. At least
> -        * until someone comes with a sensible use case.
> +        * We don't support synchronous mappings for non-DAX files and
> +        * for DAX files if underneath dax_device is not synchronous.
>          */
> -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> +       if (!daxdev_mapping_supported(vma, dax_dev))
>                 return -EOPNOTSUPP;
>
>         file_accessed(filp);
> --
> 2.20.1
>


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
  2019-04-26  5:00   ` Pankaj Gupta
                     ` (2 preceding siblings ...)
  (?)
@ 2019-05-07 15:37   ` Dan Williams
  -1 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:37 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, jmoyer,
	linux-ext4, Len Brown, kilobyte, Rik van Riel, yuval shaia,
	Stefan Hajnoczi, Paolo Bonzini, lcapitulino, Nites

On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> Dont support 'MAP_SYNC' with non-DAX files and DAX files
> with asynchronous dax_device. Virtio pmem provides
> asynchronous host page cache flush mechanism. We don't
> support 'MAP_SYNC' with virtio pmem and xfs.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> ---
>  fs/xfs/xfs_file.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)

Darrick, does this look ok to take through the nvdimm tree?

>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index a7ceae90110e..f17652cca5ff 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -1203,11 +1203,14 @@ xfs_file_mmap(
>         struct file     *filp,
>         struct vm_area_struct *vma)
>  {
> +       struct dax_device       *dax_dev;
> +
> +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
>         /*
> -        * We don't support synchronous mappings for non-DAX files. At least
> -        * until someone comes with a sensible use case.
> +        * We don't support synchronous mappings for non-DAX files and
> +        * for DAX files if underneath dax_device is not synchronous.
>          */
> -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> +       if (!daxdev_mapping_supported(vma, dax_dev))
>                 return -EOPNOTSUPP;
>
>         file_accessed(filp);
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
  2019-04-26  5:00   ` Pankaj Gupta
  (?)
  (?)
@ 2019-05-07 15:40     ` Dan Williams
  -1 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:40 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, linux-nvdimm, David Hildenbrand,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, linux-ext4,
	Len Brown, kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds 'DAXDEV_SYNC' flag which is set
> for nd_region doing synchronous flush. This later
> is used to disable MAP_SYNC functionality for
> ext4 & xfs filesystem for devices don't support
> synchronous flush.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 0dd316a74a29..c97fc0cc7167 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -7,6 +7,9 @@
>  #include <linux/radix-tree.h>
>  #include <asm/pgtable.h>
>
> +/* Flag for synchronous flush */
> +#define DAXDEV_F_SYNC true

I'd feel better, i.e. it reads more canonically, if this was defined
as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
long flags' rather than a bool.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
@ 2019-05-07 15:40     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:40 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, Linux Kernel Mailing List, virtualization,
	KVM list, linux-fsdevel, Linux ACPI, Qemu Developers, linux-ext4,
	linux-xfs, Ross Zwisler, Vishal L Verma, Dave Jiang,
	Michael S. Tsirkin, Jason Wang, Matthew Wilcox,
	Rafael J. Wysocki, Christoph Hellwig, Len Brown, Jan Kara,
	Theodore

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds 'DAXDEV_SYNC' flag which is set
> for nd_region doing synchronous flush. This later
> is used to disable MAP_SYNC functionality for
> ext4 & xfs filesystem for devices don't support
> synchronous flush.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 0dd316a74a29..c97fc0cc7167 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -7,6 +7,9 @@
>  #include <linux/radix-tree.h>
>  #include <asm/pgtable.h>
>
> +/* Flag for synchronous flush */
> +#define DAXDEV_F_SYNC true

I'd feel better, i.e. it reads more canonically, if this was defined
as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
long flags' rather than a bool.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
@ 2019-05-07 15:40     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:40 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, Linux Kernel Mailing List, virtualization,
	KVM list, linux-fsdevel, Linux ACPI, Qemu Developers, linux-ext4,
	linux-xfs, Ross Zwisler, Vishal L Verma, Dave Jiang,
	Michael S. Tsirkin, Jason Wang, Matthew Wilcox,
	Rafael J. Wysocki, Christoph Hellwig, Len Brown, Jan Kara,
	Theodore Ts'o, Andreas Dilger, Darrick J. Wong, lcapitulino,
	Kevin Wolf, Igor Mammedov, jmoyer, Nitesh Narayan Lal,
	Rik van Riel, Stefan Hajnoczi, Andrea Arcangeli,
	David Hildenbrand, david, cohuck, Xiao Guangrong, Paolo Bonzini,
	kilobyte, yuval shaia

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds 'DAXDEV_SYNC' flag which is set
> for nd_region doing synchronous flush. This later
> is used to disable MAP_SYNC functionality for
> ext4 & xfs filesystem for devices don't support
> synchronous flush.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 0dd316a74a29..c97fc0cc7167 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -7,6 +7,9 @@
>  #include <linux/radix-tree.h>
>  #include <asm/pgtable.h>
>
> +/* Flag for synchronous flush */
> +#define DAXDEV_F_SYNC true

I'd feel better, i.e. it reads more canonically, if this was defined
as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
long flags' rather than a bool.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
@ 2019-05-07 15:40     ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:40 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, Dave Jiang, linux-nvdimm,
	Vishal L Verma, David Hildenbrand, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds 'DAXDEV_SYNC' flag which is set
> for nd_region doing synchronous flush. This later
> is used to disable MAP_SYNC functionality for
> ext4 & xfs filesystem for devices don't support
> synchronous flush.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 0dd316a74a29..c97fc0cc7167 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -7,6 +7,9 @@
>  #include <linux/radix-tree.h>
>  #include <asm/pgtable.h>
>
> +/* Flag for synchronous flush */
> +#define DAXDEV_F_SYNC true

I'd feel better, i.e. it reads more canonically, if this was defined
as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
long flags' rather than a bool.


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
  2019-04-26  5:00   ` Pankaj Gupta
                     ` (3 preceding siblings ...)
  (?)
@ 2019-05-07 15:40   ` Dan Williams
  -1 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-07 15:40 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, jmoyer,
	linux-ext4, Len Brown, kilobyte, Rik van Riel, yuval shaia,
	Stefan Hajnoczi, Paolo Bonzini, lcapitulino, Nites

On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
>
> This patch adds 'DAXDEV_SYNC' flag which is set
> for nd_region doing synchronous flush. This later
> is used to disable MAP_SYNC functionality for
> ext4 & xfs filesystem for devices don't support
> synchronous flush.
>
> Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
[..]
> diff --git a/include/linux/dax.h b/include/linux/dax.h
> index 0dd316a74a29..c97fc0cc7167 100644
> --- a/include/linux/dax.h
> +++ b/include/linux/dax.h
> @@ -7,6 +7,9 @@
>  #include <linux/radix-tree.h>
>  #include <asm/pgtable.h>
>
> +/* Flag for synchronous flush */
> +#define DAXDEV_F_SYNC true

I'd feel better, i.e. it reads more canonically, if this was defined
as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
long flags' rather than a bool.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
  2019-05-07 15:37     ` Dan Williams
  (?)
  (?)
@ 2019-05-07 16:17       ` Darrick J. Wong
  -1 siblings, 0 replies; 107+ messages in thread
From: Darrick J. Wong @ 2019-05-07 16:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, linux-nvdimm, David Hildenbrand,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, linux-ext4,
	Len Brown, kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Igor Mammedov

On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > with asynchronous dax_device. Virtio pmem provides
> > asynchronous host page cache flush mechanism. We don't
> > support 'MAP_SYNC' with virtio pmem and xfs.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  fs/xfs/xfs_file.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Darrick, does this look ok to take through the nvdimm tree?

<urk> forgot about this, sorry. :/

> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index a7ceae90110e..f17652cca5ff 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> >         struct file     *filp,
> >         struct vm_area_struct *vma)
> >  {
> > +       struct dax_device       *dax_dev;
> > +
> > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> >         /*
> > -        * We don't support synchronous mappings for non-DAX files. At least
> > -        * until someone comes with a sensible use case.
> > +        * We don't support synchronous mappings for non-DAX files and
> > +        * for DAX files if underneath dax_device is not synchronous.
> >          */
> > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > +       if (!daxdev_mapping_supported(vma, dax_dev))
> >                 return -EOPNOTSUPP;

LGTM, and I'm fine with it going through nvdimm.  Nothing in
xfs-5.2-merge touches that function so it should be clean.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> >
> >         file_accessed(filp);
> > --
> > 2.20.1
> >
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-07 16:17       ` Darrick J. Wong
  0 siblings, 0 replies; 107+ messages in thread
From: Darrick J. Wong @ 2019-05-07 16:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Pankaj Gupta, Jan Kara, KVM list, Michael S. Tsirkin, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, jmoyer,
	linux-ext4, Len Brown, kilobyte, Rik van Riel, yuval shaia,
	Stefan Hajnoczi, Paolo Bonzini, lcapitulino

On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > with asynchronous dax_device. Virtio pmem provides
> > asynchronous host page cache flush mechanism. We don't
> > support 'MAP_SYNC' with virtio pmem and xfs.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  fs/xfs/xfs_file.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Darrick, does this look ok to take through the nvdimm tree?

<urk> forgot about this, sorry. :/

> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index a7ceae90110e..f17652cca5ff 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> >         struct file     *filp,
> >         struct vm_area_struct *vma)
> >  {
> > +       struct dax_device       *dax_dev;
> > +
> > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> >         /*
> > -        * We don't support synchronous mappings for non-DAX files. At least
> > -        * until someone comes with a sensible use case.
> > +        * We don't support synchronous mappings for non-DAX files and
> > +        * for DAX files if underneath dax_device is not synchronous.
> >          */
> > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > +       if (!daxdev_mapping_supported(vma, dax_dev))
> >                 return -EOPNOTSUPP;

LGTM, and I'm fine with it going through nvdimm.  Nothing in
xfs-5.2-merge touches that function so it should be clean.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> >
> >         file_accessed(filp);
> > --
> > 2.20.1
> >

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-07 16:17       ` Darrick J. Wong
  0 siblings, 0 replies; 107+ messages in thread
From: Darrick J. Wong @ 2019-05-07 16:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Pankaj Gupta, linux-nvdimm, Linux Kernel Mailing List,
	virtualization, KVM list, linux-fsdevel, Linux ACPI,
	Qemu Developers, linux-ext4, linux-xfs, Ross Zwisler,
	Vishal L Verma, Dave Jiang, Michael S. Tsirkin, Jason Wang,
	Matthew Wilcox, Rafael J. Wysocki, Christoph Hellwig, Len Brown,
	Jan Kara, Theodore Ts'o, Andreas Dilger, lcapitulino,
	Kevin Wolf, Igor Mammedov, jmoyer, Nitesh Narayan Lal,
	Rik van Riel, Stefan Hajnoczi, Andrea Arcangeli,
	David Hildenbrand, david, cohuck, Xiao Guangrong, Paolo Bonzini,
	kilobyte, yuval shaia

On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > with asynchronous dax_device. Virtio pmem provides
> > asynchronous host page cache flush mechanism. We don't
> > support 'MAP_SYNC' with virtio pmem and xfs.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  fs/xfs/xfs_file.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Darrick, does this look ok to take through the nvdimm tree?

<urk> forgot about this, sorry. :/

> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index a7ceae90110e..f17652cca5ff 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> >         struct file     *filp,
> >         struct vm_area_struct *vma)
> >  {
> > +       struct dax_device       *dax_dev;
> > +
> > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> >         /*
> > -        * We don't support synchronous mappings for non-DAX files. At least
> > -        * until someone comes with a sensible use case.
> > +        * We don't support synchronous mappings for non-DAX files and
> > +        * for DAX files if underneath dax_device is not synchronous.
> >          */
> > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > +       if (!daxdev_mapping_supported(vma, dax_dev))
> >                 return -EOPNOTSUPP;

LGTM, and I'm fine with it going through nvdimm.  Nothing in
xfs-5.2-merge touches that function so it should be clean.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> >
> >         file_accessed(filp);
> > --
> > 2.20.1
> >

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-07 16:17       ` Darrick J. Wong
  0 siblings, 0 replies; 107+ messages in thread
From: Darrick J. Wong @ 2019-05-07 16:17 UTC (permalink / raw)
  To: Dan Williams
  Cc: Pankaj Gupta, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, Dave Jiang, linux-nvdimm,
	Vishal L Verma, David Hildenbrand, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Igor Mammedov

On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > with asynchronous dax_device. Virtio pmem provides
> > asynchronous host page cache flush mechanism. We don't
> > support 'MAP_SYNC' with virtio pmem and xfs.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  fs/xfs/xfs_file.c | 9 ++++++---
> >  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> Darrick, does this look ok to take through the nvdimm tree?

<urk> forgot about this, sorry. :/

> >
> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index a7ceae90110e..f17652cca5ff 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> >         struct file     *filp,
> >         struct vm_area_struct *vma)
> >  {
> > +       struct dax_device       *dax_dev;
> > +
> > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> >         /*
> > -        * We don't support synchronous mappings for non-DAX files. At least
> > -        * until someone comes with a sensible use case.
> > +        * We don't support synchronous mappings for non-DAX files and
> > +        * for DAX files if underneath dax_device is not synchronous.
> >          */
> > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > +       if (!daxdev_mapping_supported(vma, dax_dev))
> >                 return -EOPNOTSUPP;

LGTM, and I'm fine with it going through nvdimm.  Nothing in
xfs-5.2-merge touches that function so it should be clean.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D

> >
> >         file_accessed(filp);
> > --
> > 2.20.1
> >


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
@ 2019-05-07 19:24     ` Jakub Staroń
  0 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń @ 2019-05-07 19:24 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger.kernel, zwisler, aarcange,
	dave.jiang, darrick.wong, vishal.l.verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, kwolf, tytso, xiaoguangrong.eric

From: Pankaj Gupta <pagupta@redhat.com>
Date: Thu, Apr 25, 2019 at 10:00 PM

> +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> +                               struct dax_device *dax_dev)
> +{
> +       return !(vma->flags & VM_SYNC);
> +}

Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
no field named `flags` in `struct vm_area_struct`.

Thank you,
Jakub

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
@ 2019-05-07 19:24     ` Jakub Staroń
  0 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń @ 2019-05-07 19:24 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger.kernel, zwisler, aarcange,
	dave.jiang, darrick.wong, vishal.l.verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, kwolf, tytso, xiaoguangrong.eric,
	cohuck, rjw, imammedo, Stephen Barber

From: Pankaj Gupta <pagupta@redhat.com>
Date: Thu, Apr 25, 2019 at 10:00 PM

> +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> +                               struct dax_device *dax_dev)
> +{
> +       return !(vma->flags & VM_SYNC);
> +}

Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
no field named `flags` in `struct vm_area_struct`.

Thank you,
Jakub

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
@ 2019-05-07 19:24     ` Jakub Staroń
  0 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń via Qemu-devel @ 2019-05-07 19:24 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger.kernel, Stephen Barber, zwisler,
	aarcange, dave.jiang, linux-nvdimm, vishal.l.verma, david, willy,
	hch, linux-acpi, jmoyer, linux-ext4, lenb, kilobyte, riel,
	yuval.shaia, stefanha, imammedo, dan.j.williams, lcapitulino,
	kwolf, nilal, tytso, xiaoguangrong.eric, darrick.wong, rjw,
	linux-kernel, linux-xfs, linux-fsdevel, pbonzini

From: Pankaj Gupta <pagupta@redhat.com>
Date: Thu, Apr 25, 2019 at 10:00 PM

> +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> +                               struct dax_device *dax_dev)
> +{
> +       return !(vma->flags & VM_SYNC);
> +}

Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
no field named `flags` in `struct vm_area_struct`.

Thank you,
Jakub


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
  2019-04-26  5:00   ` Pankaj Gupta
                     ` (3 preceding siblings ...)
  (?)
@ 2019-05-07 19:24   ` Jakub Staroń via Virtualization
  -1 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń via Virtualization @ 2019-05-07 19:24 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, jack, kvm, mst, david, qemu-devel, virtualization,
	adilger.kernel, Stephen Barber, zwisler, aarcange, dave.jiang,
	linux-nvdimm, vishal.l.verma, willy, hch, linux-acpi, jmoyer,
	linux-ext4, lenb, kilobyte, riel, yuval.shaia, stefanha,
	imammedo, dan.j.williams, lcapitulino, nilal, tytso,
	xiaoguangrong.eric, darrick.wong, rjw, linux-kernel, linux-xfs,
	linux-fsdevel, pbonzini

From: Pankaj Gupta <pagupta@redhat.com>
Date: Thu, Apr 25, 2019 at 10:00 PM

> +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> +                               struct dax_device *dax_dev)
> +{
> +       return !(vma->flags & VM_SYNC);
> +}

Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
no field named `flags` in `struct vm_area_struct`.

Thank you,
Jakub

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-04-26  5:00   ` Pankaj Gupta
@ 2019-05-07 20:25     ` Jakub Staroń via Qemu-devel
  -1 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń @ 2019-05-07 20:25 UTC (permalink / raw)
  To: Pankaj Gupta, linux-nvdimm, linux-kernel, virtualization, kvm,
	linux-fsdevel, linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel, zwisler,
	aarcange, dave.jiang, darrick.wong, vishal.l.verma, david, willy,
	hch, jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	pbonzini, dan.j.williams, kwolf, tytso, xiaoguangrong.eric,
	cohuck, rjw, imammedo, smbarber

On 4/25/19 10:00 PM, Pankaj Gupta wrote:

> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);

Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to unlink
first element of the list and `vpmem->req_list` is just the list head.

> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;
> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	}

Aren't the arguments in `list_add_tail` swapped? The element we are adding should
be first, the list should be second. Also, shouldn't we resubmit the request after
waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`?

I propose rewriting it like that:

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 66b582f751a3..ff0556b04e86 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq)
 		if (!list_empty(&vpmem->req_list)) {
 			req_buf = list_first_entry(&vpmem->req_list,
 					struct virtio_pmem_request, list);
-			list_del(&vpmem->req_list);
+			list_del(vpmem->req_list.next);
 			req_buf->wq_buf_avail = true;
 			wake_up(&req_buf->wq_buf);
 		}
@@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region)
 	sgs[1] = &ret;
 
 	spin_lock_irqsave(&vpmem->pmem_lock, flags);
-	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
-	if (err) {
-		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+	/*
+	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not
+	 * have free descriptor slots. We add the request to req_list and wait
+	 * for host_ack to wake us up when free slots are available.
+	 */
+	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC)) == -ENOSPC) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device, no free slots in the virtqueue, postponing request\n");
+		req->wq_buf_avail = false;
 
-		list_add_tail(&vpmem->req_list, &req->list);
+		list_add_tail(&req->list, &vpmem->req_list);
 		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 
 		/* When host has read buffer, this completes via host_ack */
 		wait_event(req->wq_buf, req->wq_buf_avail);
 		spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	}
+
+	/*
+	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
+	 * do anything about that.
+	 */
+	if (err) {
+		dev_info(&vdev->dev, "failed to send command to virtio pmem device, error code %d\n", err);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+		err = -EIO;
+		goto ret;
+	}
 	err = virtqueue_kick(vpmem->req_vq);
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);


Let me know if it looks reasonable to you.

Thank you,
Jakub Staron

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-07 20:25     ` Jakub Staroń via Qemu-devel
  0 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń via Qemu-devel @ 2019-05-07 20:25 UTC (permalink / raw)
  To: Pankaj Gupta, linux-nvdimm, linux-kernel, virtualization, kvm,
	linux-fsdevel, linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, jasowang, david, lcapitulino, adilger.kernel,
	smbarber, zwisler, aarcange, dave.jiang, darrick.wong,
	vishal.l.verma, david, willy, hch, jmoyer, nilal, lenb, kilobyte,
	riel, yuval.shaia, stefanha, imammedo, dan.j.williams, kwolf,
	tytso, xiaoguangrong.eric, cohuck, rjw, pbonzini

On 4/25/19 10:00 PM, Pankaj Gupta wrote:

> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);

Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to unlink
first element of the list and `vpmem->req_list` is just the list head.

> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;
> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	}

Aren't the arguments in `list_add_tail` swapped? The element we are adding should
be first, the list should be second. Also, shouldn't we resubmit the request after
waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`?

I propose rewriting it like that:

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 66b582f751a3..ff0556b04e86 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq)
 		if (!list_empty(&vpmem->req_list)) {
 			req_buf = list_first_entry(&vpmem->req_list,
 					struct virtio_pmem_request, list);
-			list_del(&vpmem->req_list);
+			list_del(vpmem->req_list.next);
 			req_buf->wq_buf_avail = true;
 			wake_up(&req_buf->wq_buf);
 		}
@@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region)
 	sgs[1] = &ret;
 
 	spin_lock_irqsave(&vpmem->pmem_lock, flags);
-	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
-	if (err) {
-		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+	/*
+	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not
+	 * have free descriptor slots. We add the request to req_list and wait
+	 * for host_ack to wake us up when free slots are available.
+	 */
+	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC)) == -ENOSPC) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device, no free slots in the virtqueue, postponing request\n");
+		req->wq_buf_avail = false;
 
-		list_add_tail(&vpmem->req_list, &req->list);
+		list_add_tail(&req->list, &vpmem->req_list);
 		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 
 		/* When host has read buffer, this completes via host_ack */
 		wait_event(req->wq_buf, req->wq_buf_avail);
 		spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	}
+
+	/*
+	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
+	 * do anything about that.
+	 */
+	if (err) {
+		dev_info(&vdev->dev, "failed to send command to virtio pmem device, error code %d\n", err);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+		err = -EIO;
+		goto ret;
+	}
 	err = virtqueue_kick(vpmem->req_vq);
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);


Let me know if it looks reasonable to you.

Thank you,
Jakub Staron



^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-04-26  5:00   ` Pankaj Gupta
                     ` (7 preceding siblings ...)
  (?)
@ 2019-05-07 20:25   ` Jakub Staroń via Virtualization
  -1 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń via Virtualization @ 2019-05-07 20:25 UTC (permalink / raw)
  To: Pankaj Gupta, linux-nvdimm, linux-kernel, virtualization, kvm,
	linux-fsdevel, linux-acpi, qemu-devel, linux-ext4, linux-xfs
  Cc: jack, mst, david, lcapitulino, adilger.kernel, smbarber, zwisler,
	aarcange, dave.jiang, darrick.wong, vishal.l.verma, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval.shaia, stefanha,
	imammedo, dan.j.williams, tytso, xiaoguangrong.eric, cohuck, rjw,
	pbonzini

On 4/25/19 10:00 PM, Pankaj Gupta wrote:

> +void host_ack(struct virtqueue *vq)
> +{
> +	unsigned int len;
> +	unsigned long flags;
> +	struct virtio_pmem_request *req, *req_buf;
> +	struct virtio_pmem *vpmem = vq->vdev->priv;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> +		req->done = true;
> +		wake_up(&req->host_acked);
> +
> +		if (!list_empty(&vpmem->req_list)) {
> +			req_buf = list_first_entry(&vpmem->req_list,
> +					struct virtio_pmem_request, list);
> +			list_del(&vpmem->req_list);

Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to unlink
first element of the list and `vpmem->req_list` is just the list head.

> +int virtio_pmem_flush(struct nd_region *nd_region)
> +{
> +	int err;
> +	unsigned long flags;
> +	struct scatterlist *sgs[2], sg, ret;
> +	struct virtio_device *vdev = nd_region->provider_data;
> +	struct virtio_pmem *vpmem = vdev->priv;
> +	struct virtio_pmem_request *req;
> +
> +	might_sleep();
> +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> +	if (!req)
> +		return -ENOMEM;
> +
> +	req->done = req->wq_buf_avail = false;
> +	strcpy(req->name, "FLUSH");
> +	init_waitqueue_head(&req->host_acked);
> +	init_waitqueue_head(&req->wq_buf);
> +	sg_init_one(&sg, req->name, strlen(req->name));
> +	sgs[0] = &sg;
> +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> +	sgs[1] = &ret;
> +
> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> +	if (err) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +
> +		list_add_tail(&vpmem->req_list, &req->list);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +
> +		/* When host has read buffer, this completes via host_ack */
> +		wait_event(req->wq_buf, req->wq_buf_avail);
> +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> +	}

Aren't the arguments in `list_add_tail` swapped? The element we are adding should
be first, the list should be second. Also, shouldn't we resubmit the request after
waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`?

I propose rewriting it like that:

diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
index 66b582f751a3..ff0556b04e86 100644
--- a/drivers/nvdimm/virtio_pmem.c
+++ b/drivers/nvdimm/virtio_pmem.c
@@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq)
 		if (!list_empty(&vpmem->req_list)) {
 			req_buf = list_first_entry(&vpmem->req_list,
 					struct virtio_pmem_request, list);
-			list_del(&vpmem->req_list);
+			list_del(vpmem->req_list.next);
 			req_buf->wq_buf_avail = true;
 			wake_up(&req_buf->wq_buf);
 		}
@@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region)
 	sgs[1] = &ret;
 
 	spin_lock_irqsave(&vpmem->pmem_lock, flags);
-	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
-	if (err) {
-		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
+	/*
+	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not
+	 * have free descriptor slots. We add the request to req_list and wait
+	 * for host_ack to wake us up when free slots are available.
+	 */
+	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC)) == -ENOSPC) {
+		dev_err(&vdev->dev, "failed to send command to virtio pmem device, no free slots in the virtqueue, postponing request\n");
+		req->wq_buf_avail = false;
 
-		list_add_tail(&vpmem->req_list, &req->list);
+		list_add_tail(&req->list, &vpmem->req_list);
 		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
 
 		/* When host has read buffer, this completes via host_ack */
 		wait_event(req->wq_buf, req->wq_buf_avail);
 		spin_lock_irqsave(&vpmem->pmem_lock, flags);
 	}
+
+	/*
+	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
+	 * do anything about that.
+	 */
+	if (err) {
+		dev_info(&vdev->dev, "failed to send command to virtio pmem device, error code %d\n", err);
+		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
+		err = -EIO;
+		goto ret;
+	}
 	err = virtqueue_kick(vpmem->req_vq);
 	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);


Let me know if it looks reasonable to you.

Thank you,
Jakub Staron

^ permalink raw reply related	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
  2019-05-07 19:24     ` Jakub Staroń
  (?)
  (?)
@ 2019-05-08  5:31       ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:31 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, Stephen Barber, zwisler,
	aarcange, linux-nvdimm, david, willy, hch, linux-acpi,
	linux-ext4, lenb, kilobyte, riel, yuval shaia, stefanha,
	imammedo, lcapitulino, kwolf, nilal, tytso, xiaoguangrong eric,
	darrick wong, rjw, linux-kernel, linux-xfs, linux-fsdevel,
	pbonzini


> 
> From: Pankaj Gupta <pagupta@redhat.com>
> Date: Thu, Apr 25, 2019 at 10:00 PM
> 
> > +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> > +                               struct dax_device *dax_dev)
> > +{
> > +       return !(vma->flags & VM_SYNC);
> > +}
> 
> Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
> no field named `flags` in `struct vm_area_struct`.

Thanks for catching. Sorry! for this. 

Will correct in v8.

Thank you,
Pankaj 

> 
> Thank you,
> Jakub
> 
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
@ 2019-05-08  5:31       ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:31 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger kernel, zwisler, aarcange,
	dave jiang, darrick wong, vishal l verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval shaia, stefanha,
	pbonzini


> 
> From: Pankaj Gupta <pagupta@redhat.com>
> Date: Thu, Apr 25, 2019 at 10:00 PM
> 
> > +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> > +                               struct dax_device *dax_dev)
> > +{
> > +       return !(vma->flags & VM_SYNC);
> > +}
> 
> Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
> no field named `flags` in `struct vm_area_struct`.

Thanks for catching. Sorry! for this. 

Will correct in v8.

Thank you,
Pankaj 

> 
> Thank you,
> Jakub
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
@ 2019-05-08  5:31       ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:31 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger kernel, zwisler, aarcange,
	dave jiang, darrick wong, vishal l verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval shaia, stefanha,
	pbonzini, dan j williams, kwolf, tytso, xiaoguangrong eric,
	cohuck, rjw, imammedo, Stephen Barber


> 
> From: Pankaj Gupta <pagupta@redhat.com>
> Date: Thu, Apr 25, 2019 at 10:00 PM
> 
> > +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> > +                               struct dax_device *dax_dev)
> > +{
> > +       return !(vma->flags & VM_SYNC);
> > +}
> 
> Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
> no field named `flags` in `struct vm_area_struct`.

Thanks for catching. Sorry! for this. 

Will correct in v8.

Thank you,
Pankaj 

> 
> Thank you,
> Jakub
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
@ 2019-05-08  5:31       ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:31 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, Stephen Barber, zwisler,
	aarcange, dave jiang, linux-nvdimm, vishal l verma, david, willy,
	hch, linux-acpi, jmoyer, linux-ext4, lenb, kilobyte, riel,
	yuval shaia, stefanha, imammedo, dan j williams, lcapitulino,
	kwolf, nilal, tytso, xiaoguangrong eric, darrick wong, rjw,
	linux-kernel, linux-xfs, linux-fsdevel, pbonzini


> 
> From: Pankaj Gupta <pagupta@redhat.com>
> Date: Thu, Apr 25, 2019 at 10:00 PM
> 
> > +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> > +                               struct dax_device *dax_dev)
> > +{
> > +       return !(vma->flags & VM_SYNC);
> > +}
> 
> Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
> no field named `flags` in `struct vm_area_struct`.

Thanks for catching. Sorry! for this. 

Will correct in v8.

Thank you,
Pankaj 

> 
> Thank you,
> Jakub
> 


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 4/6] dax: check synchronous mapping is supported
  2019-05-07 19:24     ` Jakub Staroń
                       ` (2 preceding siblings ...)
  (?)
@ 2019-05-08  5:31     ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:31 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, david, qemu-devel, virtualization,
	adilger kernel, Stephen Barber, zwisler, aarcange, dave jiang,
	linux-nvdimm, vishal l verma, willy, hch, linux-acpi, jmoyer,
	linux-ext4, lenb, kilobyte, riel, yuval shaia, stefanha,
	imammedo, dan j williams, lcapitulino, nilal, tytso,
	xiaoguangrong eric, darrick wong, rjw, linux-kernel


> 
> From: Pankaj Gupta <pagupta@redhat.com>
> Date: Thu, Apr 25, 2019 at 10:00 PM
> 
> > +static inline bool daxdev_mapping_supported(struct vm_area_struct *vma,
> > +                               struct dax_device *dax_dev)
> > +{
> > +       return !(vma->flags & VM_SYNC);
> > +}
> 
> Shouldn't it be rather `return !(vma->vm_flags & VM_SYNC);`? There is
> no field named `flags` in `struct vm_area_struct`.

Thanks for catching. Sorry! for this. 

Will correct in v8.

Thank you,
Pankaj 

> 
> Thank you,
> Jakub
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
  2019-05-07 16:17       ` Darrick J. Wong
  (?)
  (?)
@ 2019-05-08  5:49         ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, linux-nvdimm, David Hildenbrand,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, linux-ext4,
	Len Brown, kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Igor Mammedov, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Paolo Bonzini


> 
> On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> > On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > >
> > > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > > with asynchronous dax_device. Virtio pmem provides
> > > asynchronous host page cache flush mechanism. We don't
> > > support 'MAP_SYNC' with virtio pmem and xfs.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  fs/xfs/xfs_file.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > Darrick, does this look ok to take through the nvdimm tree?
> 
> <urk> forgot about this, sorry. :/
> 
> > >
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index a7ceae90110e..f17652cca5ff 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > >         struct file     *filp,
> > >         struct vm_area_struct *vma)
> > >  {
> > > +       struct dax_device       *dax_dev;
> > > +
> > > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > >         /*
> > > -        * We don't support synchronous mappings for non-DAX files. At
> > > least
> > > -        * until someone comes with a sensible use case.
> > > +        * We don't support synchronous mappings for non-DAX files and
> > > +        * for DAX files if underneath dax_device is not synchronous.
> > >          */
> > > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > > +       if (!daxdev_mapping_supported(vma, dax_dev))
> > >                 return -EOPNOTSUPP;
> 
> LGTM, and I'm fine with it going through nvdimm.  Nothing in
> xfs-5.2-merge touches that function so it should be clean.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thank you for the review.

Pankaj

> 
> --D
> 
> > >
> > >         file_accessed(filp);
> > > --
> > > 2.20.1
> > >
> 
> 
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-08  5:49         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Andreas Dilger, Ross Zwisler, Andrea Arcangeli, linux-nvdimm,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	linux-ext4, Len Brown, kilobyte-b9QjgO8OEXPVItvQsEIGlw,
	Rik van Riel, yuval shaia, Stefan Hajnoczi, Igor Mammedov,
	lcapitulino-H+wXaHxf7aLQT0dZR+AlfA, Kevin Wolf,
	Nitesh Narayan Lal


> 
> On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> > On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> > >
> > > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > > with asynchronous dax_device. Virtio pmem provides
> > > asynchronous host page cache flush mechanism. We don't
> > > support 'MAP_SYNC' with virtio pmem and xfs.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > > ---
> > >  fs/xfs/xfs_file.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > Darrick, does this look ok to take through the nvdimm tree?
> 
> <urk> forgot about this, sorry. :/
> 
> > >
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index a7ceae90110e..f17652cca5ff 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > >         struct file     *filp,
> > >         struct vm_area_struct *vma)
> > >  {
> > > +       struct dax_device       *dax_dev;
> > > +
> > > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > >         /*
> > > -        * We don't support synchronous mappings for non-DAX files. At
> > > least
> > > -        * until someone comes with a sensible use case.
> > > +        * We don't support synchronous mappings for non-DAX files and
> > > +        * for DAX files if underneath dax_device is not synchronous.
> > >          */
> > > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > > +       if (!daxdev_mapping_supported(vma, dax_dev))
> > >                 return -EOPNOTSUPP;
> 
> LGTM, and I'm fine with it going through nvdimm.  Nothing in
> xfs-5.2-merge touches that function so it should be clean.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Thank you for the review.

Pankaj

> 
> --D
> 
> > >
> > >         file_accessed(filp);
> > > --
> > > 2.20.1
> > >
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-08  5:49         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dan Williams, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, Dave Jiang, linux-nvdimm,
	Vishal L Verma, David Hildenbrand, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Igor Mammedov


> 
> On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> > On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > >
> > > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > > with asynchronous dax_device. Virtio pmem provides
> > > asynchronous host page cache flush mechanism. We don't
> > > support 'MAP_SYNC' with virtio pmem and xfs.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  fs/xfs/xfs_file.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > Darrick, does this look ok to take through the nvdimm tree?
> 
> <urk> forgot about this, sorry. :/
> 
> > >
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index a7ceae90110e..f17652cca5ff 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > >         struct file     *filp,
> > >         struct vm_area_struct *vma)
> > >  {
> > > +       struct dax_device       *dax_dev;
> > > +
> > > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > >         /*
> > > -        * We don't support synchronous mappings for non-DAX files. At
> > > least
> > > -        * until someone comes with a sensible use case.
> > > +        * We don't support synchronous mappings for non-DAX files and
> > > +        * for DAX files if underneath dax_device is not synchronous.
> > >          */
> > > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > > +       if (!daxdev_mapping_supported(vma, dax_dev))
> > >                 return -EOPNOTSUPP;
> 
> LGTM, and I'm fine with it going through nvdimm.  Nothing in
> xfs-5.2-merge touches that function so it should be clean.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thank you for the review.

Pankaj

> 
> --D
> 
> > >
> > >         file_accessed(filp);
> > > --
> > > 2.20.1
> > >
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
@ 2019-05-08  5:49         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	jmoyer, linux-ext4, Len Brown, kilobyte, Rik van Riel,
	yuval shaia, Stefan Hajnoczi, Igor Mammedov, Dan Williams,
	lcapitulino, Kevin Wolf, Nitesh Narayan Lal, Theodore Ts'o,
	Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Paolo Bonzini


> 
> On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> > On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > >
> > > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > > with asynchronous dax_device. Virtio pmem provides
> > > asynchronous host page cache flush mechanism. We don't
> > > support 'MAP_SYNC' with virtio pmem and xfs.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  fs/xfs/xfs_file.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > Darrick, does this look ok to take through the nvdimm tree?
> 
> <urk> forgot about this, sorry. :/
> 
> > >
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index a7ceae90110e..f17652cca5ff 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > >         struct file     *filp,
> > >         struct vm_area_struct *vma)
> > >  {
> > > +       struct dax_device       *dax_dev;
> > > +
> > > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > >         /*
> > > -        * We don't support synchronous mappings for non-DAX files. At
> > > least
> > > -        * until someone comes with a sensible use case.
> > > +        * We don't support synchronous mappings for non-DAX files and
> > > +        * for DAX files if underneath dax_device is not synchronous.
> > >          */
> > > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > > +       if (!daxdev_mapping_supported(vma, dax_dev))
> > >                 return -EOPNOTSUPP;
> 
> LGTM, and I'm fine with it going through nvdimm.  Nothing in
> xfs-5.2-merge touches that function so it should be clean.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thank you for the review.

Pankaj

> 
> --D
> 
> > >
> > >         file_accessed(filp);
> > > --
> > > 2.20.1
> > >
> 
> 


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 6/6] xfs: disable map_sync for async flush
  2019-05-07 16:17       ` Darrick J. Wong
                         ` (2 preceding siblings ...)
  (?)
@ 2019-05-08  5:49       ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08  5:49 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, david, Qemu Developers,
	virtualization, Andreas Dilger, Ross Zwisler, Andrea Arcangeli,
	Dave Jiang, linux-nvdimm, Vishal L Verma, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Igor Mammedov, Dan Williams, lcapitul


> 
> On Tue, May 07, 2019 at 08:37:01AM -0700, Dan Williams wrote:
> > On Thu, Apr 25, 2019 at 10:03 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > >
> > > Dont support 'MAP_SYNC' with non-DAX files and DAX files
> > > with asynchronous dax_device. Virtio pmem provides
> > > asynchronous host page cache flush mechanism. We don't
> > > support 'MAP_SYNC' with virtio pmem and xfs.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  fs/xfs/xfs_file.c | 9 ++++++---
> > >  1 file changed, 6 insertions(+), 3 deletions(-)
> > 
> > Darrick, does this look ok to take through the nvdimm tree?
> 
> <urk> forgot about this, sorry. :/
> 
> > >
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index a7ceae90110e..f17652cca5ff 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -1203,11 +1203,14 @@ xfs_file_mmap(
> > >         struct file     *filp,
> > >         struct vm_area_struct *vma)
> > >  {
> > > +       struct dax_device       *dax_dev;
> > > +
> > > +       dax_dev = xfs_find_daxdev_for_inode(file_inode(filp));
> > >         /*
> > > -        * We don't support synchronous mappings for non-DAX files. At
> > > least
> > > -        * until someone comes with a sensible use case.
> > > +        * We don't support synchronous mappings for non-DAX files and
> > > +        * for DAX files if underneath dax_device is not synchronous.
> > >          */
> > > -       if (!IS_DAX(file_inode(filp)) && (vma->vm_flags & VM_SYNC))
> > > +       if (!daxdev_mapping_supported(vma, dax_dev))
> > >                 return -EOPNOTSUPP;
> 
> LGTM, and I'm fine with it going through nvdimm.  Nothing in
> xfs-5.2-merge touches that function so it should be clean.
> 
> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

Thank you for the review.

Pankaj

> 
> --D
> 
> > >
> > >         file_accessed(filp);
> > > --
> > > 2.20.1
> > >
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-07 20:25     ` Jakub Staroń via Qemu-devel
  (?)
  (?)
@ 2019-05-08 11:12       ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 11:12 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, smbarber, zwisler, aarcange,
	linux-nvdimm, david, willy, hch, linux-acpi, linux-ext4, lenb,
	kilobyte, riel, yuval shaia, stefanha, imammedo, lcapitulino,
	kwolf, nilal, tytso, xiaoguangrong eric, darrick wong, rjw,
	linux-kernel, linux-xfs, linux-fsdevel, pbonzini


> 
> On 4/25/19 10:00 PM, Pankaj Gupta wrote:
> 
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> 
> Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to
> unlink
> first element of the list and `vpmem->req_list` is just the list head.

This looks correct. We are not deleting head but first entry in 'req_list'
which is device corresponding list of pending requests.

Please see below:

/**
 * Retrieve the first list entry for the given list pointer.
 *
 * Example:
 * struct foo *first;
 * first = list_first_entry(&bar->list_of_foos, struct foo, list_of_foos);
 *
 * @param ptr The list head
 * @param type Data type of the list element to retrieve
 * @param member Member name of the struct list_head field in the list element.
 * @return A pointer to the first list element.
 */
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)

> 
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev = nd_region->provider_data;
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct virtio_pmem_request *req;
> > +
> > +	might_sleep();
> > +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&vpmem->req_list, &req->list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	}
> 
> Aren't the arguments in `list_add_tail` swapped? The element we are adding

No, this is intentional. 'vpmem->req_list' maintains a list of pending requests
for entire pmem device.  'req->list'is per request list and maintains pending
request on virtio queue add failure. I think we don't need this list.

> should
> be first, the list should be second. Also, shouldn't we resubmit the request
> after
> waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`?

Yes. we should. Good point.

> 
> I propose rewriting it like that:
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> index 66b582f751a3..ff0556b04e86 100644
> --- a/drivers/nvdimm/virtio_pmem.c
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq)
>  		if (!list_empty(&vpmem->req_list)) {
>  			req_buf = list_first_entry(&vpmem->req_list,
>  					struct virtio_pmem_request, list);
> -			list_del(&vpmem->req_list);
> +			list_del(vpmem->req_list.next);

Don't think its correct.

>  			req_buf->wq_buf_avail = true;
>  			wake_up(&req_buf->wq_buf);
>  		}
> @@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region)
>  	sgs[1] = &ret;
>  
>  	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> -	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> -	if (err) {
> -		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +	/*
> +	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not
> +	 * have free descriptor slots. We add the request to req_list and wait
> +	 * for host_ack to wake us up when free slots are available.
> +	 */
> +	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC))
> == -ENOSPC) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device, no free
> slots in the virtqueue, postponing request\n");
> +		req->wq_buf_avail = false;
>  
> -		list_add_tail(&vpmem->req_list, &req->list);
> +		list_add_tail(&req->list, &vpmem->req_list);
>  		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
>  
>  		/* When host has read buffer, this completes via host_ack */
>  		wait_event(req->wq_buf, req->wq_buf_avail);
>  		spin_lock_irqsave(&vpmem->pmem_lock, flags);
>  	}
> +
> +	/*
> +	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
> +	 * do anything about that.
> +	 */
> +	if (err) {
> +		dev_info(&vdev->dev, "failed to send command to virtio pmem device, error
> code %d\n", err);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +		err = -EIO;
> +		goto ret;
> +	}
>  	err = virtqueue_kick(vpmem->req_vq);
>  	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> 
> 
> Let me know if it looks reasonable to you.

Don't think this is fulfilling entire logic correctly. But thanks, I spotted a bug in my code :)
Will fix it. 

> 
> Thank you,
> Jakub Staron
> 
> 
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 11:12       ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 11:12 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger kernel, zwisler, aarcange,
	dave jiang, darrick wong, vishal l verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval shaia, stefanha,
	pbonzini, dan j williams, kwolf, tytso, xiaoguangrong eric,
	cohuck, rjw, imammedo, smbarber


> 
> On 4/25/19 10:00 PM, Pankaj Gupta wrote:
> 
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> 
> Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to
> unlink
> first element of the list and `vpmem->req_list` is just the list head.

This looks correct. We are not deleting head but first entry in 'req_list'
which is device corresponding list of pending requests.

Please see below:

/**
 * Retrieve the first list entry for the given list pointer.
 *
 * Example:
 * struct foo *first;
 * first = list_first_entry(&bar->list_of_foos, struct foo, list_of_foos);
 *
 * @param ptr The list head
 * @param type Data type of the list element to retrieve
 * @param member Member name of the struct list_head field in the list element.
 * @return A pointer to the first list element.
 */
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)

> 
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev = nd_region->provider_data;
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct virtio_pmem_request *req;
> > +
> > +	might_sleep();
> > +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&vpmem->req_list, &req->list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	}
> 
> Aren't the arguments in `list_add_tail` swapped? The element we are adding

No, this is intentional. 'vpmem->req_list' maintains a list of pending requests
for entire pmem device.  'req->list'is per request list and maintains pending
request on virtio queue add failure. I think we don't need this list.

> should
> be first, the list should be second. Also, shouldn't we resubmit the request
> after
> waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`?

Yes. we should. Good point.

> 
> I propose rewriting it like that:
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> index 66b582f751a3..ff0556b04e86 100644
> --- a/drivers/nvdimm/virtio_pmem.c
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq)
>  		if (!list_empty(&vpmem->req_list)) {
>  			req_buf = list_first_entry(&vpmem->req_list,
>  					struct virtio_pmem_request, list);
> -			list_del(&vpmem->req_list);
> +			list_del(vpmem->req_list.next);

Don't think its correct.

>  			req_buf->wq_buf_avail = true;
>  			wake_up(&req_buf->wq_buf);
>  		}
> @@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region)
>  	sgs[1] = &ret;
>  
>  	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> -	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> -	if (err) {
> -		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +	/*
> +	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not
> +	 * have free descriptor slots. We add the request to req_list and wait
> +	 * for host_ack to wake us up when free slots are available.
> +	 */
> +	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC))
> == -ENOSPC) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device, no free
> slots in the virtqueue, postponing request\n");
> +		req->wq_buf_avail = false;
>  
> -		list_add_tail(&vpmem->req_list, &req->list);
> +		list_add_tail(&req->list, &vpmem->req_list);
>  		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
>  
>  		/* When host has read buffer, this completes via host_ack */
>  		wait_event(req->wq_buf, req->wq_buf_avail);
>  		spin_lock_irqsave(&vpmem->pmem_lock, flags);
>  	}
> +
> +	/*
> +	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
> +	 * do anything about that.
> +	 */
> +	if (err) {
> +		dev_info(&vdev->dev, "failed to send command to virtio pmem device, error
> code %d\n", err);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +		err = -EIO;
> +		goto ret;
> +	}
>  	err = virtqueue_kick(vpmem->req_vq);
>  	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> 
> 
> Let me know if it looks reasonable to you.

Don't think this is fulfilling entire logic correctly. But thanks, I spotted a bug in my code :)
Will fix it. 

> 
> Thank you,
> Jakub Staron
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 11:12       ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 11:12 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, smbarber, zwisler, aarcange,
	dave jiang, linux-nvdimm, vishal l verma, david, willy, hch,
	linux-acpi, jmoyer, linux-ext4, lenb, kilobyte, riel,
	yuval shaia, stefanha, imammedo, dan j williams, lcapitulino,
	kwolf, nilal, tytso, xiaoguangrong eric, darrick wong, rjw,
	linux-kernel, linux-xfs, linux-fsdevel, pbonzini


> 
> On 4/25/19 10:00 PM, Pankaj Gupta wrote:
> 
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> 
> Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to
> unlink
> first element of the list and `vpmem->req_list` is just the list head.

This looks correct. We are not deleting head but first entry in 'req_list'
which is device corresponding list of pending requests.

Please see below:

/**
 * Retrieve the first list entry for the given list pointer.
 *
 * Example:
 * struct foo *first;
 * first = list_first_entry(&bar->list_of_foos, struct foo, list_of_foos);
 *
 * @param ptr The list head
 * @param type Data type of the list element to retrieve
 * @param member Member name of the struct list_head field in the list element.
 * @return A pointer to the first list element.
 */
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)

> 
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev = nd_region->provider_data;
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct virtio_pmem_request *req;
> > +
> > +	might_sleep();
> > +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&vpmem->req_list, &req->list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	}
> 
> Aren't the arguments in `list_add_tail` swapped? The element we are adding

No, this is intentional. 'vpmem->req_list' maintains a list of pending requests
for entire pmem device.  'req->list'is per request list and maintains pending
request on virtio queue add failure. I think we don't need this list.

> should
> be first, the list should be second. Also, shouldn't we resubmit the request
> after
> waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`?

Yes. we should. Good point.

> 
> I propose rewriting it like that:
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> index 66b582f751a3..ff0556b04e86 100644
> --- a/drivers/nvdimm/virtio_pmem.c
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq)
>  		if (!list_empty(&vpmem->req_list)) {
>  			req_buf = list_first_entry(&vpmem->req_list,
>  					struct virtio_pmem_request, list);
> -			list_del(&vpmem->req_list);
> +			list_del(vpmem->req_list.next);

Don't think its correct.

>  			req_buf->wq_buf_avail = true;
>  			wake_up(&req_buf->wq_buf);
>  		}
> @@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region)
>  	sgs[1] = &ret;
>  
>  	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> -	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> -	if (err) {
> -		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +	/*
> +	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not
> +	 * have free descriptor slots. We add the request to req_list and wait
> +	 * for host_ack to wake us up when free slots are available.
> +	 */
> +	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC))
> == -ENOSPC) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device, no free
> slots in the virtqueue, postponing request\n");
> +		req->wq_buf_avail = false;
>  
> -		list_add_tail(&vpmem->req_list, &req->list);
> +		list_add_tail(&req->list, &vpmem->req_list);
>  		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
>  
>  		/* When host has read buffer, this completes via host_ack */
>  		wait_event(req->wq_buf, req->wq_buf_avail);
>  		spin_lock_irqsave(&vpmem->pmem_lock, flags);
>  	}
> +
> +	/*
> +	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
> +	 * do anything about that.
> +	 */
> +	if (err) {
> +		dev_info(&vdev->dev, "failed to send command to virtio pmem device, error
> code %d\n", err);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +		err = -EIO;
> +		goto ret;
> +	}
>  	err = virtqueue_kick(vpmem->req_vq);
>  	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> 
> 
> Let me know if it looks reasonable to you.

Don't think this is fulfilling entire logic correctly. But thanks, I spotted a bug in my code :)
Will fix it. 

> 
> Thank you,
> Jakub Staron
> 
> 


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 11:12       ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 11:12 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, david, qemu-devel, virtualization,
	adilger kernel, smbarber, zwisler, aarcange, dave jiang,
	linux-nvdimm, vishal l verma, willy, hch, linux-acpi, jmoyer,
	linux-ext4, lenb, kilobyte, riel, yuval shaia, stefanha,
	imammedo, dan j williams, lcapitulino, nilal, tytso,
	xiaoguangrong eric, darrick wong, rjw, linux-kernel, linux-xfs


> 
> On 4/25/19 10:00 PM, Pankaj Gupta wrote:
> 
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +	unsigned int len;
> > +	unsigned long flags;
> > +	struct virtio_pmem_request *req, *req_buf;
> > +	struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +		req->done = true;
> > +		wake_up(&req->host_acked);
> > +
> > +		if (!list_empty(&vpmem->req_list)) {
> > +			req_buf = list_first_entry(&vpmem->req_list,
> > +					struct virtio_pmem_request, list);
> > +			list_del(&vpmem->req_list);
> 
> Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to
> unlink
> first element of the list and `vpmem->req_list` is just the list head.

This looks correct. We are not deleting head but first entry in 'req_list'
which is device corresponding list of pending requests.

Please see below:

/**
 * Retrieve the first list entry for the given list pointer.
 *
 * Example:
 * struct foo *first;
 * first = list_first_entry(&bar->list_of_foos, struct foo, list_of_foos);
 *
 * @param ptr The list head
 * @param type Data type of the list element to retrieve
 * @param member Member name of the struct list_head field in the list element.
 * @return A pointer to the first list element.
 */
#define list_first_entry(ptr, type, member) \
    list_entry((ptr)->next, type, member)

> 
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +	int err;
> > +	unsigned long flags;
> > +	struct scatterlist *sgs[2], sg, ret;
> > +	struct virtio_device *vdev = nd_region->provider_data;
> > +	struct virtio_pmem *vpmem = vdev->priv;
> > +	struct virtio_pmem_request *req;
> > +
> > +	might_sleep();
> > +	req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +	if (!req)
> > +		return -ENOMEM;
> > +
> > +	req->done = req->wq_buf_avail = false;
> > +	strcpy(req->name, "FLUSH");
> > +	init_waitqueue_head(&req->host_acked);
> > +	init_waitqueue_head(&req->wq_buf);
> > +	sg_init_one(&sg, req->name, strlen(req->name));
> > +	sgs[0] = &sg;
> > +	sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +	sgs[1] = &ret;
> > +
> > +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +	if (err) {
> > +		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > +
> > +		list_add_tail(&vpmem->req_list, &req->list);
> > +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +		/* When host has read buffer, this completes via host_ack */
> > +		wait_event(req->wq_buf, req->wq_buf_avail);
> > +		spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +	}
> 
> Aren't the arguments in `list_add_tail` swapped? The element we are adding

No, this is intentional. 'vpmem->req_list' maintains a list of pending requests
for entire pmem device.  'req->list'is per request list and maintains pending
request on virtio queue add failure. I think we don't need this list.

> should
> be first, the list should be second. Also, shouldn't we resubmit the request
> after
> waking up from `wait_event(req->wq_buf, req->wq_buf_avail)`?

Yes. we should. Good point.

> 
> I propose rewriting it like that:
> 
> diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> index 66b582f751a3..ff0556b04e86 100644
> --- a/drivers/nvdimm/virtio_pmem.c
> +++ b/drivers/nvdimm/virtio_pmem.c
> @@ -25,7 +25,7 @@ void host_ack(struct virtqueue *vq)
>  		if (!list_empty(&vpmem->req_list)) {
>  			req_buf = list_first_entry(&vpmem->req_list,
>  					struct virtio_pmem_request, list);
> -			list_del(&vpmem->req_list);
> +			list_del(vpmem->req_list.next);

Don't think its correct.

>  			req_buf->wq_buf_avail = true;
>  			wake_up(&req_buf->wq_buf);
>  		}
> @@ -59,17 +59,33 @@ int virtio_pmem_flush(struct nd_region *nd_region)
>  	sgs[1] = &ret;
>  
>  	spin_lock_irqsave(&vpmem->pmem_lock, flags);
> -	err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> -	if (err) {
> -		dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> +	/*
> +	 * If virtqueue_add_sgs returns -ENOSPC then req_vq virtual queue does not
> +	 * have free descriptor slots. We add the request to req_list and wait
> +	 * for host_ack to wake us up when free slots are available.
> +	 */
> +	while ((err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC))
> == -ENOSPC) {
> +		dev_err(&vdev->dev, "failed to send command to virtio pmem device, no free
> slots in the virtqueue, postponing request\n");
> +		req->wq_buf_avail = false;
>  
> -		list_add_tail(&vpmem->req_list, &req->list);
> +		list_add_tail(&req->list, &vpmem->req_list);
>  		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
>  
>  		/* When host has read buffer, this completes via host_ack */
>  		wait_event(req->wq_buf, req->wq_buf_avail);
>  		spin_lock_irqsave(&vpmem->pmem_lock, flags);
>  	}
> +
> +	/*
> +	 * virtqueue_add_sgs failed with error different than -ENOSPC, we can't
> +	 * do anything about that.
> +	 */
> +	if (err) {
> +		dev_info(&vdev->dev, "failed to send command to virtio pmem device, error
> code %d\n", err);
> +		spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> +		err = -EIO;
> +		goto ret;
> +	}
>  	err = virtqueue_kick(vpmem->req_vq);
>  	spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> 
> 
> Let me know if it looks reasonable to you.

Don't think this is fulfilling entire logic correctly. But thanks, I spotted a bug in my code :)
Will fix it. 

> 
> Thank you,
> Jakub Staron
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-07 15:35     ` Dan Williams
  (?)
@ 2019-05-08 11:19         ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 11:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers,
	virtualization-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Andreas Dilger, Ross Zwisler, Andrea Arcangeli, linux-nvdimm,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	linux-ext4, Len Brown, kilobyte-b9QjgO8OEXPVItvQsEIGlw,
	Rik van Riel, yuval shaia, Stefan Hajnoczi, Igor Mammedov,
	lcapitulino-H+wXaHxf7aLQT0dZR+AlfA, Kevin Wolf,
	Nitesh Narayan Lal


Hi Dan,

Thank you for the review. Please see my reply inline.

> 
> Hi Pankaj,
> 
> Some minor file placement comments below.

Sure.

> 
> On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> >
> > This patch adds virtio-pmem driver for KVM guest.
> >
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> >
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> >
> > Signed-off-by: Pankaj Gupta <pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> > ---
> >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> >  drivers/virtio/Kconfig           |  10 +++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 314 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> >
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index 000000000000..66b582f751a3
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,114 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +       unsigned int len;
> > +       unsigned long flags;
> > +       struct virtio_pmem_request *req, *req_buf;
> > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +               req->done = true;
> > +               wake_up(&req->host_acked);
> > +
> > +               if (!list_empty(&vpmem->req_list)) {
> > +                       req_buf = list_first_entry(&vpmem->req_list,
> > +                                       struct virtio_pmem_request, list);
> > +                       list_del(&vpmem->req_list);
> > +                       req_buf->wq_buf_avail = true;
> > +                       wake_up(&req_buf->wq_buf);
> > +               }
> > +       }
> > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +       int err;
> > +       unsigned long flags;
> > +       struct scatterlist *sgs[2], sg, ret;
> > +       struct virtio_device *vdev = nd_region->provider_data;
> > +       struct virtio_pmem *vpmem = vdev->priv;
> > +       struct virtio_pmem_request *req;
> > +
> > +       might_sleep();
> > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +       if (!req)
> > +               return -ENOMEM;
> > +
> > +       req->done = req->wq_buf_avail = false;
> > +       strcpy(req->name, "FLUSH");
> > +       init_waitqueue_head(&req->host_acked);
> > +       init_waitqueue_head(&req->wq_buf);
> > +       sg_init_one(&sg, req->name, strlen(req->name));
> > +       sgs[0] = &sg;
> > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +       sgs[1] = &ret;
> > +
> > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +       if (err) {
> > +               dev_err(&vdev->dev, "failed to send command to virtio pmem
> > device\n");
> > +
> > +               list_add_tail(&vpmem->req_list, &req->list);
> > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +               /* When host has read buffer, this completes via host_ack
> > */
> > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       }
> > +       err = virtqueue_kick(vpmem->req_vq);
> > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +       if (!err) {
> > +               err = -EIO;
> > +               goto ret;
> > +       }
> > +       /* When host has read buffer, this completes via host_ack */
> > +       wait_event(req->host_acked, req->done);
> > +       err = req->ret;
> > +ret:
> > +       kfree(req);
> > +       return err;
> > +};
> > +
> > + /* The asynchronous flush callback function */
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > +{
> > +       int rc = 0;
> > +
> > +       /* Create child bio for asynchronous flush and chain with
> > +        * parent bio. Otherwise directly call nd_region flush.
> > +        */
> > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > +
> > +               if (!child)
> > +                       return -ENOMEM;
> > +               bio_copy_dev(child, bio);
> > +               child->bi_opf = REQ_PREFLUSH;
> > +               child->bi_iter.bi_sector = -1;
> > +               bio_chain(child, bio);
> > +               submit_bio(child);
> > +       } else {
> > +               if (virtio_pmem_flush(nd_region))
> > +                       rc = -EIO;
> > +       }
> > +
> > +       return rc;
> > +};
> > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..9f634a2ed638 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> >
> >           If unsure, say Y.
> >
> > +config VIRTIO_PMEM
> > +       tristate "Support for virtio pmem driver"
> > +       depends on VIRTIO
> > +       depends on LIBNVDIMM
> > +       help
> > +       This driver provides support for virtio based flushing interface
> > +       for persistent memory range.
> > +
> > +       If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >         tristate "Virtio balloon driver"
> >         depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5dcf46..143ce91eabe9 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index 000000000000..309788628e41
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> 
> It's not clear to me why this driver is located in drivers/virtio/

Like other VIRTIO drivers, I placed it initially in drivers/virtio directory.

> 
> > @@ -0,0 +1,118 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include <../../drivers/nvdimm/nd.h>
> 
> ...especially because it seems to require nvdimm internals.
> 
> However I don't see why that header is included.

Removed.

> 
> In any event lets move this to drivers/nvdimm/virtio.c to live
> alongside the other generic bus provider drivers/nvdimm/e820.c.

o.k. Makes sense.

> 
> > +
> > +static struct virtio_device_id id_table[] = {
> > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +       { 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +       /* single vq */
> > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > +                               host_ack, "flush_queue");
> > +       if (IS_ERR(vpmem->req_vq))
> > +               return PTR_ERR(vpmem->req_vq);
> > +
> > +       spin_lock_init(&vpmem->pmem_lock);
> > +       INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +       return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +       int err = 0;
> > +       struct resource res;
> > +       struct virtio_pmem *vpmem;
> > +       struct nd_region_desc ndr_desc = {};
> > +       int nid = dev_to_node(&vdev->dev);
> > +       struct nd_region *nd_region;
> > +
> > +       if (!vdev->config->get) {
> > +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > +                       __func__);
> > +               return -EINVAL;
> > +       }
> > +
> > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > +       if (!vpmem) {
> > +               err = -ENOMEM;
> > +               goto out_err;
> > +       }
> > +
> > +       vpmem->vdev = vdev;
> > +       vdev->priv = vpmem;
> > +       err = init_vq(vpmem);
> > +       if (err)
> > +               goto out_err;
> > +
> > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +                       start, &vpmem->start);
> > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +                       size, &vpmem->size);
> > +
> > +       res.start = vpmem->start;
> > +       res.end   = vpmem->start + vpmem->size-1;
> > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > +       vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +                                               &vpmem->nd_desc);
> > +       if (!vpmem->nvdimm_bus)
> > +               goto out_vq;
> > +
> > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > +
> > +       ndr_desc.res = &res;
> > +       ndr_desc.numa_node = nid;
> > +       ndr_desc.flush = async_pmem_flush;
> > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > &ndr_desc);
> > +
> > +       if (!nd_region)
> > +               goto out_nd;
> > +       nd_region->provider_data =  dev_to_virtio
> > +                                       (nd_region->dev.parent->parent);
> > +       return 0;
> > +out_nd:
> > +       err = -ENXIO;
> > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +       vdev->config->del_vqs(vdev);
> > +out_err:
> > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +       return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +       nvdimm_bus_unregister(nvdimm_bus);
> > +       vdev->config->del_vqs(vdev);
> > +       vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +       .driver.name            = KBUILD_MODNAME,
> > +       .driver.owner           = THIS_MODULE,
> > +       .id_table               = id_table,
> > +       .probe                  = virtio_pmem_probe,
> > +       .remove                 = virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..ab1da877575d
> > --- /dev/null
> > +++ b/include/linux/virtio_pmem.h
> 
> Why is this a global header?

This is where other virtio driver headers are also placed.
I think this is to access uapi config file in :

./include/uapi/linux/virtio_pmem.h

Is it okay if we keep 'virtio_pmem.h' in global header?
  
> 
> Seems it can move to drivers/nvdimm/virtio.h.
> 
> Also, I'd like to get a virtio ack from Michael (mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org)
> before taking this through the nvdimm tree.

Sure, Will post v8 with the suggestions.

Thanks,
Pankaj

> 
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include <linux/virtio_ids.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct virtio_pmem_request {
> > +       /* Host return status corresponding to flush request */
> > +       int ret;
> > +
> > +       /* command name*/
> > +       char name[16];
> > +
> > +       /* Wait queue to process deferred work after ack from host */
> > +       wait_queue_head_t host_acked;
> > +       bool done;
> > +
> > +       /* Wait queue to process deferred work after virt queue buffer
> > avail */
> > +       wait_queue_head_t wq_buf;
> > +       bool wq_buf_avail;
> > +       struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +       struct virtio_device *vdev;
> > +
> > +       /* Virtio pmem request queue */
> > +       struct virtqueue *req_vq;
> > +
> > +       /* nvdimm bus registers virtio pmem device */
> > +       struct nvdimm_bus *nvdimm_bus;
> > +       struct nvdimm_bus_descriptor nd_desc;
> > +
> > +       /* List to store deferred work if virtqueue is full */
> > +       struct list_head req_list;
> > +
> > +       /* Synchronize virtqueue data */
> > +       spinlock_t pmem_lock;
> > +
> > +       /* Memory region information */
> > +       uint64_t start;
> > +       uint64_t size;
> > +};
> > +
> > +void host_ack(struct virtqueue *vq);
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +#endif
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2d4f4d..32b2f94d1f58 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
> >
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..fa3f7d52717a
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +       __le64 start;
> > +       __le64 size;
> > +};
> > +#endif
> > --
> > 2.20.1
> >
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 11:19         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 11:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: cohuck, Jan Kara, KVM list, Jason Wang, david,
	Michael S. Tsirkin, Qemu Developers, virtualization,
	Andreas Dilger, Ross Zwisler, Andrea Arcangeli, Dave Jiang,
	linux-nvdimm, Vishal L Verma, David Hildenbrand, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov


Hi Dan,

Thank you for the review. Please see my reply inline.

> 
> Hi Pankaj,
> 
> Some minor file placement comments below.

Sure.

> 
> On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> > This patch adds virtio-pmem driver for KVM guest.
> >
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> >
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> >  drivers/virtio/Kconfig           |  10 +++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 314 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> >
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index 000000000000..66b582f751a3
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,114 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +       unsigned int len;
> > +       unsigned long flags;
> > +       struct virtio_pmem_request *req, *req_buf;
> > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +               req->done = true;
> > +               wake_up(&req->host_acked);
> > +
> > +               if (!list_empty(&vpmem->req_list)) {
> > +                       req_buf = list_first_entry(&vpmem->req_list,
> > +                                       struct virtio_pmem_request, list);
> > +                       list_del(&vpmem->req_list);
> > +                       req_buf->wq_buf_avail = true;
> > +                       wake_up(&req_buf->wq_buf);
> > +               }
> > +       }
> > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +       int err;
> > +       unsigned long flags;
> > +       struct scatterlist *sgs[2], sg, ret;
> > +       struct virtio_device *vdev = nd_region->provider_data;
> > +       struct virtio_pmem *vpmem = vdev->priv;
> > +       struct virtio_pmem_request *req;
> > +
> > +       might_sleep();
> > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +       if (!req)
> > +               return -ENOMEM;
> > +
> > +       req->done = req->wq_buf_avail = false;
> > +       strcpy(req->name, "FLUSH");
> > +       init_waitqueue_head(&req->host_acked);
> > +       init_waitqueue_head(&req->wq_buf);
> > +       sg_init_one(&sg, req->name, strlen(req->name));
> > +       sgs[0] = &sg;
> > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +       sgs[1] = &ret;
> > +
> > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +       if (err) {
> > +               dev_err(&vdev->dev, "failed to send command to virtio pmem
> > device\n");
> > +
> > +               list_add_tail(&vpmem->req_list, &req->list);
> > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +               /* When host has read buffer, this completes via host_ack
> > */
> > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       }
> > +       err = virtqueue_kick(vpmem->req_vq);
> > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +       if (!err) {
> > +               err = -EIO;
> > +               goto ret;
> > +       }
> > +       /* When host has read buffer, this completes via host_ack */
> > +       wait_event(req->host_acked, req->done);
> > +       err = req->ret;
> > +ret:
> > +       kfree(req);
> > +       return err;
> > +};
> > +
> > + /* The asynchronous flush callback function */
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > +{
> > +       int rc = 0;
> > +
> > +       /* Create child bio for asynchronous flush and chain with
> > +        * parent bio. Otherwise directly call nd_region flush.
> > +        */
> > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > +
> > +               if (!child)
> > +                       return -ENOMEM;
> > +               bio_copy_dev(child, bio);
> > +               child->bi_opf = REQ_PREFLUSH;
> > +               child->bi_iter.bi_sector = -1;
> > +               bio_chain(child, bio);
> > +               submit_bio(child);
> > +       } else {
> > +               if (virtio_pmem_flush(nd_region))
> > +                       rc = -EIO;
> > +       }
> > +
> > +       return rc;
> > +};
> > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..9f634a2ed638 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> >
> >           If unsure, say Y.
> >
> > +config VIRTIO_PMEM
> > +       tristate "Support for virtio pmem driver"
> > +       depends on VIRTIO
> > +       depends on LIBNVDIMM
> > +       help
> > +       This driver provides support for virtio based flushing interface
> > +       for persistent memory range.
> > +
> > +       If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >         tristate "Virtio balloon driver"
> >         depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5dcf46..143ce91eabe9 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index 000000000000..309788628e41
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> 
> It's not clear to me why this driver is located in drivers/virtio/

Like other VIRTIO drivers, I placed it initially in drivers/virtio directory.

> 
> > @@ -0,0 +1,118 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include <../../drivers/nvdimm/nd.h>
> 
> ...especially because it seems to require nvdimm internals.
> 
> However I don't see why that header is included.

Removed.

> 
> In any event lets move this to drivers/nvdimm/virtio.c to live
> alongside the other generic bus provider drivers/nvdimm/e820.c.

o.k. Makes sense.

> 
> > +
> > +static struct virtio_device_id id_table[] = {
> > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +       { 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +       /* single vq */
> > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > +                               host_ack, "flush_queue");
> > +       if (IS_ERR(vpmem->req_vq))
> > +               return PTR_ERR(vpmem->req_vq);
> > +
> > +       spin_lock_init(&vpmem->pmem_lock);
> > +       INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +       return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +       int err = 0;
> > +       struct resource res;
> > +       struct virtio_pmem *vpmem;
> > +       struct nd_region_desc ndr_desc = {};
> > +       int nid = dev_to_node(&vdev->dev);
> > +       struct nd_region *nd_region;
> > +
> > +       if (!vdev->config->get) {
> > +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > +                       __func__);
> > +               return -EINVAL;
> > +       }
> > +
> > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > +       if (!vpmem) {
> > +               err = -ENOMEM;
> > +               goto out_err;
> > +       }
> > +
> > +       vpmem->vdev = vdev;
> > +       vdev->priv = vpmem;
> > +       err = init_vq(vpmem);
> > +       if (err)
> > +               goto out_err;
> > +
> > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +                       start, &vpmem->start);
> > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +                       size, &vpmem->size);
> > +
> > +       res.start = vpmem->start;
> > +       res.end   = vpmem->start + vpmem->size-1;
> > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > +       vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +                                               &vpmem->nd_desc);
> > +       if (!vpmem->nvdimm_bus)
> > +               goto out_vq;
> > +
> > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > +
> > +       ndr_desc.res = &res;
> > +       ndr_desc.numa_node = nid;
> > +       ndr_desc.flush = async_pmem_flush;
> > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > &ndr_desc);
> > +
> > +       if (!nd_region)
> > +               goto out_nd;
> > +       nd_region->provider_data =  dev_to_virtio
> > +                                       (nd_region->dev.parent->parent);
> > +       return 0;
> > +out_nd:
> > +       err = -ENXIO;
> > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +       vdev->config->del_vqs(vdev);
> > +out_err:
> > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +       return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +       nvdimm_bus_unregister(nvdimm_bus);
> > +       vdev->config->del_vqs(vdev);
> > +       vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +       .driver.name            = KBUILD_MODNAME,
> > +       .driver.owner           = THIS_MODULE,
> > +       .id_table               = id_table,
> > +       .probe                  = virtio_pmem_probe,
> > +       .remove                 = virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..ab1da877575d
> > --- /dev/null
> > +++ b/include/linux/virtio_pmem.h
> 
> Why is this a global header?

This is where other virtio driver headers are also placed.
I think this is to access uapi config file in :

./include/uapi/linux/virtio_pmem.h

Is it okay if we keep 'virtio_pmem.h' in global header?
  
> 
> Seems it can move to drivers/nvdimm/virtio.h.
> 
> Also, I'd like to get a virtio ack from Michael (mst@redhat.com)
> before taking this through the nvdimm tree.

Sure, Will post v8 with the suggestions.

Thanks,
Pankaj

> 
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include <linux/virtio_ids.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct virtio_pmem_request {
> > +       /* Host return status corresponding to flush request */
> > +       int ret;
> > +
> > +       /* command name*/
> > +       char name[16];
> > +
> > +       /* Wait queue to process deferred work after ack from host */
> > +       wait_queue_head_t host_acked;
> > +       bool done;
> > +
> > +       /* Wait queue to process deferred work after virt queue buffer
> > avail */
> > +       wait_queue_head_t wq_buf;
> > +       bool wq_buf_avail;
> > +       struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +       struct virtio_device *vdev;
> > +
> > +       /* Virtio pmem request queue */
> > +       struct virtqueue *req_vq;
> > +
> > +       /* nvdimm bus registers virtio pmem device */
> > +       struct nvdimm_bus *nvdimm_bus;
> > +       struct nvdimm_bus_descriptor nd_desc;
> > +
> > +       /* List to store deferred work if virtqueue is full */
> > +       struct list_head req_list;
> > +
> > +       /* Synchronize virtqueue data */
> > +       spinlock_t pmem_lock;
> > +
> > +       /* Memory region information */
> > +       uint64_t start;
> > +       uint64_t size;
> > +};
> > +
> > +void host_ack(struct virtqueue *vq);
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +#endif
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2d4f4d..32b2f94d1f58 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
> >
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..fa3f7d52717a
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +       __le64 start;
> > +       __le64 size;
> > +};
> > +#endif
> > --
> > 2.20.1
> >
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 11:19         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 11:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	jmoyer, linux-ext4, Len Brown, kilobyte, Rik van Riel,
	yuval shaia, Stefan Hajnoczi, Igor Mammedov, lcapitulino,
	Kevin Wolf, Nitesh Narayan Lal, Theodore Ts'o,
	Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Paolo Bonzini, Darrick J. Wong


Hi Dan,

Thank you for the review. Please see my reply inline.

> 
> Hi Pankaj,
> 
> Some minor file placement comments below.

Sure.

> 
> On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> > This patch adds virtio-pmem driver for KVM guest.
> >
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> >
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> >  drivers/virtio/Kconfig           |  10 +++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 314 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> >
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index 000000000000..66b582f751a3
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,114 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +       unsigned int len;
> > +       unsigned long flags;
> > +       struct virtio_pmem_request *req, *req_buf;
> > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +               req->done = true;
> > +               wake_up(&req->host_acked);
> > +
> > +               if (!list_empty(&vpmem->req_list)) {
> > +                       req_buf = list_first_entry(&vpmem->req_list,
> > +                                       struct virtio_pmem_request, list);
> > +                       list_del(&vpmem->req_list);
> > +                       req_buf->wq_buf_avail = true;
> > +                       wake_up(&req_buf->wq_buf);
> > +               }
> > +       }
> > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +       int err;
> > +       unsigned long flags;
> > +       struct scatterlist *sgs[2], sg, ret;
> > +       struct virtio_device *vdev = nd_region->provider_data;
> > +       struct virtio_pmem *vpmem = vdev->priv;
> > +       struct virtio_pmem_request *req;
> > +
> > +       might_sleep();
> > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +       if (!req)
> > +               return -ENOMEM;
> > +
> > +       req->done = req->wq_buf_avail = false;
> > +       strcpy(req->name, "FLUSH");
> > +       init_waitqueue_head(&req->host_acked);
> > +       init_waitqueue_head(&req->wq_buf);
> > +       sg_init_one(&sg, req->name, strlen(req->name));
> > +       sgs[0] = &sg;
> > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +       sgs[1] = &ret;
> > +
> > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +       if (err) {
> > +               dev_err(&vdev->dev, "failed to send command to virtio pmem
> > device\n");
> > +
> > +               list_add_tail(&vpmem->req_list, &req->list);
> > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +               /* When host has read buffer, this completes via host_ack
> > */
> > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       }
> > +       err = virtqueue_kick(vpmem->req_vq);
> > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +       if (!err) {
> > +               err = -EIO;
> > +               goto ret;
> > +       }
> > +       /* When host has read buffer, this completes via host_ack */
> > +       wait_event(req->host_acked, req->done);
> > +       err = req->ret;
> > +ret:
> > +       kfree(req);
> > +       return err;
> > +};
> > +
> > + /* The asynchronous flush callback function */
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > +{
> > +       int rc = 0;
> > +
> > +       /* Create child bio for asynchronous flush and chain with
> > +        * parent bio. Otherwise directly call nd_region flush.
> > +        */
> > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > +
> > +               if (!child)
> > +                       return -ENOMEM;
> > +               bio_copy_dev(child, bio);
> > +               child->bi_opf = REQ_PREFLUSH;
> > +               child->bi_iter.bi_sector = -1;
> > +               bio_chain(child, bio);
> > +               submit_bio(child);
> > +       } else {
> > +               if (virtio_pmem_flush(nd_region))
> > +                       rc = -EIO;
> > +       }
> > +
> > +       return rc;
> > +};
> > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..9f634a2ed638 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> >
> >           If unsure, say Y.
> >
> > +config VIRTIO_PMEM
> > +       tristate "Support for virtio pmem driver"
> > +       depends on VIRTIO
> > +       depends on LIBNVDIMM
> > +       help
> > +       This driver provides support for virtio based flushing interface
> > +       for persistent memory range.
> > +
> > +       If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >         tristate "Virtio balloon driver"
> >         depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5dcf46..143ce91eabe9 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index 000000000000..309788628e41
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> 
> It's not clear to me why this driver is located in drivers/virtio/

Like other VIRTIO drivers, I placed it initially in drivers/virtio directory.

> 
> > @@ -0,0 +1,118 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include <../../drivers/nvdimm/nd.h>
> 
> ...especially because it seems to require nvdimm internals.
> 
> However I don't see why that header is included.

Removed.

> 
> In any event lets move this to drivers/nvdimm/virtio.c to live
> alongside the other generic bus provider drivers/nvdimm/e820.c.

o.k. Makes sense.

> 
> > +
> > +static struct virtio_device_id id_table[] = {
> > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +       { 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +       /* single vq */
> > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > +                               host_ack, "flush_queue");
> > +       if (IS_ERR(vpmem->req_vq))
> > +               return PTR_ERR(vpmem->req_vq);
> > +
> > +       spin_lock_init(&vpmem->pmem_lock);
> > +       INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +       return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +       int err = 0;
> > +       struct resource res;
> > +       struct virtio_pmem *vpmem;
> > +       struct nd_region_desc ndr_desc = {};
> > +       int nid = dev_to_node(&vdev->dev);
> > +       struct nd_region *nd_region;
> > +
> > +       if (!vdev->config->get) {
> > +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > +                       __func__);
> > +               return -EINVAL;
> > +       }
> > +
> > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > +       if (!vpmem) {
> > +               err = -ENOMEM;
> > +               goto out_err;
> > +       }
> > +
> > +       vpmem->vdev = vdev;
> > +       vdev->priv = vpmem;
> > +       err = init_vq(vpmem);
> > +       if (err)
> > +               goto out_err;
> > +
> > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +                       start, &vpmem->start);
> > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +                       size, &vpmem->size);
> > +
> > +       res.start = vpmem->start;
> > +       res.end   = vpmem->start + vpmem->size-1;
> > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > +       vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +                                               &vpmem->nd_desc);
> > +       if (!vpmem->nvdimm_bus)
> > +               goto out_vq;
> > +
> > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > +
> > +       ndr_desc.res = &res;
> > +       ndr_desc.numa_node = nid;
> > +       ndr_desc.flush = async_pmem_flush;
> > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > &ndr_desc);
> > +
> > +       if (!nd_region)
> > +               goto out_nd;
> > +       nd_region->provider_data =  dev_to_virtio
> > +                                       (nd_region->dev.parent->parent);
> > +       return 0;
> > +out_nd:
> > +       err = -ENXIO;
> > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +       vdev->config->del_vqs(vdev);
> > +out_err:
> > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +       return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +       nvdimm_bus_unregister(nvdimm_bus);
> > +       vdev->config->del_vqs(vdev);
> > +       vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +       .driver.name            = KBUILD_MODNAME,
> > +       .driver.owner           = THIS_MODULE,
> > +       .id_table               = id_table,
> > +       .probe                  = virtio_pmem_probe,
> > +       .remove                 = virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..ab1da877575d
> > --- /dev/null
> > +++ b/include/linux/virtio_pmem.h
> 
> Why is this a global header?

This is where other virtio driver headers are also placed.
I think this is to access uapi config file in :

./include/uapi/linux/virtio_pmem.h

Is it okay if we keep 'virtio_pmem.h' in global header?
  
> 
> Seems it can move to drivers/nvdimm/virtio.h.
> 
> Also, I'd like to get a virtio ack from Michael (mst@redhat.com)
> before taking this through the nvdimm tree.

Sure, Will post v8 with the suggestions.

Thanks,
Pankaj

> 
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include <linux/virtio_ids.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct virtio_pmem_request {
> > +       /* Host return status corresponding to flush request */
> > +       int ret;
> > +
> > +       /* command name*/
> > +       char name[16];
> > +
> > +       /* Wait queue to process deferred work after ack from host */
> > +       wait_queue_head_t host_acked;
> > +       bool done;
> > +
> > +       /* Wait queue to process deferred work after virt queue buffer
> > avail */
> > +       wait_queue_head_t wq_buf;
> > +       bool wq_buf_avail;
> > +       struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +       struct virtio_device *vdev;
> > +
> > +       /* Virtio pmem request queue */
> > +       struct virtqueue *req_vq;
> > +
> > +       /* nvdimm bus registers virtio pmem device */
> > +       struct nvdimm_bus *nvdimm_bus;
> > +       struct nvdimm_bus_descriptor nd_desc;
> > +
> > +       /* List to store deferred work if virtqueue is full */
> > +       struct list_head req_list;
> > +
> > +       /* Synchronize virtqueue data */
> > +       spinlock_t pmem_lock;
> > +
> > +       /* Memory region information */
> > +       uint64_t start;
> > +       uint64_t size;
> > +};
> > +
> > +void host_ack(struct virtqueue *vq);
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +#endif
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2d4f4d..32b2f94d1f58 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
> >
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..fa3f7d52717a
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +       __le64 start;
> > +       __le64 size;
> > +};
> > +#endif
> > --
> > 2.20.1
> >
> 
> 


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-07 15:35     ` Dan Williams
                       ` (3 preceding siblings ...)
  (?)
@ 2019-05-08 11:19     ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 11:19 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, david, Qemu Developers,
	virtualization, Andreas Dilger, Ross Zwisler, Andrea Arcangeli,
	Dave Jiang, linux-nvdimm, Vishal L Verma, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Igor Mammedov, lcapitulino, Nitesh Narayan Lal


Hi Dan,

Thank you for the review. Please see my reply inline.

> 
> Hi Pankaj,
> 
> Some minor file placement comments below.

Sure.

> 
> On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> >
> > This patch adds virtio-pmem driver for KVM guest.
> >
> > Guest reads the persistent memory range information from
> > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > creates a nd_region object with the persistent memory
> > range information so that existing 'nvdimm/pmem' driver
> > can reserve this into system memory map. This way
> > 'virtio-pmem' driver uses existing functionality of pmem
> > driver to register persistent memory compatible for DAX
> > capable filesystems.
> >
> > This also provides function to perform guest flush over
> > VIRTIO from 'pmem' driver when userspace performs flush
> > on DAX memory range.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > ---
> >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> >  drivers/virtio/Kconfig           |  10 +++
> >  drivers/virtio/Makefile          |   1 +
> >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> >  include/uapi/linux/virtio_ids.h  |   1 +
> >  include/uapi/linux/virtio_pmem.h |  10 +++
> >  7 files changed, 314 insertions(+)
> >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> >  create mode 100644 drivers/virtio/pmem.c
> >  create mode 100644 include/linux/virtio_pmem.h
> >  create mode 100644 include/uapi/linux/virtio_pmem.h
> >
> > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > new file mode 100644
> > index 000000000000..66b582f751a3
> > --- /dev/null
> > +++ b/drivers/nvdimm/virtio_pmem.c
> > @@ -0,0 +1,114 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include "nd.h"
> > +
> > + /* The interrupt handler */
> > +void host_ack(struct virtqueue *vq)
> > +{
> > +       unsigned int len;
> > +       unsigned long flags;
> > +       struct virtio_pmem_request *req, *req_buf;
> > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > +
> > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > +               req->done = true;
> > +               wake_up(&req->host_acked);
> > +
> > +               if (!list_empty(&vpmem->req_list)) {
> > +                       req_buf = list_first_entry(&vpmem->req_list,
> > +                                       struct virtio_pmem_request, list);
> > +                       list_del(&vpmem->req_list);
> > +                       req_buf->wq_buf_avail = true;
> > +                       wake_up(&req_buf->wq_buf);
> > +               }
> > +       }
> > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +}
> > +EXPORT_SYMBOL_GPL(host_ack);
> > +
> > + /* The request submission function */
> > +int virtio_pmem_flush(struct nd_region *nd_region)
> > +{
> > +       int err;
> > +       unsigned long flags;
> > +       struct scatterlist *sgs[2], sg, ret;
> > +       struct virtio_device *vdev = nd_region->provider_data;
> > +       struct virtio_pmem *vpmem = vdev->priv;
> > +       struct virtio_pmem_request *req;
> > +
> > +       might_sleep();
> > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > +       if (!req)
> > +               return -ENOMEM;
> > +
> > +       req->done = req->wq_buf_avail = false;
> > +       strcpy(req->name, "FLUSH");
> > +       init_waitqueue_head(&req->host_acked);
> > +       init_waitqueue_head(&req->wq_buf);
> > +       sg_init_one(&sg, req->name, strlen(req->name));
> > +       sgs[0] = &sg;
> > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > +       sgs[1] = &ret;
> > +
> > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > +       if (err) {
> > +               dev_err(&vdev->dev, "failed to send command to virtio pmem
> > device\n");
> > +
> > +               list_add_tail(&vpmem->req_list, &req->list);
> > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +               /* When host has read buffer, this completes via host_ack
> > */
> > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > +       }
> > +       err = virtqueue_kick(vpmem->req_vq);
> > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > +
> > +       if (!err) {
> > +               err = -EIO;
> > +               goto ret;
> > +       }
> > +       /* When host has read buffer, this completes via host_ack */
> > +       wait_event(req->host_acked, req->done);
> > +       err = req->ret;
> > +ret:
> > +       kfree(req);
> > +       return err;
> > +};
> > +
> > + /* The asynchronous flush callback function */
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > +{
> > +       int rc = 0;
> > +
> > +       /* Create child bio for asynchronous flush and chain with
> > +        * parent bio. Otherwise directly call nd_region flush.
> > +        */
> > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > +
> > +               if (!child)
> > +                       return -ENOMEM;
> > +               bio_copy_dev(child, bio);
> > +               child->bi_opf = REQ_PREFLUSH;
> > +               child->bi_iter.bi_sector = -1;
> > +               bio_chain(child, bio);
> > +               submit_bio(child);
> > +       } else {
> > +               if (virtio_pmem_flush(nd_region))
> > +                       rc = -EIO;
> > +       }
> > +
> > +       return rc;
> > +};
> > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > +MODULE_LICENSE("GPL");
> > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > index 35897649c24f..9f634a2ed638 100644
> > --- a/drivers/virtio/Kconfig
> > +++ b/drivers/virtio/Kconfig
> > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> >
> >           If unsure, say Y.
> >
> > +config VIRTIO_PMEM
> > +       tristate "Support for virtio pmem driver"
> > +       depends on VIRTIO
> > +       depends on LIBNVDIMM
> > +       help
> > +       This driver provides support for virtio based flushing interface
> > +       for persistent memory range.
> > +
> > +       If unsure, say M.
> > +
> >  config VIRTIO_BALLOON
> >         tristate "Virtio balloon driver"
> >         depends on VIRTIO
> > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > index 3a2b5c5dcf46..143ce91eabe9 100644
> > --- a/drivers/virtio/Makefile
> > +++ b/drivers/virtio/Makefile
> > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > new file mode 100644
> > index 000000000000..309788628e41
> > --- /dev/null
> > +++ b/drivers/virtio/pmem.c
> 
> It's not clear to me why this driver is located in drivers/virtio/

Like other VIRTIO drivers, I placed it initially in drivers/virtio directory.

> 
> > @@ -0,0 +1,118 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * virtio_pmem.c: Virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and registers the virtual pmem device
> > + * with libnvdimm core.
> > + */
> > +#include <linux/virtio_pmem.h>
> > +#include <../../drivers/nvdimm/nd.h>
> 
> ...especially because it seems to require nvdimm internals.
> 
> However I don't see why that header is included.

Removed.

> 
> In any event lets move this to drivers/nvdimm/virtio.c to live
> alongside the other generic bus provider drivers/nvdimm/e820.c.

o.k. Makes sense.

> 
> > +
> > +static struct virtio_device_id id_table[] = {
> > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > +       { 0 },
> > +};
> > +
> > + /* Initialize virt queue */
> > +static int init_vq(struct virtio_pmem *vpmem)
> > +{
> > +       /* single vq */
> > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > +                               host_ack, "flush_queue");
> > +       if (IS_ERR(vpmem->req_vq))
> > +               return PTR_ERR(vpmem->req_vq);
> > +
> > +       spin_lock_init(&vpmem->pmem_lock);
> > +       INIT_LIST_HEAD(&vpmem->req_list);
> > +
> > +       return 0;
> > +};
> > +
> > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > +{
> > +       int err = 0;
> > +       struct resource res;
> > +       struct virtio_pmem *vpmem;
> > +       struct nd_region_desc ndr_desc = {};
> > +       int nid = dev_to_node(&vdev->dev);
> > +       struct nd_region *nd_region;
> > +
> > +       if (!vdev->config->get) {
> > +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > +                       __func__);
> > +               return -EINVAL;
> > +       }
> > +
> > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > +       if (!vpmem) {
> > +               err = -ENOMEM;
> > +               goto out_err;
> > +       }
> > +
> > +       vpmem->vdev = vdev;
> > +       vdev->priv = vpmem;
> > +       err = init_vq(vpmem);
> > +       if (err)
> > +               goto out_err;
> > +
> > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +                       start, &vpmem->start);
> > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > +                       size, &vpmem->size);
> > +
> > +       res.start = vpmem->start;
> > +       res.end   = vpmem->start + vpmem->size-1;
> > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > +       vpmem->nd_desc.module = THIS_MODULE;
> > +
> > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > +                                               &vpmem->nd_desc);
> > +       if (!vpmem->nvdimm_bus)
> > +               goto out_vq;
> > +
> > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > +
> > +       ndr_desc.res = &res;
> > +       ndr_desc.numa_node = nid;
> > +       ndr_desc.flush = async_pmem_flush;
> > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > &ndr_desc);
> > +
> > +       if (!nd_region)
> > +               goto out_nd;
> > +       nd_region->provider_data =  dev_to_virtio
> > +                                       (nd_region->dev.parent->parent);
> > +       return 0;
> > +out_nd:
> > +       err = -ENXIO;
> > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > +out_vq:
> > +       vdev->config->del_vqs(vdev);
> > +out_err:
> > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > +       return err;
> > +}
> > +
> > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > +{
> > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > +
> > +       nvdimm_bus_unregister(nvdimm_bus);
> > +       vdev->config->del_vqs(vdev);
> > +       vdev->config->reset(vdev);
> > +}
> > +
> > +static struct virtio_driver virtio_pmem_driver = {
> > +       .driver.name            = KBUILD_MODNAME,
> > +       .driver.owner           = THIS_MODULE,
> > +       .id_table               = id_table,
> > +       .probe                  = virtio_pmem_probe,
> > +       .remove                 = virtio_pmem_remove,
> > +};
> > +
> > +module_virtio_driver(virtio_pmem_driver);
> > +MODULE_DEVICE_TABLE(virtio, id_table);
> > +MODULE_DESCRIPTION("Virtio pmem driver");
> > +MODULE_LICENSE("GPL");
> > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..ab1da877575d
> > --- /dev/null
> > +++ b/include/linux/virtio_pmem.h
> 
> Why is this a global header?

This is where other virtio driver headers are also placed.
I think this is to access uapi config file in :

./include/uapi/linux/virtio_pmem.h

Is it okay if we keep 'virtio_pmem.h' in global header?
  
> 
> Seems it can move to drivers/nvdimm/virtio.h.
> 
> Also, I'd like to get a virtio ack from Michael (mst@redhat.com)
> before taking this through the nvdimm tree.

Sure, Will post v8 with the suggestions.

Thanks,
Pankaj

> 
> > @@ -0,0 +1,60 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * virtio_pmem.h: virtio pmem Driver
> > + *
> > + * Discovers persistent memory range information
> > + * from host and provides a virtio based flushing
> > + * interface.
> > + **/
> > +
> > +#ifndef _LINUX_VIRTIO_PMEM_H
> > +#define _LINUX_VIRTIO_PMEM_H
> > +
> > +#include <linux/virtio_ids.h>
> > +#include <linux/module.h>
> > +#include <linux/virtio_config.h>
> > +#include <uapi/linux/virtio_pmem.h>
> > +#include <linux/libnvdimm.h>
> > +#include <linux/spinlock.h>
> > +
> > +struct virtio_pmem_request {
> > +       /* Host return status corresponding to flush request */
> > +       int ret;
> > +
> > +       /* command name*/
> > +       char name[16];
> > +
> > +       /* Wait queue to process deferred work after ack from host */
> > +       wait_queue_head_t host_acked;
> > +       bool done;
> > +
> > +       /* Wait queue to process deferred work after virt queue buffer
> > avail */
> > +       wait_queue_head_t wq_buf;
> > +       bool wq_buf_avail;
> > +       struct list_head list;
> > +};
> > +
> > +struct virtio_pmem {
> > +       struct virtio_device *vdev;
> > +
> > +       /* Virtio pmem request queue */
> > +       struct virtqueue *req_vq;
> > +
> > +       /* nvdimm bus registers virtio pmem device */
> > +       struct nvdimm_bus *nvdimm_bus;
> > +       struct nvdimm_bus_descriptor nd_desc;
> > +
> > +       /* List to store deferred work if virtqueue is full */
> > +       struct list_head req_list;
> > +
> > +       /* Synchronize virtqueue data */
> > +       spinlock_t pmem_lock;
> > +
> > +       /* Memory region information */
> > +       uint64_t start;
> > +       uint64_t size;
> > +};
> > +
> > +void host_ack(struct virtqueue *vq);
> > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio);
> > +#endif
> > diff --git a/include/uapi/linux/virtio_ids.h
> > b/include/uapi/linux/virtio_ids.h
> > index 6d5c3b2d4f4d..32b2f94d1f58 100644
> > --- a/include/uapi/linux/virtio_ids.h
> > +++ b/include/uapi/linux/virtio_ids.h
> > @@ -43,5 +43,6 @@
> >  #define VIRTIO_ID_INPUT        18 /* virtio input */
> >  #define VIRTIO_ID_VSOCK        19 /* virtio vsock transport */
> >  #define VIRTIO_ID_CRYPTO       20 /* virtio crypto */
> > +#define VIRTIO_ID_PMEM         27 /* virtio pmem */
> >
> >  #endif /* _LINUX_VIRTIO_IDS_H */
> > diff --git a/include/uapi/linux/virtio_pmem.h
> > b/include/uapi/linux/virtio_pmem.h
> > new file mode 100644
> > index 000000000000..fa3f7d52717a
> > --- /dev/null
> > +++ b/include/uapi/linux/virtio_pmem.h
> > @@ -0,0 +1,10 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +
> > +#ifndef _UAPI_LINUX_VIRTIO_PMEM_H
> > +#define _UAPI_LINUX_VIRTIO_PMEM_H
> > +
> > +struct virtio_pmem_config {
> > +       __le64 start;
> > +       __le64 size;
> > +};
> > +#endif
> > --
> > 2.20.1
> >
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-08 11:12       ` Pankaj Gupta
  (?)
@ 2019-05-08 15:23         ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 15:23 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, smbarber, zwisler, aarcange,
	linux-nvdimm, david, willy, hch, linux-acpi, linux-ext4, lenb,
	kilobyte, riel, yuval shaia, stefanha, imammedo, lcapitulino,
	kwolf, nilal, tytso, xiaoguangrong eric, darrick wong, rjw,
	linux-kernel, linux-xfs, linux-fsdevel, pbonzini


> > 
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +        int err;
> > > +        unsigned long flags;
> > > +        struct scatterlist *sgs[2], sg, ret;
> > > +        struct virtio_device *vdev = nd_region->provider_data;
> > > +        struct virtio_pmem *vpmem = vdev->priv;
> > > +        struct virtio_pmem_request *req;
> > > +
> > > +        might_sleep();
> > > +        req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +        if (!req)
> > > +                return -ENOMEM;
> > > +
> > > +        req->done = req->wq_buf_avail = false;
> > > +        strcpy(req->name, "FLUSH");
> > > +        init_waitqueue_head(&req->host_acked);
> > > +        init_waitqueue_head(&req->wq_buf);
> > > +        sg_init_one(&sg, req->name, strlen(req->name));
> > > +        sgs[0] = &sg;
> > > +        sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +        sgs[1] = &ret;
> > > +
> > > +        spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +        err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +        if (err) {
> > > +                dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > > +
> > > +                list_add_tail(&vpmem->req_list, &req->list);
> > > +                spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +                /* When host has read buffer, this completes via host_ack */
> > > +                wait_event(req->wq_buf, req->wq_buf_avail);
> > > +                spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +        }
> > 
> > Aren't the arguments in `list_add_tail` swapped? The element we are adding
> 

Yes, arguments for 'list_add_tail' should be swapped.

list_add_tail(&req->list, &vpmem->req_list);


Thank you,
Pankaj
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 15:23         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 15:23 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger kernel, zwisler, aarcange,
	dave jiang, darrick wong, vishal l verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval shaia, stefanha,
	pbonzini, dan j williams, kwolf, tytso, xiaoguangrong eric,
	cohuck, rjw, imammedo, smbarber


> > 
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +        int err;
> > > +        unsigned long flags;
> > > +        struct scatterlist *sgs[2], sg, ret;
> > > +        struct virtio_device *vdev = nd_region->provider_data;
> > > +        struct virtio_pmem *vpmem = vdev->priv;
> > > +        struct virtio_pmem_request *req;
> > > +
> > > +        might_sleep();
> > > +        req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +        if (!req)
> > > +                return -ENOMEM;
> > > +
> > > +        req->done = req->wq_buf_avail = false;
> > > +        strcpy(req->name, "FLUSH");
> > > +        init_waitqueue_head(&req->host_acked);
> > > +        init_waitqueue_head(&req->wq_buf);
> > > +        sg_init_one(&sg, req->name, strlen(req->name));
> > > +        sgs[0] = &sg;
> > > +        sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +        sgs[1] = &ret;
> > > +
> > > +        spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +        err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +        if (err) {
> > > +                dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > > +
> > > +                list_add_tail(&vpmem->req_list, &req->list);
> > > +                spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +                /* When host has read buffer, this completes via host_ack */
> > > +                wait_event(req->wq_buf, req->wq_buf_avail);
> > > +                spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +        }
> > 
> > Aren't the arguments in `list_add_tail` swapped? The element we are adding
> 

Yes, arguments for 'list_add_tail' should be swapped.

list_add_tail(&req->list, &vpmem->req_list);


Thank you,
Pankaj

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 15:23         ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 15:23 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, smbarber, zwisler, aarcange,
	dave jiang, linux-nvdimm, vishal l verma, david, willy, hch,
	linux-acpi, jmoyer, linux-ext4, lenb, kilobyte, riel,
	yuval shaia, stefanha, imammedo, dan j williams, lcapitulino,
	kwolf, nilal, tytso, xiaoguangrong eric, darrick wong, rjw,
	linux-kernel, linux-xfs, linux-fsdevel, pbonzini


> > 
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +        int err;
> > > +        unsigned long flags;
> > > +        struct scatterlist *sgs[2], sg, ret;
> > > +        struct virtio_device *vdev = nd_region->provider_data;
> > > +        struct virtio_pmem *vpmem = vdev->priv;
> > > +        struct virtio_pmem_request *req;
> > > +
> > > +        might_sleep();
> > > +        req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +        if (!req)
> > > +                return -ENOMEM;
> > > +
> > > +        req->done = req->wq_buf_avail = false;
> > > +        strcpy(req->name, "FLUSH");
> > > +        init_waitqueue_head(&req->host_acked);
> > > +        init_waitqueue_head(&req->wq_buf);
> > > +        sg_init_one(&sg, req->name, strlen(req->name));
> > > +        sgs[0] = &sg;
> > > +        sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +        sgs[1] = &ret;
> > > +
> > > +        spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +        err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +        if (err) {
> > > +                dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > > +
> > > +                list_add_tail(&vpmem->req_list, &req->list);
> > > +                spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +                /* When host has read buffer, this completes via host_ack */
> > > +                wait_event(req->wq_buf, req->wq_buf_avail);
> > > +                spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +        }
> > 
> > Aren't the arguments in `list_add_tail` swapped? The element we are adding
> 

Yes, arguments for 'list_add_tail' should be swapped.

list_add_tail(&req->list, &vpmem->req_list);


Thank you,
Pankaj


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-08 11:12       ` Pankaj Gupta
                         ` (2 preceding siblings ...)
  (?)
@ 2019-05-08 15:23       ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-08 15:23 UTC (permalink / raw)
  To: Jakub Staroń
  Cc: cohuck, jack, kvm, mst, david, qemu-devel, virtualization,
	adilger kernel, smbarber, zwisler, aarcange, dave jiang,
	linux-nvdimm, vishal l verma, willy, hch, linux-acpi, jmoyer,
	linux-ext4, lenb, kilobyte, riel, yuval shaia, stefanha,
	imammedo, dan j williams, lcapitulino, nilal, tytso,
	xiaoguangrong eric, darrick wong, rjw, linux-kernel, linux-xfs


> > 
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +        int err;
> > > +        unsigned long flags;
> > > +        struct scatterlist *sgs[2], sg, ret;
> > > +        struct virtio_device *vdev = nd_region->provider_data;
> > > +        struct virtio_pmem *vpmem = vdev->priv;
> > > +        struct virtio_pmem_request *req;
> > > +
> > > +        might_sleep();
> > > +        req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +        if (!req)
> > > +                return -ENOMEM;
> > > +
> > > +        req->done = req->wq_buf_avail = false;
> > > +        strcpy(req->name, "FLUSH");
> > > +        init_waitqueue_head(&req->host_acked);
> > > +        init_waitqueue_head(&req->wq_buf);
> > > +        sg_init_one(&sg, req->name, strlen(req->name));
> > > +        sgs[0] = &sg;
> > > +        sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +        sgs[1] = &ret;
> > > +
> > > +        spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +        err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +        if (err) {
> > > +                dev_err(&vdev->dev, "failed to send command to virtio pmem device\n");
> > > +
> > > +                list_add_tail(&vpmem->req_list, &req->list);
> > > +                spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +                /* When host has read buffer, this completes via host_ack */
> > > +                wait_event(req->wq_buf, req->wq_buf_avail);
> > > +                spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +        }
> > 
> > Aren't the arguments in `list_add_tail` swapped? The element we are adding
> 

Yes, arguments for 'list_add_tail' should be swapped.

list_add_tail(&req->list, &vpmem->req_list);


Thank you,
Pankaj

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-08 11:12       ` Pankaj Gupta
  (?)
@ 2019-05-08 19:05         ` Jakub Staroń
  -1 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń @ 2019-05-08 19:05 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger kernel, zwisler, aarcange,
	dave jiang, darrick wong, vishal l verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval shaia, stefanha,
	pbonzini

On 5/8/19 4:12 AM, Pankaj Gupta wrote:
> 
>>
>> On 4/25/19 10:00 PM, Pankaj Gupta wrote:
>>
>>> +void host_ack(struct virtqueue *vq)
>>> +{
>>> +	unsigned int len;
>>> +	unsigned long flags;
>>> +	struct virtio_pmem_request *req, *req_buf;
>>> +	struct virtio_pmem *vpmem = vq->vdev->priv;
>>> +
>>> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
>>> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
>>> +		req->done = true;
>>> +		wake_up(&req->host_acked);
>>> +
>>> +		if (!list_empty(&vpmem->req_list)) {
>>> +			req_buf = list_first_entry(&vpmem->req_list,
>>> +					struct virtio_pmem_request, list);
>>> +			list_del(&vpmem->req_list);
>>
>> Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to
>> unlink
>> first element of the list and `vpmem->req_list` is just the list head.
> 
> This looks correct. We are not deleting head but first entry in 'req_list'
> which is device corresponding list of pending requests.
> 
> Please see below:
> 
> /**
>  * Retrieve the first list entry for the given list pointer.
>  *
>  * Example:
>  * struct foo *first;
>  * first = list_first_entry(&bar->list_of_foos, struct foo, list_of_foos);
>  *
>  * @param ptr The list head
>  * @param type Data type of the list element to retrieve
>  * @param member Member name of the struct list_head field in the list element.
>  * @return A pointer to the first list element.
>  */
> #define list_first_entry(ptr, type, member) \
>     list_entry((ptr)->next, type, member)

Please look at this StackOverflow question:
https://stackoverflow.com/questions/19675419/deleting-first-element-of-a-list-h-list

Author asks about deleting first element of the queue. In our case
(and also in the question's author case), `vpmem->req_list` is not element
of any request struct and not an element of the list. It's just a list head storing 
`next` and `prev` pointers which are then pointing to respectively first and
last element of the list. We want to unlink the first element of the list,
so we need to pass pointer to the first element of the list to
the `list_del` function - that is, the `vpmem->req_list.next`.

Thank you,
Jakub Staron

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 19:05         ` Jakub Staroń
  0 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń @ 2019-05-08 19:05 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: linux-nvdimm, linux-kernel, virtualization, kvm, linux-fsdevel,
	linux-acpi, qemu-devel, linux-ext4, linux-xfs, jack, mst,
	jasowang, david, lcapitulino, adilger kernel, zwisler, aarcange,
	dave jiang, darrick wong, vishal l verma, david, willy, hch,
	jmoyer, nilal, lenb, kilobyte, riel, yuval shaia, stefanha,
	pbonzini, dan j williams, kwolf, tytso, xiaoguangrong eric,
	cohuck, rjw, imammedo, smbarber

On 5/8/19 4:12 AM, Pankaj Gupta wrote:
> 
>>
>> On 4/25/19 10:00 PM, Pankaj Gupta wrote:
>>
>>> +void host_ack(struct virtqueue *vq)
>>> +{
>>> +	unsigned int len;
>>> +	unsigned long flags;
>>> +	struct virtio_pmem_request *req, *req_buf;
>>> +	struct virtio_pmem *vpmem = vq->vdev->priv;
>>> +
>>> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
>>> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
>>> +		req->done = true;
>>> +		wake_up(&req->host_acked);
>>> +
>>> +		if (!list_empty(&vpmem->req_list)) {
>>> +			req_buf = list_first_entry(&vpmem->req_list,
>>> +					struct virtio_pmem_request, list);
>>> +			list_del(&vpmem->req_list);
>>
>> Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to
>> unlink
>> first element of the list and `vpmem->req_list` is just the list head.
> 
> This looks correct. We are not deleting head but first entry in 'req_list'
> which is device corresponding list of pending requests.
> 
> Please see below:
> 
> /**
>  * Retrieve the first list entry for the given list pointer.
>  *
>  * Example:
>  * struct foo *first;
>  * first = list_first_entry(&bar->list_of_foos, struct foo, list_of_foos);
>  *
>  * @param ptr The list head
>  * @param type Data type of the list element to retrieve
>  * @param member Member name of the struct list_head field in the list element.
>  * @return A pointer to the first list element.
>  */
> #define list_first_entry(ptr, type, member) \
>     list_entry((ptr)->next, type, member)

Please look at this StackOverflow question:
https://stackoverflow.com/questions/19675419/deleting-first-element-of-a-list-h-list

Author asks about deleting first element of the queue. In our case
(and also in the question's author case), `vpmem->req_list` is not element
of any request struct and not an element of the list. It's just a list head storing 
`next` and `prev` pointers which are then pointing to respectively first and
last element of the list. We want to unlink the first element of the list,
so we need to pass pointer to the first element of the list to
the `list_del` function - that is, the `vpmem->req_list.next`.

Thank you,
Jakub Staron

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-08 19:05         ` Jakub Staroń
  0 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń via Qemu-devel @ 2019-05-08 19:05 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, jack, kvm, mst, jasowang, david, qemu-devel,
	virtualization, adilger kernel, smbarber, zwisler, aarcange,
	dave jiang, linux-nvdimm, vishal l verma, david, willy, hch,
	linux-acpi, jmoyer, linux-ext4, lenb, kilobyte, riel,
	yuval shaia, stefanha, imammedo, dan j williams, lcapitulino,
	kwolf, nilal, tytso, xiaoguangrong eric, darrick wong, rjw,
	linux-kernel, linux-xfs, linux-fsdevel, pbonzini

On 5/8/19 4:12 AM, Pankaj Gupta wrote:
> 
>>
>> On 4/25/19 10:00 PM, Pankaj Gupta wrote:
>>
>>> +void host_ack(struct virtqueue *vq)
>>> +{
>>> +	unsigned int len;
>>> +	unsigned long flags;
>>> +	struct virtio_pmem_request *req, *req_buf;
>>> +	struct virtio_pmem *vpmem = vq->vdev->priv;
>>> +
>>> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
>>> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
>>> +		req->done = true;
>>> +		wake_up(&req->host_acked);
>>> +
>>> +		if (!list_empty(&vpmem->req_list)) {
>>> +			req_buf = list_first_entry(&vpmem->req_list,
>>> +					struct virtio_pmem_request, list);
>>> +			list_del(&vpmem->req_list);
>>
>> Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to
>> unlink
>> first element of the list and `vpmem->req_list` is just the list head.
> 
> This looks correct. We are not deleting head but first entry in 'req_list'
> which is device corresponding list of pending requests.
> 
> Please see below:
> 
> /**
>  * Retrieve the first list entry for the given list pointer.
>  *
>  * Example:
>  * struct foo *first;
>  * first = list_first_entry(&bar->list_of_foos, struct foo, list_of_foos);
>  *
>  * @param ptr The list head
>  * @param type Data type of the list element to retrieve
>  * @param member Member name of the struct list_head field in the list element.
>  * @return A pointer to the first list element.
>  */
> #define list_first_entry(ptr, type, member) \
>     list_entry((ptr)->next, type, member)

Please look at this StackOverflow question:
https://stackoverflow.com/questions/19675419/deleting-first-element-of-a-list-h-list

Author asks about deleting first element of the queue. In our case
(and also in the question's author case), `vpmem->req_list` is not element
of any request struct and not an element of the list. It's just a list head storing 
`next` and `prev` pointers which are then pointing to respectively first and
last element of the list. We want to unlink the first element of the list,
so we need to pass pointer to the first element of the list to
the `list_del` function - that is, the `vpmem->req_list.next`.

Thank you,
Jakub Staron


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-08 11:12       ` Pankaj Gupta
                         ` (5 preceding siblings ...)
  (?)
@ 2019-05-08 19:05       ` Jakub Staroń via Virtualization
  -1 siblings, 0 replies; 107+ messages in thread
From: Jakub Staroń via Virtualization @ 2019-05-08 19:05 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, jack, kvm, mst, david, qemu-devel, virtualization,
	adilger kernel, smbarber, zwisler, aarcange, dave jiang,
	linux-nvdimm, vishal l verma, willy, hch, linux-acpi, jmoyer,
	linux-ext4, lenb, kilobyte, riel, yuval shaia, stefanha,
	imammedo, dan j williams, lcapitulino, nilal, tytso,
	xiaoguangrong eric, darrick wong, rjw, linux-kernel, linux-xfs

On 5/8/19 4:12 AM, Pankaj Gupta wrote:
> 
>>
>> On 4/25/19 10:00 PM, Pankaj Gupta wrote:
>>
>>> +void host_ack(struct virtqueue *vq)
>>> +{
>>> +	unsigned int len;
>>> +	unsigned long flags;
>>> +	struct virtio_pmem_request *req, *req_buf;
>>> +	struct virtio_pmem *vpmem = vq->vdev->priv;
>>> +
>>> +	spin_lock_irqsave(&vpmem->pmem_lock, flags);
>>> +	while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
>>> +		req->done = true;
>>> +		wake_up(&req->host_acked);
>>> +
>>> +		if (!list_empty(&vpmem->req_list)) {
>>> +			req_buf = list_first_entry(&vpmem->req_list,
>>> +					struct virtio_pmem_request, list);
>>> +			list_del(&vpmem->req_list);
>>
>> Shouldn't it be rather `list_del(vpmem->req_list.next)`? We are trying to
>> unlink
>> first element of the list and `vpmem->req_list` is just the list head.
> 
> This looks correct. We are not deleting head but first entry in 'req_list'
> which is device corresponding list of pending requests.
> 
> Please see below:
> 
> /**
>  * Retrieve the first list entry for the given list pointer.
>  *
>  * Example:
>  * struct foo *first;
>  * first = list_first_entry(&bar->list_of_foos, struct foo, list_of_foos);
>  *
>  * @param ptr The list head
>  * @param type Data type of the list element to retrieve
>  * @param member Member name of the struct list_head field in the list element.
>  * @return A pointer to the first list element.
>  */
> #define list_first_entry(ptr, type, member) \
>     list_entry((ptr)->next, type, member)

Please look at this StackOverflow question:
https://stackoverflow.com/questions/19675419/deleting-first-element-of-a-list-h-list

Author asks about deleting first element of the queue. In our case
(and also in the question's author case), `vpmem->req_list` is not element
of any request struct and not an element of the list. It's just a list head storing 
`next` and `prev` pointers which are then pointing to respectively first and
last element of the list. We want to unlink the first element of the list,
so we need to pass pointer to the first element of the list to
the `list_del` function - that is, the `vpmem->req_list.next`.

Thank you,
Jakub Staron

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
  2019-05-07 15:40     ` Dan Williams
  (?)
@ 2019-05-09 12:24       ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-09 12:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, linux-nvdimm, David Hildenbrand,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, linux-ext4,
	Len Brown, kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov


> >
> > This patch adds 'DAXDEV_SYNC' flag which is set
> > for nd_region doing synchronous flush. This later
> > is used to disable MAP_SYNC functionality for
> > ext4 & xfs filesystem for devices don't support
> > synchronous flush.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> [..]
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index 0dd316a74a29..c97fc0cc7167 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -7,6 +7,9 @@
> >  #include <linux/radix-tree.h>
> >  #include <asm/pgtable.h>
> >
> > +/* Flag for synchronous flush */
> > +#define DAXDEV_F_SYNC true
> 
> I'd feel better, i.e. it reads more canonically, if this was defined
> as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
> long flags' rather than a bool.

Sure, Will send a v8 with suggested changes.

Thank You,
Pankaj

> 
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
@ 2019-05-09 12:24       ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-09 12:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-nvdimm, Linux Kernel Mailing List, virtualization,
	KVM list, linux-fsdevel, Linux ACPI, Qemu Developers, linux-ext4,
	linux-xfs, Ross Zwisler, Vishal L Verma, Dave Jiang,
	Michael S. Tsirkin, Jason Wang, Matthew Wilcox,
	Rafael J. Wysocki, Christoph Hellwig, Len Brown, Jan Kara,
	Theodore Ts'o, Andreas Dilger, Darrick J. Wong, lcapitulino,
	Kevin Wolf, Igor Mammedov, jmoyer, Nitesh Narayan Lal,
	Rik van Riel, Stefan Hajnoczi, Andrea Arcangeli,
	David Hildenbrand, david, cohuck, Xiao Guangrong, Paolo Bonzini,
	kilobyte, yuval shaia


> >
> > This patch adds 'DAXDEV_SYNC' flag which is set
> > for nd_region doing synchronous flush. This later
> > is used to disable MAP_SYNC functionality for
> > ext4 & xfs filesystem for devices don't support
> > synchronous flush.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> [..]
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index 0dd316a74a29..c97fc0cc7167 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -7,6 +7,9 @@
> >  #include <linux/radix-tree.h>
> >  #include <asm/pgtable.h>
> >
> > +/* Flag for synchronous flush */
> > +#define DAXDEV_F_SYNC true
> 
> I'd feel better, i.e. it reads more canonically, if this was defined
> as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
> long flags' rather than a bool.

Sure, Will send a v8 with suggested changes.

Thank You,
Pankaj

> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
@ 2019-05-09 12:24       ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-09 12:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang,
	david, Qemu Developers, virtualization, Andreas Dilger,
	Ross Zwisler, Andrea Arcangeli, Dave Jiang, linux-nvdimm,
	Vishal L Verma, David Hildenbrand, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov


> >
> > This patch adds 'DAXDEV_SYNC' flag which is set
> > for nd_region doing synchronous flush. This later
> > is used to disable MAP_SYNC functionality for
> > ext4 & xfs filesystem for devices don't support
> > synchronous flush.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> [..]
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index 0dd316a74a29..c97fc0cc7167 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -7,6 +7,9 @@
> >  #include <linux/radix-tree.h>
> >  #include <asm/pgtable.h>
> >
> > +/* Flag for synchronous flush */
> > +#define DAXDEV_F_SYNC true
> 
> I'd feel better, i.e. it reads more canonically, if this was defined
> as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
> long flags' rather than a bool.

Sure, Will send a v8 with suggested changes.

Thank You,
Pankaj

> 


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [PATCH v7 3/6] libnvdimm: add dax_dev sync flag
  2019-05-07 15:40     ` Dan Williams
                       ` (3 preceding siblings ...)
  (?)
@ 2019-05-09 12:24     ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-09 12:24 UTC (permalink / raw)
  To: Dan Williams
  Cc: cohuck, Jan Kara, KVM list, Michael S. Tsirkin, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, jmoyer,
	linux-ext4, Len Brown, kilobyte, Rik van Riel, yuval shaia,
	Stefan Hajnoczi, Paolo Bonzini, lcapitulino, Nites


> >
> > This patch adds 'DAXDEV_SYNC' flag which is set
> > for nd_region doing synchronous flush. This later
> > is used to disable MAP_SYNC functionality for
> > ext4 & xfs filesystem for devices don't support
> > synchronous flush.
> >
> > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> [..]
> > diff --git a/include/linux/dax.h b/include/linux/dax.h
> > index 0dd316a74a29..c97fc0cc7167 100644
> > --- a/include/linux/dax.h
> > +++ b/include/linux/dax.h
> > @@ -7,6 +7,9 @@
> >  #include <linux/radix-tree.h>
> >  #include <asm/pgtable.h>
> >
> > +/* Flag for synchronous flush */
> > +#define DAXDEV_F_SYNC true
> 
> I'd feel better, i.e. it reads more canonically, if this was defined
> as (1UL << 0) and the argument to alloc_dax() was changed to 'unsigned
> long flags' rather than a bool.

Sure, Will send a v8 with suggested changes.

Thank You,
Pankaj

> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-08 11:19         ` Pankaj Gupta
  (?)
  (?)
@ 2019-05-10 23:33           ` Dan Williams
  -1 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-10 23:33 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, linux-nvdimm, David Hildenbrand,
	Matthew Wilcox, Christoph Hellwig, Linux ACPI, linux-ext4,
	Len Brown, kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Igor Mammedov, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Paolo Bonzini, Darrick J. Wong

On Wed, May 8, 2019 at 4:19 AM Pankaj Gupta <pagupta@redhat.com> wrote:
>
>
> Hi Dan,
>
> Thank you for the review. Please see my reply inline.
>
> >
> > Hi Pankaj,
> >
> > Some minor file placement comments below.
>
> Sure.
>
> >
> > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > >
> > > This patch adds virtio-pmem driver for KVM guest.
> > >
> > > Guest reads the persistent memory range information from
> > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > creates a nd_region object with the persistent memory
> > > range information so that existing 'nvdimm/pmem' driver
> > > can reserve this into system memory map. This way
> > > 'virtio-pmem' driver uses existing functionality of pmem
> > > driver to register persistent memory compatible for DAX
> > > capable filesystems.
> > >
> > > This also provides function to perform guest flush over
> > > VIRTIO from 'pmem' driver when userspace performs flush
> > > on DAX memory range.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> > >  drivers/virtio/Kconfig           |  10 +++
> > >  drivers/virtio/Makefile          |   1 +
> > >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> > >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> > >  include/uapi/linux/virtio_ids.h  |   1 +
> > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > >  7 files changed, 314 insertions(+)
> > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > >  create mode 100644 drivers/virtio/pmem.c
> > >  create mode 100644 include/linux/virtio_pmem.h
> > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > >
> > > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > > new file mode 100644
> > > index 000000000000..66b582f751a3
> > > --- /dev/null
> > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > @@ -0,0 +1,114 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and provides a virtio based flushing
> > > + * interface.
> > > + */
> > > +#include <linux/virtio_pmem.h>
> > > +#include "nd.h"
> > > +
> > > + /* The interrupt handler */
> > > +void host_ack(struct virtqueue *vq)
> > > +{
> > > +       unsigned int len;
> > > +       unsigned long flags;
> > > +       struct virtio_pmem_request *req, *req_buf;
> > > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > > +
> > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > +               req->done = true;
> > > +               wake_up(&req->host_acked);
> > > +
> > > +               if (!list_empty(&vpmem->req_list)) {
> > > +                       req_buf = list_first_entry(&vpmem->req_list,
> > > +                                       struct virtio_pmem_request, list);
> > > +                       list_del(&vpmem->req_list);
> > > +                       req_buf->wq_buf_avail = true;
> > > +                       wake_up(&req_buf->wq_buf);
> > > +               }
> > > +       }
> > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +}
> > > +EXPORT_SYMBOL_GPL(host_ack);
> > > +
> > > + /* The request submission function */
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +       int err;
> > > +       unsigned long flags;
> > > +       struct scatterlist *sgs[2], sg, ret;
> > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > +       struct virtio_pmem *vpmem = vdev->priv;
> > > +       struct virtio_pmem_request *req;
> > > +
> > > +       might_sleep();
> > > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +       if (!req)
> > > +               return -ENOMEM;
> > > +
> > > +       req->done = req->wq_buf_avail = false;
> > > +       strcpy(req->name, "FLUSH");
> > > +       init_waitqueue_head(&req->host_acked);
> > > +       init_waitqueue_head(&req->wq_buf);
> > > +       sg_init_one(&sg, req->name, strlen(req->name));
> > > +       sgs[0] = &sg;
> > > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +       sgs[1] = &ret;
> > > +
> > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +       if (err) {
> > > +               dev_err(&vdev->dev, "failed to send command to virtio pmem
> > > device\n");
> > > +
> > > +               list_add_tail(&vpmem->req_list, &req->list);
> > > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +               /* When host has read buffer, this completes via host_ack
> > > */
> > > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       }
> > > +       err = virtqueue_kick(vpmem->req_vq);
> > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +       if (!err) {
> > > +               err = -EIO;
> > > +               goto ret;
> > > +       }
> > > +       /* When host has read buffer, this completes via host_ack */
> > > +       wait_event(req->host_acked, req->done);
> > > +       err = req->ret;
> > > +ret:
> > > +       kfree(req);
> > > +       return err;
> > > +};
> > > +
> > > + /* The asynchronous flush callback function */
> > > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > +{
> > > +       int rc = 0;
> > > +
> > > +       /* Create child bio for asynchronous flush and chain with
> > > +        * parent bio. Otherwise directly call nd_region flush.
> > > +        */
> > > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > > +
> > > +               if (!child)
> > > +                       return -ENOMEM;
> > > +               bio_copy_dev(child, bio);
> > > +               child->bi_opf = REQ_PREFLUSH;
> > > +               child->bi_iter.bi_sector = -1;
> > > +               bio_chain(child, bio);
> > > +               submit_bio(child);
> > > +       } else {
> > > +               if (virtio_pmem_flush(nd_region))
> > > +                       rc = -EIO;
> > > +       }
> > > +
> > > +       return rc;
> > > +};
> > > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > index 35897649c24f..9f634a2ed638 100644
> > > --- a/drivers/virtio/Kconfig
> > > +++ b/drivers/virtio/Kconfig
> > > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> > >
> > >           If unsure, say Y.
> > >
> > > +config VIRTIO_PMEM
> > > +       tristate "Support for virtio pmem driver"
> > > +       depends on VIRTIO
> > > +       depends on LIBNVDIMM
> > > +       help
> > > +       This driver provides support for virtio based flushing interface
> > > +       for persistent memory range.
> > > +
> > > +       If unsure, say M.
> > > +
> > >  config VIRTIO_BALLOON
> > >         tristate "Virtio balloon driver"
> > >         depends on VIRTIO
> > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > index 3a2b5c5dcf46..143ce91eabe9 100644
> > > --- a/drivers/virtio/Makefile
> > > +++ b/drivers/virtio/Makefile
> > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > new file mode 100644
> > > index 000000000000..309788628e41
> > > --- /dev/null
> > > +++ b/drivers/virtio/pmem.c
> >
> > It's not clear to me why this driver is located in drivers/virtio/
>
> Like other VIRTIO drivers, I placed it initially in drivers/virtio directory.
>
> >
> > > @@ -0,0 +1,118 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and registers the virtual pmem device
> > > + * with libnvdimm core.
> > > + */
> > > +#include <linux/virtio_pmem.h>
> > > +#include <../../drivers/nvdimm/nd.h>
> >
> > ...especially because it seems to require nvdimm internals.
> >
> > However I don't see why that header is included.
>
> Removed.
>
> >
> > In any event lets move this to drivers/nvdimm/virtio.c to live
> > alongside the other generic bus provider drivers/nvdimm/e820.c.
>
> o.k. Makes sense.
>
> >
> > > +
> > > +static struct virtio_device_id id_table[] = {
> > > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > +       { 0 },
> > > +};
> > > +
> > > + /* Initialize virt queue */
> > > +static int init_vq(struct virtio_pmem *vpmem)
> > > +{
> > > +       /* single vq */
> > > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > > +                               host_ack, "flush_queue");
> > > +       if (IS_ERR(vpmem->req_vq))
> > > +               return PTR_ERR(vpmem->req_vq);
> > > +
> > > +       spin_lock_init(&vpmem->pmem_lock);
> > > +       INIT_LIST_HEAD(&vpmem->req_list);
> > > +
> > > +       return 0;
> > > +};
> > > +
> > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > +{
> > > +       int err = 0;
> > > +       struct resource res;
> > > +       struct virtio_pmem *vpmem;
> > > +       struct nd_region_desc ndr_desc = {};
> > > +       int nid = dev_to_node(&vdev->dev);
> > > +       struct nd_region *nd_region;
> > > +
> > > +       if (!vdev->config->get) {
> > > +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > > +                       __func__);
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > > +       if (!vpmem) {
> > > +               err = -ENOMEM;
> > > +               goto out_err;
> > > +       }
> > > +
> > > +       vpmem->vdev = vdev;
> > > +       vdev->priv = vpmem;
> > > +       err = init_vq(vpmem);
> > > +       if (err)
> > > +               goto out_err;
> > > +
> > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +                       start, &vpmem->start);
> > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +                       size, &vpmem->size);
> > > +
> > > +       res.start = vpmem->start;
> > > +       res.end   = vpmem->start + vpmem->size-1;
> > > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > > +       vpmem->nd_desc.module = THIS_MODULE;
> > > +
> > > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > +                                               &vpmem->nd_desc);
> > > +       if (!vpmem->nvdimm_bus)
> > > +               goto out_vq;
> > > +
> > > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > +
> > > +       ndr_desc.res = &res;
> > > +       ndr_desc.numa_node = nid;
> > > +       ndr_desc.flush = async_pmem_flush;
> > > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > > &ndr_desc);
> > > +
> > > +       if (!nd_region)
> > > +               goto out_nd;
> > > +       nd_region->provider_data =  dev_to_virtio
> > > +                                       (nd_region->dev.parent->parent);
> > > +       return 0;
> > > +out_nd:
> > > +       err = -ENXIO;
> > > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > +out_vq:
> > > +       vdev->config->del_vqs(vdev);
> > > +out_err:
> > > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > +       return err;
> > > +}
> > > +
> > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > +{
> > > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > +
> > > +       nvdimm_bus_unregister(nvdimm_bus);
> > > +       vdev->config->del_vqs(vdev);
> > > +       vdev->config->reset(vdev);
> > > +}
> > > +
> > > +static struct virtio_driver virtio_pmem_driver = {
> > > +       .driver.name            = KBUILD_MODNAME,
> > > +       .driver.owner           = THIS_MODULE,
> > > +       .id_table               = id_table,
> > > +       .probe                  = virtio_pmem_probe,
> > > +       .remove                 = virtio_pmem_remove,
> > > +};
> > > +
> > > +module_virtio_driver(virtio_pmem_driver);
> > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > > new file mode 100644
> > > index 000000000000..ab1da877575d
> > > --- /dev/null
> > > +++ b/include/linux/virtio_pmem.h
> >
> > Why is this a global header?
>
> This is where other virtio driver headers are also placed.
> I think this is to access uapi config file in :
>
> ./include/uapi/linux/virtio_pmem.h
>
> Is it okay if we keep 'virtio_pmem.h' in global header?

No, I don't think so. While virtio_console.h and virtio_net.h make
sense as global headers because they are consumed from multiple
drivers, there is no need for virtio_caif.h, for example, to be a
global header. I see no practical reason that the private details of
virtio_pmem.h need to be made available outside of the virtio_pmem.c
consumer.
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-10 23:33           ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-10 23:33 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: cohuck, Jan Kara, KVM list, Jason Wang, david,
	Michael S. Tsirkin, Qemu Developers, virtualization,
	Andreas Dilger, Ross Zwisler, Andrea Arcangeli, Dave Jiang,
	linux-nvdimm, Vishal L Verma, David Hildenbrand, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Kevin Wolf, Nitesh Narayan Lal,
	Theodore Ts'o, Xiao Guangrong, Darrick J. Wong,
	Rafael J. Wysocki, Linux Kernel Mailing List, linux-xfs,
	linux-fsdevel, Igor Mammedov

On Wed, May 8, 2019 at 4:19 AM Pankaj Gupta <pagupta@redhat.com> wrote:
>
>
> Hi Dan,
>
> Thank you for the review. Please see my reply inline.
>
> >
> > Hi Pankaj,
> >
> > Some minor file placement comments below.
>
> Sure.
>
> >
> > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > >
> > > This patch adds virtio-pmem driver for KVM guest.
> > >
> > > Guest reads the persistent memory range information from
> > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > creates a nd_region object with the persistent memory
> > > range information so that existing 'nvdimm/pmem' driver
> > > can reserve this into system memory map. This way
> > > 'virtio-pmem' driver uses existing functionality of pmem
> > > driver to register persistent memory compatible for DAX
> > > capable filesystems.
> > >
> > > This also provides function to perform guest flush over
> > > VIRTIO from 'pmem' driver when userspace performs flush
> > > on DAX memory range.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> > >  drivers/virtio/Kconfig           |  10 +++
> > >  drivers/virtio/Makefile          |   1 +
> > >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> > >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> > >  include/uapi/linux/virtio_ids.h  |   1 +
> > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > >  7 files changed, 314 insertions(+)
> > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > >  create mode 100644 drivers/virtio/pmem.c
> > >  create mode 100644 include/linux/virtio_pmem.h
> > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > >
> > > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > > new file mode 100644
> > > index 000000000000..66b582f751a3
> > > --- /dev/null
> > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > @@ -0,0 +1,114 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and provides a virtio based flushing
> > > + * interface.
> > > + */
> > > +#include <linux/virtio_pmem.h>
> > > +#include "nd.h"
> > > +
> > > + /* The interrupt handler */
> > > +void host_ack(struct virtqueue *vq)
> > > +{
> > > +       unsigned int len;
> > > +       unsigned long flags;
> > > +       struct virtio_pmem_request *req, *req_buf;
> > > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > > +
> > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > +               req->done = true;
> > > +               wake_up(&req->host_acked);
> > > +
> > > +               if (!list_empty(&vpmem->req_list)) {
> > > +                       req_buf = list_first_entry(&vpmem->req_list,
> > > +                                       struct virtio_pmem_request, list);
> > > +                       list_del(&vpmem->req_list);
> > > +                       req_buf->wq_buf_avail = true;
> > > +                       wake_up(&req_buf->wq_buf);
> > > +               }
> > > +       }
> > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +}
> > > +EXPORT_SYMBOL_GPL(host_ack);
> > > +
> > > + /* The request submission function */
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +       int err;
> > > +       unsigned long flags;
> > > +       struct scatterlist *sgs[2], sg, ret;
> > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > +       struct virtio_pmem *vpmem = vdev->priv;
> > > +       struct virtio_pmem_request *req;
> > > +
> > > +       might_sleep();
> > > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +       if (!req)
> > > +               return -ENOMEM;
> > > +
> > > +       req->done = req->wq_buf_avail = false;
> > > +       strcpy(req->name, "FLUSH");
> > > +       init_waitqueue_head(&req->host_acked);
> > > +       init_waitqueue_head(&req->wq_buf);
> > > +       sg_init_one(&sg, req->name, strlen(req->name));
> > > +       sgs[0] = &sg;
> > > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +       sgs[1] = &ret;
> > > +
> > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +       if (err) {
> > > +               dev_err(&vdev->dev, "failed to send command to virtio pmem
> > > device\n");
> > > +
> > > +               list_add_tail(&vpmem->req_list, &req->list);
> > > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +               /* When host has read buffer, this completes via host_ack
> > > */
> > > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       }
> > > +       err = virtqueue_kick(vpmem->req_vq);
> > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +       if (!err) {
> > > +               err = -EIO;
> > > +               goto ret;
> > > +       }
> > > +       /* When host has read buffer, this completes via host_ack */
> > > +       wait_event(req->host_acked, req->done);
> > > +       err = req->ret;
> > > +ret:
> > > +       kfree(req);
> > > +       return err;
> > > +};
> > > +
> > > + /* The asynchronous flush callback function */
> > > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > +{
> > > +       int rc = 0;
> > > +
> > > +       /* Create child bio for asynchronous flush and chain with
> > > +        * parent bio. Otherwise directly call nd_region flush.
> > > +        */
> > > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > > +
> > > +               if (!child)
> > > +                       return -ENOMEM;
> > > +               bio_copy_dev(child, bio);
> > > +               child->bi_opf = REQ_PREFLUSH;
> > > +               child->bi_iter.bi_sector = -1;
> > > +               bio_chain(child, bio);
> > > +               submit_bio(child);
> > > +       } else {
> > > +               if (virtio_pmem_flush(nd_region))
> > > +                       rc = -EIO;
> > > +       }
> > > +
> > > +       return rc;
> > > +};
> > > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > index 35897649c24f..9f634a2ed638 100644
> > > --- a/drivers/virtio/Kconfig
> > > +++ b/drivers/virtio/Kconfig
> > > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> > >
> > >           If unsure, say Y.
> > >
> > > +config VIRTIO_PMEM
> > > +       tristate "Support for virtio pmem driver"
> > > +       depends on VIRTIO
> > > +       depends on LIBNVDIMM
> > > +       help
> > > +       This driver provides support for virtio based flushing interface
> > > +       for persistent memory range.
> > > +
> > > +       If unsure, say M.
> > > +
> > >  config VIRTIO_BALLOON
> > >         tristate "Virtio balloon driver"
> > >         depends on VIRTIO
> > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > index 3a2b5c5dcf46..143ce91eabe9 100644
> > > --- a/drivers/virtio/Makefile
> > > +++ b/drivers/virtio/Makefile
> > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > new file mode 100644
> > > index 000000000000..309788628e41
> > > --- /dev/null
> > > +++ b/drivers/virtio/pmem.c
> >
> > It's not clear to me why this driver is located in drivers/virtio/
>
> Like other VIRTIO drivers, I placed it initially in drivers/virtio directory.
>
> >
> > > @@ -0,0 +1,118 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and registers the virtual pmem device
> > > + * with libnvdimm core.
> > > + */
> > > +#include <linux/virtio_pmem.h>
> > > +#include <../../drivers/nvdimm/nd.h>
> >
> > ...especially because it seems to require nvdimm internals.
> >
> > However I don't see why that header is included.
>
> Removed.
>
> >
> > In any event lets move this to drivers/nvdimm/virtio.c to live
> > alongside the other generic bus provider drivers/nvdimm/e820.c.
>
> o.k. Makes sense.
>
> >
> > > +
> > > +static struct virtio_device_id id_table[] = {
> > > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > +       { 0 },
> > > +};
> > > +
> > > + /* Initialize virt queue */
> > > +static int init_vq(struct virtio_pmem *vpmem)
> > > +{
> > > +       /* single vq */
> > > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > > +                               host_ack, "flush_queue");
> > > +       if (IS_ERR(vpmem->req_vq))
> > > +               return PTR_ERR(vpmem->req_vq);
> > > +
> > > +       spin_lock_init(&vpmem->pmem_lock);
> > > +       INIT_LIST_HEAD(&vpmem->req_list);
> > > +
> > > +       return 0;
> > > +};
> > > +
> > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > +{
> > > +       int err = 0;
> > > +       struct resource res;
> > > +       struct virtio_pmem *vpmem;
> > > +       struct nd_region_desc ndr_desc = {};
> > > +       int nid = dev_to_node(&vdev->dev);
> > > +       struct nd_region *nd_region;
> > > +
> > > +       if (!vdev->config->get) {
> > > +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > > +                       __func__);
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > > +       if (!vpmem) {
> > > +               err = -ENOMEM;
> > > +               goto out_err;
> > > +       }
> > > +
> > > +       vpmem->vdev = vdev;
> > > +       vdev->priv = vpmem;
> > > +       err = init_vq(vpmem);
> > > +       if (err)
> > > +               goto out_err;
> > > +
> > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +                       start, &vpmem->start);
> > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +                       size, &vpmem->size);
> > > +
> > > +       res.start = vpmem->start;
> > > +       res.end   = vpmem->start + vpmem->size-1;
> > > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > > +       vpmem->nd_desc.module = THIS_MODULE;
> > > +
> > > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > +                                               &vpmem->nd_desc);
> > > +       if (!vpmem->nvdimm_bus)
> > > +               goto out_vq;
> > > +
> > > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > +
> > > +       ndr_desc.res = &res;
> > > +       ndr_desc.numa_node = nid;
> > > +       ndr_desc.flush = async_pmem_flush;
> > > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > > &ndr_desc);
> > > +
> > > +       if (!nd_region)
> > > +               goto out_nd;
> > > +       nd_region->provider_data =  dev_to_virtio
> > > +                                       (nd_region->dev.parent->parent);
> > > +       return 0;
> > > +out_nd:
> > > +       err = -ENXIO;
> > > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > +out_vq:
> > > +       vdev->config->del_vqs(vdev);
> > > +out_err:
> > > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > +       return err;
> > > +}
> > > +
> > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > +{
> > > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > +
> > > +       nvdimm_bus_unregister(nvdimm_bus);
> > > +       vdev->config->del_vqs(vdev);
> > > +       vdev->config->reset(vdev);
> > > +}
> > > +
> > > +static struct virtio_driver virtio_pmem_driver = {
> > > +       .driver.name            = KBUILD_MODNAME,
> > > +       .driver.owner           = THIS_MODULE,
> > > +       .id_table               = id_table,
> > > +       .probe                  = virtio_pmem_probe,
> > > +       .remove                 = virtio_pmem_remove,
> > > +};
> > > +
> > > +module_virtio_driver(virtio_pmem_driver);
> > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > > new file mode 100644
> > > index 000000000000..ab1da877575d
> > > --- /dev/null
> > > +++ b/include/linux/virtio_pmem.h
> >
> > Why is this a global header?
>
> This is where other virtio driver headers are also placed.
> I think this is to access uapi config file in :
>
> ./include/uapi/linux/virtio_pmem.h
>
> Is it okay if we keep 'virtio_pmem.h' in global header?

No, I don't think so. While virtio_console.h and virtio_net.h make
sense as global headers because they are consumed from multiple
drivers, there is no need for virtio_caif.h, for example, to be a
global header. I see no practical reason that the private details of
virtio_pmem.h need to be made available outside of the virtio_pmem.c
consumer.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-10 23:33           ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-10 23:33 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	jmoyer, linux-ext4, Len Brown, kilobyte, Rik van Riel,
	yuval shaia, Stefan Hajnoczi, Igor Mammedov, lcapitulino,
	Kevin Wolf, Nitesh Narayan Lal, Theodore Ts'o,
	Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Paolo Bonzini, Darrick J. Wong

On Wed, May 8, 2019 at 4:19 AM Pankaj Gupta <pagupta@redhat.com> wrote:
>
>
> Hi Dan,
>
> Thank you for the review. Please see my reply inline.
>
> >
> > Hi Pankaj,
> >
> > Some minor file placement comments below.
>
> Sure.
>
> >
> > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > >
> > > This patch adds virtio-pmem driver for KVM guest.
> > >
> > > Guest reads the persistent memory range information from
> > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > creates a nd_region object with the persistent memory
> > > range information so that existing 'nvdimm/pmem' driver
> > > can reserve this into system memory map. This way
> > > 'virtio-pmem' driver uses existing functionality of pmem
> > > driver to register persistent memory compatible for DAX
> > > capable filesystems.
> > >
> > > This also provides function to perform guest flush over
> > > VIRTIO from 'pmem' driver when userspace performs flush
> > > on DAX memory range.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> > >  drivers/virtio/Kconfig           |  10 +++
> > >  drivers/virtio/Makefile          |   1 +
> > >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> > >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> > >  include/uapi/linux/virtio_ids.h  |   1 +
> > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > >  7 files changed, 314 insertions(+)
> > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > >  create mode 100644 drivers/virtio/pmem.c
> > >  create mode 100644 include/linux/virtio_pmem.h
> > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > >
> > > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > > new file mode 100644
> > > index 000000000000..66b582f751a3
> > > --- /dev/null
> > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > @@ -0,0 +1,114 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and provides a virtio based flushing
> > > + * interface.
> > > + */
> > > +#include <linux/virtio_pmem.h>
> > > +#include "nd.h"
> > > +
> > > + /* The interrupt handler */
> > > +void host_ack(struct virtqueue *vq)
> > > +{
> > > +       unsigned int len;
> > > +       unsigned long flags;
> > > +       struct virtio_pmem_request *req, *req_buf;
> > > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > > +
> > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > +               req->done = true;
> > > +               wake_up(&req->host_acked);
> > > +
> > > +               if (!list_empty(&vpmem->req_list)) {
> > > +                       req_buf = list_first_entry(&vpmem->req_list,
> > > +                                       struct virtio_pmem_request, list);
> > > +                       list_del(&vpmem->req_list);
> > > +                       req_buf->wq_buf_avail = true;
> > > +                       wake_up(&req_buf->wq_buf);
> > > +               }
> > > +       }
> > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +}
> > > +EXPORT_SYMBOL_GPL(host_ack);
> > > +
> > > + /* The request submission function */
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +       int err;
> > > +       unsigned long flags;
> > > +       struct scatterlist *sgs[2], sg, ret;
> > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > +       struct virtio_pmem *vpmem = vdev->priv;
> > > +       struct virtio_pmem_request *req;
> > > +
> > > +       might_sleep();
> > > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +       if (!req)
> > > +               return -ENOMEM;
> > > +
> > > +       req->done = req->wq_buf_avail = false;
> > > +       strcpy(req->name, "FLUSH");
> > > +       init_waitqueue_head(&req->host_acked);
> > > +       init_waitqueue_head(&req->wq_buf);
> > > +       sg_init_one(&sg, req->name, strlen(req->name));
> > > +       sgs[0] = &sg;
> > > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +       sgs[1] = &ret;
> > > +
> > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +       if (err) {
> > > +               dev_err(&vdev->dev, "failed to send command to virtio pmem
> > > device\n");
> > > +
> > > +               list_add_tail(&vpmem->req_list, &req->list);
> > > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +               /* When host has read buffer, this completes via host_ack
> > > */
> > > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       }
> > > +       err = virtqueue_kick(vpmem->req_vq);
> > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +       if (!err) {
> > > +               err = -EIO;
> > > +               goto ret;
> > > +       }
> > > +       /* When host has read buffer, this completes via host_ack */
> > > +       wait_event(req->host_acked, req->done);
> > > +       err = req->ret;
> > > +ret:
> > > +       kfree(req);
> > > +       return err;
> > > +};
> > > +
> > > + /* The asynchronous flush callback function */
> > > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > +{
> > > +       int rc = 0;
> > > +
> > > +       /* Create child bio for asynchronous flush and chain with
> > > +        * parent bio. Otherwise directly call nd_region flush.
> > > +        */
> > > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > > +
> > > +               if (!child)
> > > +                       return -ENOMEM;
> > > +               bio_copy_dev(child, bio);
> > > +               child->bi_opf = REQ_PREFLUSH;
> > > +               child->bi_iter.bi_sector = -1;
> > > +               bio_chain(child, bio);
> > > +               submit_bio(child);
> > > +       } else {
> > > +               if (virtio_pmem_flush(nd_region))
> > > +                       rc = -EIO;
> > > +       }
> > > +
> > > +       return rc;
> > > +};
> > > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > index 35897649c24f..9f634a2ed638 100644
> > > --- a/drivers/virtio/Kconfig
> > > +++ b/drivers/virtio/Kconfig
> > > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> > >
> > >           If unsure, say Y.
> > >
> > > +config VIRTIO_PMEM
> > > +       tristate "Support for virtio pmem driver"
> > > +       depends on VIRTIO
> > > +       depends on LIBNVDIMM
> > > +       help
> > > +       This driver provides support for virtio based flushing interface
> > > +       for persistent memory range.
> > > +
> > > +       If unsure, say M.
> > > +
> > >  config VIRTIO_BALLOON
> > >         tristate "Virtio balloon driver"
> > >         depends on VIRTIO
> > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > index 3a2b5c5dcf46..143ce91eabe9 100644
> > > --- a/drivers/virtio/Makefile
> > > +++ b/drivers/virtio/Makefile
> > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > new file mode 100644
> > > index 000000000000..309788628e41
> > > --- /dev/null
> > > +++ b/drivers/virtio/pmem.c
> >
> > It's not clear to me why this driver is located in drivers/virtio/
>
> Like other VIRTIO drivers, I placed it initially in drivers/virtio directory.
>
> >
> > > @@ -0,0 +1,118 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and registers the virtual pmem device
> > > + * with libnvdimm core.
> > > + */
> > > +#include <linux/virtio_pmem.h>
> > > +#include <../../drivers/nvdimm/nd.h>
> >
> > ...especially because it seems to require nvdimm internals.
> >
> > However I don't see why that header is included.
>
> Removed.
>
> >
> > In any event lets move this to drivers/nvdimm/virtio.c to live
> > alongside the other generic bus provider drivers/nvdimm/e820.c.
>
> o.k. Makes sense.
>
> >
> > > +
> > > +static struct virtio_device_id id_table[] = {
> > > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > +       { 0 },
> > > +};
> > > +
> > > + /* Initialize virt queue */
> > > +static int init_vq(struct virtio_pmem *vpmem)
> > > +{
> > > +       /* single vq */
> > > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > > +                               host_ack, "flush_queue");
> > > +       if (IS_ERR(vpmem->req_vq))
> > > +               return PTR_ERR(vpmem->req_vq);
> > > +
> > > +       spin_lock_init(&vpmem->pmem_lock);
> > > +       INIT_LIST_HEAD(&vpmem->req_list);
> > > +
> > > +       return 0;
> > > +};
> > > +
> > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > +{
> > > +       int err = 0;
> > > +       struct resource res;
> > > +       struct virtio_pmem *vpmem;
> > > +       struct nd_region_desc ndr_desc = {};
> > > +       int nid = dev_to_node(&vdev->dev);
> > > +       struct nd_region *nd_region;
> > > +
> > > +       if (!vdev->config->get) {
> > > +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > > +                       __func__);
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > > +       if (!vpmem) {
> > > +               err = -ENOMEM;
> > > +               goto out_err;
> > > +       }
> > > +
> > > +       vpmem->vdev = vdev;
> > > +       vdev->priv = vpmem;
> > > +       err = init_vq(vpmem);
> > > +       if (err)
> > > +               goto out_err;
> > > +
> > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +                       start, &vpmem->start);
> > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +                       size, &vpmem->size);
> > > +
> > > +       res.start = vpmem->start;
> > > +       res.end   = vpmem->start + vpmem->size-1;
> > > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > > +       vpmem->nd_desc.module = THIS_MODULE;
> > > +
> > > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > +                                               &vpmem->nd_desc);
> > > +       if (!vpmem->nvdimm_bus)
> > > +               goto out_vq;
> > > +
> > > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > +
> > > +       ndr_desc.res = &res;
> > > +       ndr_desc.numa_node = nid;
> > > +       ndr_desc.flush = async_pmem_flush;
> > > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > > &ndr_desc);
> > > +
> > > +       if (!nd_region)
> > > +               goto out_nd;
> > > +       nd_region->provider_data =  dev_to_virtio
> > > +                                       (nd_region->dev.parent->parent);
> > > +       return 0;
> > > +out_nd:
> > > +       err = -ENXIO;
> > > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > +out_vq:
> > > +       vdev->config->del_vqs(vdev);
> > > +out_err:
> > > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > +       return err;
> > > +}
> > > +
> > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > +{
> > > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > +
> > > +       nvdimm_bus_unregister(nvdimm_bus);
> > > +       vdev->config->del_vqs(vdev);
> > > +       vdev->config->reset(vdev);
> > > +}
> > > +
> > > +static struct virtio_driver virtio_pmem_driver = {
> > > +       .driver.name            = KBUILD_MODNAME,
> > > +       .driver.owner           = THIS_MODULE,
> > > +       .id_table               = id_table,
> > > +       .probe                  = virtio_pmem_probe,
> > > +       .remove                 = virtio_pmem_remove,
> > > +};
> > > +
> > > +module_virtio_driver(virtio_pmem_driver);
> > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > > new file mode 100644
> > > index 000000000000..ab1da877575d
> > > --- /dev/null
> > > +++ b/include/linux/virtio_pmem.h
> >
> > Why is this a global header?
>
> This is where other virtio driver headers are also placed.
> I think this is to access uapi config file in :
>
> ./include/uapi/linux/virtio_pmem.h
>
> Is it okay if we keep 'virtio_pmem.h' in global header?

No, I don't think so. While virtio_console.h and virtio_net.h make
sense as global headers because they are consumed from multiple
drivers, there is no need for virtio_caif.h, for example, to be a
global header. I see no practical reason that the private details of
virtio_pmem.h need to be made available outside of the virtio_pmem.c
consumer.


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-10 23:33           ` Dan Williams
  0 siblings, 0 replies; 107+ messages in thread
From: Dan Williams @ 2019-05-10 23:33 UTC (permalink / raw)
  To: Pankaj Gupta
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, david, Qemu Developers,
	virtualization, Andreas Dilger, Ross Zwisler, Andrea Arcangeli,
	Dave Jiang, linux-nvdimm, Vishal L Verma, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Igor Mammedov, lcapitulino, Nitesh Narayan Lal

On Wed, May 8, 2019 at 4:19 AM Pankaj Gupta <pagupta@redhat.com> wrote:
>
>
> Hi Dan,
>
> Thank you for the review. Please see my reply inline.
>
> >
> > Hi Pankaj,
> >
> > Some minor file placement comments below.
>
> Sure.
>
> >
> > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > >
> > > This patch adds virtio-pmem driver for KVM guest.
> > >
> > > Guest reads the persistent memory range information from
> > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > creates a nd_region object with the persistent memory
> > > range information so that existing 'nvdimm/pmem' driver
> > > can reserve this into system memory map. This way
> > > 'virtio-pmem' driver uses existing functionality of pmem
> > > driver to register persistent memory compatible for DAX
> > > capable filesystems.
> > >
> > > This also provides function to perform guest flush over
> > > VIRTIO from 'pmem' driver when userspace performs flush
> > > on DAX memory range.
> > >
> > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > ---
> > >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> > >  drivers/virtio/Kconfig           |  10 +++
> > >  drivers/virtio/Makefile          |   1 +
> > >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> > >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> > >  include/uapi/linux/virtio_ids.h  |   1 +
> > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > >  7 files changed, 314 insertions(+)
> > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > >  create mode 100644 drivers/virtio/pmem.c
> > >  create mode 100644 include/linux/virtio_pmem.h
> > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > >
> > > diff --git a/drivers/nvdimm/virtio_pmem.c b/drivers/nvdimm/virtio_pmem.c
> > > new file mode 100644
> > > index 000000000000..66b582f751a3
> > > --- /dev/null
> > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > @@ -0,0 +1,114 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and provides a virtio based flushing
> > > + * interface.
> > > + */
> > > +#include <linux/virtio_pmem.h>
> > > +#include "nd.h"
> > > +
> > > + /* The interrupt handler */
> > > +void host_ack(struct virtqueue *vq)
> > > +{
> > > +       unsigned int len;
> > > +       unsigned long flags;
> > > +       struct virtio_pmem_request *req, *req_buf;
> > > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > > +
> > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > +               req->done = true;
> > > +               wake_up(&req->host_acked);
> > > +
> > > +               if (!list_empty(&vpmem->req_list)) {
> > > +                       req_buf = list_first_entry(&vpmem->req_list,
> > > +                                       struct virtio_pmem_request, list);
> > > +                       list_del(&vpmem->req_list);
> > > +                       req_buf->wq_buf_avail = true;
> > > +                       wake_up(&req_buf->wq_buf);
> > > +               }
> > > +       }
> > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +}
> > > +EXPORT_SYMBOL_GPL(host_ack);
> > > +
> > > + /* The request submission function */
> > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > +{
> > > +       int err;
> > > +       unsigned long flags;
> > > +       struct scatterlist *sgs[2], sg, ret;
> > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > +       struct virtio_pmem *vpmem = vdev->priv;
> > > +       struct virtio_pmem_request *req;
> > > +
> > > +       might_sleep();
> > > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > +       if (!req)
> > > +               return -ENOMEM;
> > > +
> > > +       req->done = req->wq_buf_avail = false;
> > > +       strcpy(req->name, "FLUSH");
> > > +       init_waitqueue_head(&req->host_acked);
> > > +       init_waitqueue_head(&req->wq_buf);
> > > +       sg_init_one(&sg, req->name, strlen(req->name));
> > > +       sgs[0] = &sg;
> > > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > +       sgs[1] = &ret;
> > > +
> > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req, GFP_ATOMIC);
> > > +       if (err) {
> > > +               dev_err(&vdev->dev, "failed to send command to virtio pmem
> > > device\n");
> > > +
> > > +               list_add_tail(&vpmem->req_list, &req->list);
> > > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +               /* When host has read buffer, this completes via host_ack
> > > */
> > > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > +       }
> > > +       err = virtqueue_kick(vpmem->req_vq);
> > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > +
> > > +       if (!err) {
> > > +               err = -EIO;
> > > +               goto ret;
> > > +       }
> > > +       /* When host has read buffer, this completes via host_ack */
> > > +       wait_event(req->host_acked, req->done);
> > > +       err = req->ret;
> > > +ret:
> > > +       kfree(req);
> > > +       return err;
> > > +};
> > > +
> > > + /* The asynchronous flush callback function */
> > > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > +{
> > > +       int rc = 0;
> > > +
> > > +       /* Create child bio for asynchronous flush and chain with
> > > +        * parent bio. Otherwise directly call nd_region flush.
> > > +        */
> > > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > > +
> > > +               if (!child)
> > > +                       return -ENOMEM;
> > > +               bio_copy_dev(child, bio);
> > > +               child->bi_opf = REQ_PREFLUSH;
> > > +               child->bi_iter.bi_sector = -1;
> > > +               bio_chain(child, bio);
> > > +               submit_bio(child);
> > > +       } else {
> > > +               if (virtio_pmem_flush(nd_region))
> > > +                       rc = -EIO;
> > > +       }
> > > +
> > > +       return rc;
> > > +};
> > > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > index 35897649c24f..9f634a2ed638 100644
> > > --- a/drivers/virtio/Kconfig
> > > +++ b/drivers/virtio/Kconfig
> > > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> > >
> > >           If unsure, say Y.
> > >
> > > +config VIRTIO_PMEM
> > > +       tristate "Support for virtio pmem driver"
> > > +       depends on VIRTIO
> > > +       depends on LIBNVDIMM
> > > +       help
> > > +       This driver provides support for virtio based flushing interface
> > > +       for persistent memory range.
> > > +
> > > +       If unsure, say M.
> > > +
> > >  config VIRTIO_BALLOON
> > >         tristate "Virtio balloon driver"
> > >         depends on VIRTIO
> > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > index 3a2b5c5dcf46..143ce91eabe9 100644
> > > --- a/drivers/virtio/Makefile
> > > +++ b/drivers/virtio/Makefile
> > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > new file mode 100644
> > > index 000000000000..309788628e41
> > > --- /dev/null
> > > +++ b/drivers/virtio/pmem.c
> >
> > It's not clear to me why this driver is located in drivers/virtio/
>
> Like other VIRTIO drivers, I placed it initially in drivers/virtio directory.
>
> >
> > > @@ -0,0 +1,118 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * virtio_pmem.c: Virtio pmem Driver
> > > + *
> > > + * Discovers persistent memory range information
> > > + * from host and registers the virtual pmem device
> > > + * with libnvdimm core.
> > > + */
> > > +#include <linux/virtio_pmem.h>
> > > +#include <../../drivers/nvdimm/nd.h>
> >
> > ...especially because it seems to require nvdimm internals.
> >
> > However I don't see why that header is included.
>
> Removed.
>
> >
> > In any event lets move this to drivers/nvdimm/virtio.c to live
> > alongside the other generic bus provider drivers/nvdimm/e820.c.
>
> o.k. Makes sense.
>
> >
> > > +
> > > +static struct virtio_device_id id_table[] = {
> > > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > +       { 0 },
> > > +};
> > > +
> > > + /* Initialize virt queue */
> > > +static int init_vq(struct virtio_pmem *vpmem)
> > > +{
> > > +       /* single vq */
> > > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > > +                               host_ack, "flush_queue");
> > > +       if (IS_ERR(vpmem->req_vq))
> > > +               return PTR_ERR(vpmem->req_vq);
> > > +
> > > +       spin_lock_init(&vpmem->pmem_lock);
> > > +       INIT_LIST_HEAD(&vpmem->req_list);
> > > +
> > > +       return 0;
> > > +};
> > > +
> > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > +{
> > > +       int err = 0;
> > > +       struct resource res;
> > > +       struct virtio_pmem *vpmem;
> > > +       struct nd_region_desc ndr_desc = {};
> > > +       int nid = dev_to_node(&vdev->dev);
> > > +       struct nd_region *nd_region;
> > > +
> > > +       if (!vdev->config->get) {
> > > +               dev_err(&vdev->dev, "%s failure: config access disabled\n",
> > > +                       __func__);
> > > +               return -EINVAL;
> > > +       }
> > > +
> > > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > > +       if (!vpmem) {
> > > +               err = -ENOMEM;
> > > +               goto out_err;
> > > +       }
> > > +
> > > +       vpmem->vdev = vdev;
> > > +       vdev->priv = vpmem;
> > > +       err = init_vq(vpmem);
> > > +       if (err)
> > > +               goto out_err;
> > > +
> > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +                       start, &vpmem->start);
> > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > +                       size, &vpmem->size);
> > > +
> > > +       res.start = vpmem->start;
> > > +       res.end   = vpmem->start + vpmem->size-1;
> > > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > > +       vpmem->nd_desc.module = THIS_MODULE;
> > > +
> > > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > +                                               &vpmem->nd_desc);
> > > +       if (!vpmem->nvdimm_bus)
> > > +               goto out_vq;
> > > +
> > > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > +
> > > +       ndr_desc.res = &res;
> > > +       ndr_desc.numa_node = nid;
> > > +       ndr_desc.flush = async_pmem_flush;
> > > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > > &ndr_desc);
> > > +
> > > +       if (!nd_region)
> > > +               goto out_nd;
> > > +       nd_region->provider_data =  dev_to_virtio
> > > +                                       (nd_region->dev.parent->parent);
> > > +       return 0;
> > > +out_nd:
> > > +       err = -ENXIO;
> > > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > +out_vq:
> > > +       vdev->config->del_vqs(vdev);
> > > +out_err:
> > > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > +       return err;
> > > +}
> > > +
> > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > +{
> > > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > +
> > > +       nvdimm_bus_unregister(nvdimm_bus);
> > > +       vdev->config->del_vqs(vdev);
> > > +       vdev->config->reset(vdev);
> > > +}
> > > +
> > > +static struct virtio_driver virtio_pmem_driver = {
> > > +       .driver.name            = KBUILD_MODNAME,
> > > +       .driver.owner           = THIS_MODULE,
> > > +       .id_table               = id_table,
> > > +       .probe                  = virtio_pmem_probe,
> > > +       .remove                 = virtio_pmem_remove,
> > > +};
> > > +
> > > +module_virtio_driver(virtio_pmem_driver);
> > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > +MODULE_LICENSE("GPL");
> > > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > > new file mode 100644
> > > index 000000000000..ab1da877575d
> > > --- /dev/null
> > > +++ b/include/linux/virtio_pmem.h
> >
> > Why is this a global header?
>
> This is where other virtio driver headers are also placed.
> I think this is to access uapi config file in :
>
> ./include/uapi/linux/virtio_pmem.h
>
> Is it okay if we keep 'virtio_pmem.h' in global header?

No, I don't think so. While virtio_console.h and virtio_net.h make
sense as global headers because they are consumed from multiple
drivers, there is no need for virtio_caif.h, for example, to be a
global header. I see no practical reason that the private details of
virtio_pmem.h need to be made available outside of the virtio_pmem.c
consumer.

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-10 23:33           ` Dan Williams
  (?)
@ 2019-05-11  1:26             ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-11  1:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	jmoyer, linux-ext4, Len Brown, kilobyte


> >
> > Hi Dan,
> >
> > Thank you for the review. Please see my reply inline.
> >
> > >
> > > Hi Pankaj,
> > >
> > > Some minor file placement comments below.
> >
> > Sure.
> >
> > >
> > > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > > >
> > > > This patch adds virtio-pmem driver for KVM guest.
> > > >
> > > > Guest reads the persistent memory range information from
> > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > creates a nd_region object with the persistent memory
> > > > range information so that existing 'nvdimm/pmem' driver
> > > > can reserve this into system memory map. This way
> > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > driver to register persistent memory compatible for DAX
> > > > capable filesystems.
> > > >
> > > > This also provides function to perform guest flush over
> > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > on DAX memory range.
> > > >
> > > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > > ---
> > > >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> > > >  drivers/virtio/Kconfig           |  10 +++
> > > >  drivers/virtio/Makefile          |   1 +
> > > >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> > > >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> > > >  include/uapi/linux/virtio_ids.h  |   1 +
> > > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > > >  7 files changed, 314 insertions(+)
> > > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > > >  create mode 100644 drivers/virtio/pmem.c
> > > >  create mode 100644 include/linux/virtio_pmem.h
> > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > >
> > > > diff --git a/drivers/nvdimm/virtio_pmem.c
> > > > b/drivers/nvdimm/virtio_pmem.c
> > > > new file mode 100644
> > > > index 000000000000..66b582f751a3
> > > > --- /dev/null
> > > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > > @@ -0,0 +1,114 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and provides a virtio based flushing
> > > > + * interface.
> > > > + */
> > > > +#include <linux/virtio_pmem.h>
> > > > +#include "nd.h"
> > > > +
> > > > + /* The interrupt handler */
> > > > +void host_ack(struct virtqueue *vq)
> > > > +{
> > > > +       unsigned int len;
> > > > +       unsigned long flags;
> > > > +       struct virtio_pmem_request *req, *req_buf;
> > > > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > > > +
> > > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > > +               req->done = true;
> > > > +               wake_up(&req->host_acked);
> > > > +
> > > > +               if (!list_empty(&vpmem->req_list)) {
> > > > +                       req_buf = list_first_entry(&vpmem->req_list,
> > > > +                                       struct virtio_pmem_request,
> > > > list);
> > > > +                       list_del(&vpmem->req_list);
> > > > +                       req_buf->wq_buf_avail = true;
> > > > +                       wake_up(&req_buf->wq_buf);
> > > > +               }
> > > > +       }
> > > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(host_ack);
> > > > +
> > > > + /* The request submission function */
> > > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > > +{
> > > > +       int err;
> > > > +       unsigned long flags;
> > > > +       struct scatterlist *sgs[2], sg, ret;
> > > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > > +       struct virtio_pmem *vpmem = vdev->priv;
> > > > +       struct virtio_pmem_request *req;
> > > > +
> > > > +       might_sleep();
> > > > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > > +       if (!req)
> > > > +               return -ENOMEM;
> > > > +
> > > > +       req->done = req->wq_buf_avail = false;
> > > > +       strcpy(req->name, "FLUSH");
> > > > +       init_waitqueue_head(&req->host_acked);
> > > > +       init_waitqueue_head(&req->wq_buf);
> > > > +       sg_init_one(&sg, req->name, strlen(req->name));
> > > > +       sgs[0] = &sg;
> > > > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > > +       sgs[1] = &ret;
> > > > +
> > > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req,
> > > > GFP_ATOMIC);
> > > > +       if (err) {
> > > > +               dev_err(&vdev->dev, "failed to send command to virtio
> > > > pmem
> > > > device\n");
> > > > +
> > > > +               list_add_tail(&vpmem->req_list, &req->list);
> > > > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +               /* When host has read buffer, this completes via
> > > > host_ack
> > > > */
> > > > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > > > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       }
> > > > +       err = virtqueue_kick(vpmem->req_vq);
> > > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +       if (!err) {
> > > > +               err = -EIO;
> > > > +               goto ret;
> > > > +       }
> > > > +       /* When host has read buffer, this completes via host_ack */
> > > > +       wait_event(req->host_acked, req->done);
> > > > +       err = req->ret;
> > > > +ret:
> > > > +       kfree(req);
> > > > +       return err;
> > > > +};
> > > > +
> > > > + /* The asynchronous flush callback function */
> > > > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > > +{
> > > > +       int rc = 0;
> > > > +
> > > > +       /* Create child bio for asynchronous flush and chain with
> > > > +        * parent bio. Otherwise directly call nd_region flush.
> > > > +        */
> > > > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > > > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > > > +
> > > > +               if (!child)
> > > > +                       return -ENOMEM;
> > > > +               bio_copy_dev(child, bio);
> > > > +               child->bi_opf = REQ_PREFLUSH;
> > > > +               child->bi_iter.bi_sector = -1;
> > > > +               bio_chain(child, bio);
> > > > +               submit_bio(child);
> > > > +       } else {
> > > > +               if (virtio_pmem_flush(nd_region))
> > > > +                       rc = -EIO;
> > > > +       }
> > > > +
> > > > +       return rc;
> > > > +};
> > > > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > > index 35897649c24f..9f634a2ed638 100644
> > > > --- a/drivers/virtio/Kconfig
> > > > +++ b/drivers/virtio/Kconfig
> > > > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> > > >
> > > >           If unsure, say Y.
> > > >
> > > > +config VIRTIO_PMEM
> > > > +       tristate "Support for virtio pmem driver"
> > > > +       depends on VIRTIO
> > > > +       depends on LIBNVDIMM
> > > > +       help
> > > > +       This driver provides support for virtio based flushing
> > > > interface
> > > > +       for persistent memory range.
> > > > +
> > > > +       If unsure, say M.
> > > > +
> > > >  config VIRTIO_BALLOON
> > > >         tristate "Virtio balloon driver"
> > > >         depends on VIRTIO
> > > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > > index 3a2b5c5dcf46..143ce91eabe9 100644
> > > > --- a/drivers/virtio/Makefile
> > > > +++ b/drivers/virtio/Makefile
> > > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > > new file mode 100644
> > > > index 000000000000..309788628e41
> > > > --- /dev/null
> > > > +++ b/drivers/virtio/pmem.c
> > >
> > > It's not clear to me why this driver is located in drivers/virtio/
> >
> > Like other VIRTIO drivers, I placed it initially in drivers/virtio
> > directory.
> >
> > >
> > > > @@ -0,0 +1,118 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and registers the virtual pmem device
> > > > + * with libnvdimm core.
> > > > + */
> > > > +#include <linux/virtio_pmem.h>
> > > > +#include <../../drivers/nvdimm/nd.h>
> > >
> > > ...especially because it seems to require nvdimm internals.
> > >
> > > However I don't see why that header is included.
> >
> > Removed.
> >
> > >
> > > In any event lets move this to drivers/nvdimm/virtio.c to live
> > > alongside the other generic bus provider drivers/nvdimm/e820.c.
> >
> > o.k. Makes sense.
> >
> > >
> > > > +
> > > > +static struct virtio_device_id id_table[] = {
> > > > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > > +       { 0 },
> > > > +};
> > > > +
> > > > + /* Initialize virt queue */
> > > > +static int init_vq(struct virtio_pmem *vpmem)
> > > > +{
> > > > +       /* single vq */
> > > > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > > > +                               host_ack, "flush_queue");
> > > > +       if (IS_ERR(vpmem->req_vq))
> > > > +               return PTR_ERR(vpmem->req_vq);
> > > > +
> > > > +       spin_lock_init(&vpmem->pmem_lock);
> > > > +       INIT_LIST_HEAD(&vpmem->req_list);
> > > > +
> > > > +       return 0;
> > > > +};
> > > > +
> > > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > > +{
> > > > +       int err = 0;
> > > > +       struct resource res;
> > > > +       struct virtio_pmem *vpmem;
> > > > +       struct nd_region_desc ndr_desc = {};
> > > > +       int nid = dev_to_node(&vdev->dev);
> > > > +       struct nd_region *nd_region;
> > > > +
> > > > +       if (!vdev->config->get) {
> > > > +               dev_err(&vdev->dev, "%s failure: config access
> > > > disabled\n",
> > > > +                       __func__);
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > > > +       if (!vpmem) {
> > > > +               err = -ENOMEM;
> > > > +               goto out_err;
> > > > +       }
> > > > +
> > > > +       vpmem->vdev = vdev;
> > > > +       vdev->priv = vpmem;
> > > > +       err = init_vq(vpmem);
> > > > +       if (err)
> > > > +               goto out_err;
> > > > +
> > > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +                       start, &vpmem->start);
> > > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +                       size, &vpmem->size);
> > > > +
> > > > +       res.start = vpmem->start;
> > > > +       res.end   = vpmem->start + vpmem->size-1;
> > > > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > > > +       vpmem->nd_desc.module = THIS_MODULE;
> > > > +
> > > > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > > +                                               &vpmem->nd_desc);
> > > > +       if (!vpmem->nvdimm_bus)
> > > > +               goto out_vq;
> > > > +
> > > > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > > +
> > > > +       ndr_desc.res = &res;
> > > > +       ndr_desc.numa_node = nid;
> > > > +       ndr_desc.flush = async_pmem_flush;
> > > > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > > > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > > > &ndr_desc);
> > > > +
> > > > +       if (!nd_region)
> > > > +               goto out_nd;
> > > > +       nd_region->provider_data =  dev_to_virtio
> > > > +
> > > > (nd_region->dev.parent->parent);
> > > > +       return 0;
> > > > +out_nd:
> > > > +       err = -ENXIO;
> > > > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > > +out_vq:
> > > > +       vdev->config->del_vqs(vdev);
> > > > +out_err:
> > > > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > > +       return err;
> > > > +}
> > > > +
> > > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > > +{
> > > > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > > +
> > > > +       nvdimm_bus_unregister(nvdimm_bus);
> > > > +       vdev->config->del_vqs(vdev);
> > > > +       vdev->config->reset(vdev);
> > > > +}
> > > > +
> > > > +static struct virtio_driver virtio_pmem_driver = {
> > > > +       .driver.name            = KBUILD_MODNAME,
> > > > +       .driver.owner           = THIS_MODULE,
> > > > +       .id_table               = id_table,
> > > > +       .probe                  = virtio_pmem_probe,
> > > > +       .remove                 = virtio_pmem_remove,
> > > > +};
> > > > +
> > > > +module_virtio_driver(virtio_pmem_driver);
> > > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > > > new file mode 100644
> > > > index 000000000000..ab1da877575d
> > > > --- /dev/null
> > > > +++ b/include/linux/virtio_pmem.h
> > >
> > > Why is this a global header?
> >
> > This is where other virtio driver headers are also placed.
> > I think this is to access uapi config file in :
> >
> > ./include/uapi/linux/virtio_pmem.h
> >
> > Is it okay if we keep 'virtio_pmem.h' in global header?
> 
> No, I don't think so. While virtio_console.h and virtio_net.h make
> sense as global headers because they are consumed from multiple
> drivers, there is no need for virtio_caif.h, for example, to be a
> global header. I see no practical reason that the private details of
> virtio_pmem.h need to be made available outside of the virtio_pmem.c
> consumer.

o.k. Will move it.

Best regards,
Pankaj
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-11  1:26             ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-11  1:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	jmoyer, linux-ext4, Len Brown, kilobyte, Rik van Riel,
	yuval shaia, Stefan Hajnoczi, Igor Mammedov, lcapitulino,
	Kevin Wolf, Nitesh Narayan Lal, Theodore Ts'o,
	Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Paolo Bonzini, Darrick J. Wong


> >
> > Hi Dan,
> >
> > Thank you for the review. Please see my reply inline.
> >
> > >
> > > Hi Pankaj,
> > >
> > > Some minor file placement comments below.
> >
> > Sure.
> >
> > >
> > > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > > >
> > > > This patch adds virtio-pmem driver for KVM guest.
> > > >
> > > > Guest reads the persistent memory range information from
> > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > creates a nd_region object with the persistent memory
> > > > range information so that existing 'nvdimm/pmem' driver
> > > > can reserve this into system memory map. This way
> > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > driver to register persistent memory compatible for DAX
> > > > capable filesystems.
> > > >
> > > > This also provides function to perform guest flush over
> > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > on DAX memory range.
> > > >
> > > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > > ---
> > > >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> > > >  drivers/virtio/Kconfig           |  10 +++
> > > >  drivers/virtio/Makefile          |   1 +
> > > >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> > > >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> > > >  include/uapi/linux/virtio_ids.h  |   1 +
> > > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > > >  7 files changed, 314 insertions(+)
> > > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > > >  create mode 100644 drivers/virtio/pmem.c
> > > >  create mode 100644 include/linux/virtio_pmem.h
> > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > >
> > > > diff --git a/drivers/nvdimm/virtio_pmem.c
> > > > b/drivers/nvdimm/virtio_pmem.c
> > > > new file mode 100644
> > > > index 000000000000..66b582f751a3
> > > > --- /dev/null
> > > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > > @@ -0,0 +1,114 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and provides a virtio based flushing
> > > > + * interface.
> > > > + */
> > > > +#include <linux/virtio_pmem.h>
> > > > +#include "nd.h"
> > > > +
> > > > + /* The interrupt handler */
> > > > +void host_ack(struct virtqueue *vq)
> > > > +{
> > > > +       unsigned int len;
> > > > +       unsigned long flags;
> > > > +       struct virtio_pmem_request *req, *req_buf;
> > > > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > > > +
> > > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > > +               req->done = true;
> > > > +               wake_up(&req->host_acked);
> > > > +
> > > > +               if (!list_empty(&vpmem->req_list)) {
> > > > +                       req_buf = list_first_entry(&vpmem->req_list,
> > > > +                                       struct virtio_pmem_request,
> > > > list);
> > > > +                       list_del(&vpmem->req_list);
> > > > +                       req_buf->wq_buf_avail = true;
> > > > +                       wake_up(&req_buf->wq_buf);
> > > > +               }
> > > > +       }
> > > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(host_ack);
> > > > +
> > > > + /* The request submission function */
> > > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > > +{
> > > > +       int err;
> > > > +       unsigned long flags;
> > > > +       struct scatterlist *sgs[2], sg, ret;
> > > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > > +       struct virtio_pmem *vpmem = vdev->priv;
> > > > +       struct virtio_pmem_request *req;
> > > > +
> > > > +       might_sleep();
> > > > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > > +       if (!req)
> > > > +               return -ENOMEM;
> > > > +
> > > > +       req->done = req->wq_buf_avail = false;
> > > > +       strcpy(req->name, "FLUSH");
> > > > +       init_waitqueue_head(&req->host_acked);
> > > > +       init_waitqueue_head(&req->wq_buf);
> > > > +       sg_init_one(&sg, req->name, strlen(req->name));
> > > > +       sgs[0] = &sg;
> > > > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > > +       sgs[1] = &ret;
> > > > +
> > > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req,
> > > > GFP_ATOMIC);
> > > > +       if (err) {
> > > > +               dev_err(&vdev->dev, "failed to send command to virtio
> > > > pmem
> > > > device\n");
> > > > +
> > > > +               list_add_tail(&vpmem->req_list, &req->list);
> > > > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +               /* When host has read buffer, this completes via
> > > > host_ack
> > > > */
> > > > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > > > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       }
> > > > +       err = virtqueue_kick(vpmem->req_vq);
> > > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +       if (!err) {
> > > > +               err = -EIO;
> > > > +               goto ret;
> > > > +       }
> > > > +       /* When host has read buffer, this completes via host_ack */
> > > > +       wait_event(req->host_acked, req->done);
> > > > +       err = req->ret;
> > > > +ret:
> > > > +       kfree(req);
> > > > +       return err;
> > > > +};
> > > > +
> > > > + /* The asynchronous flush callback function */
> > > > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > > +{
> > > > +       int rc = 0;
> > > > +
> > > > +       /* Create child bio for asynchronous flush and chain with
> > > > +        * parent bio. Otherwise directly call nd_region flush.
> > > > +        */
> > > > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > > > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > > > +
> > > > +               if (!child)
> > > > +                       return -ENOMEM;
> > > > +               bio_copy_dev(child, bio);
> > > > +               child->bi_opf = REQ_PREFLUSH;
> > > > +               child->bi_iter.bi_sector = -1;
> > > > +               bio_chain(child, bio);
> > > > +               submit_bio(child);
> > > > +       } else {
> > > > +               if (virtio_pmem_flush(nd_region))
> > > > +                       rc = -EIO;
> > > > +       }
> > > > +
> > > > +       return rc;
> > > > +};
> > > > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > > index 35897649c24f..9f634a2ed638 100644
> > > > --- a/drivers/virtio/Kconfig
> > > > +++ b/drivers/virtio/Kconfig
> > > > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> > > >
> > > >           If unsure, say Y.
> > > >
> > > > +config VIRTIO_PMEM
> > > > +       tristate "Support for virtio pmem driver"
> > > > +       depends on VIRTIO
> > > > +       depends on LIBNVDIMM
> > > > +       help
> > > > +       This driver provides support for virtio based flushing
> > > > interface
> > > > +       for persistent memory range.
> > > > +
> > > > +       If unsure, say M.
> > > > +
> > > >  config VIRTIO_BALLOON
> > > >         tristate "Virtio balloon driver"
> > > >         depends on VIRTIO
> > > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > > index 3a2b5c5dcf46..143ce91eabe9 100644
> > > > --- a/drivers/virtio/Makefile
> > > > +++ b/drivers/virtio/Makefile
> > > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > > new file mode 100644
> > > > index 000000000000..309788628e41
> > > > --- /dev/null
> > > > +++ b/drivers/virtio/pmem.c
> > >
> > > It's not clear to me why this driver is located in drivers/virtio/
> >
> > Like other VIRTIO drivers, I placed it initially in drivers/virtio
> > directory.
> >
> > >
> > > > @@ -0,0 +1,118 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and registers the virtual pmem device
> > > > + * with libnvdimm core.
> > > > + */
> > > > +#include <linux/virtio_pmem.h>
> > > > +#include <../../drivers/nvdimm/nd.h>
> > >
> > > ...especially because it seems to require nvdimm internals.
> > >
> > > However I don't see why that header is included.
> >
> > Removed.
> >
> > >
> > > In any event lets move this to drivers/nvdimm/virtio.c to live
> > > alongside the other generic bus provider drivers/nvdimm/e820.c.
> >
> > o.k. Makes sense.
> >
> > >
> > > > +
> > > > +static struct virtio_device_id id_table[] = {
> > > > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > > +       { 0 },
> > > > +};
> > > > +
> > > > + /* Initialize virt queue */
> > > > +static int init_vq(struct virtio_pmem *vpmem)
> > > > +{
> > > > +       /* single vq */
> > > > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > > > +                               host_ack, "flush_queue");
> > > > +       if (IS_ERR(vpmem->req_vq))
> > > > +               return PTR_ERR(vpmem->req_vq);
> > > > +
> > > > +       spin_lock_init(&vpmem->pmem_lock);
> > > > +       INIT_LIST_HEAD(&vpmem->req_list);
> > > > +
> > > > +       return 0;
> > > > +};
> > > > +
> > > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > > +{
> > > > +       int err = 0;
> > > > +       struct resource res;
> > > > +       struct virtio_pmem *vpmem;
> > > > +       struct nd_region_desc ndr_desc = {};
> > > > +       int nid = dev_to_node(&vdev->dev);
> > > > +       struct nd_region *nd_region;
> > > > +
> > > > +       if (!vdev->config->get) {
> > > > +               dev_err(&vdev->dev, "%s failure: config access
> > > > disabled\n",
> > > > +                       __func__);
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > > > +       if (!vpmem) {
> > > > +               err = -ENOMEM;
> > > > +               goto out_err;
> > > > +       }
> > > > +
> > > > +       vpmem->vdev = vdev;
> > > > +       vdev->priv = vpmem;
> > > > +       err = init_vq(vpmem);
> > > > +       if (err)
> > > > +               goto out_err;
> > > > +
> > > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +                       start, &vpmem->start);
> > > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +                       size, &vpmem->size);
> > > > +
> > > > +       res.start = vpmem->start;
> > > > +       res.end   = vpmem->start + vpmem->size-1;
> > > > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > > > +       vpmem->nd_desc.module = THIS_MODULE;
> > > > +
> > > > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > > +                                               &vpmem->nd_desc);
> > > > +       if (!vpmem->nvdimm_bus)
> > > > +               goto out_vq;
> > > > +
> > > > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > > +
> > > > +       ndr_desc.res = &res;
> > > > +       ndr_desc.numa_node = nid;
> > > > +       ndr_desc.flush = async_pmem_flush;
> > > > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > > > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > > > &ndr_desc);
> > > > +
> > > > +       if (!nd_region)
> > > > +               goto out_nd;
> > > > +       nd_region->provider_data =  dev_to_virtio
> > > > +
> > > > (nd_region->dev.parent->parent);
> > > > +       return 0;
> > > > +out_nd:
> > > > +       err = -ENXIO;
> > > > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > > +out_vq:
> > > > +       vdev->config->del_vqs(vdev);
> > > > +out_err:
> > > > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > > +       return err;
> > > > +}
> > > > +
> > > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > > +{
> > > > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > > +
> > > > +       nvdimm_bus_unregister(nvdimm_bus);
> > > > +       vdev->config->del_vqs(vdev);
> > > > +       vdev->config->reset(vdev);
> > > > +}
> > > > +
> > > > +static struct virtio_driver virtio_pmem_driver = {
> > > > +       .driver.name            = KBUILD_MODNAME,
> > > > +       .driver.owner           = THIS_MODULE,
> > > > +       .id_table               = id_table,
> > > > +       .probe                  = virtio_pmem_probe,
> > > > +       .remove                 = virtio_pmem_remove,
> > > > +};
> > > > +
> > > > +module_virtio_driver(virtio_pmem_driver);
> > > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > > > new file mode 100644
> > > > index 000000000000..ab1da877575d
> > > > --- /dev/null
> > > > +++ b/include/linux/virtio_pmem.h
> > >
> > > Why is this a global header?
> >
> > This is where other virtio driver headers are also placed.
> > I think this is to access uapi config file in :
> >
> > ./include/uapi/linux/virtio_pmem.h
> >
> > Is it okay if we keep 'virtio_pmem.h' in global header?
> 
> No, I don't think so. While virtio_console.h and virtio_net.h make
> sense as global headers because they are consumed from multiple
> drivers, there is no need for virtio_caif.h, for example, to be a
> global header. I see no practical reason that the private details of
> virtio_pmem.h need to be made available outside of the virtio_pmem.c
> consumer.

o.k. Will move it.

Best regards,
Pankaj
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
@ 2019-05-11  1:26             ` Pankaj Gupta
  0 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-11  1:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, Jason Wang, david,
	Qemu Developers, virtualization, Andreas Dilger, Ross Zwisler,
	Andrea Arcangeli, Dave Jiang, linux-nvdimm, Vishal L Verma,
	David Hildenbrand, Matthew Wilcox, Christoph Hellwig, Linux ACPI,
	jmoyer, linux-ext4, Len Brown, kilobyte, Rik van Riel,
	yuval shaia, Stefan Hajnoczi, Paolo Bonzini, lcapitulino,
	Kevin Wolf, Nitesh Narayan Lal, Theodore Ts'o,
	Xiao Guangrong, cohuck, Rafael J. Wysocki,
	Linux Kernel Mailing List, linux-xfs, linux-fsdevel,
	Igor Mammedov, Darrick J. Wong


> >
> > Hi Dan,
> >
> > Thank you for the review. Please see my reply inline.
> >
> > >
> > > Hi Pankaj,
> > >
> > > Some minor file placement comments below.
> >
> > Sure.
> >
> > >
> > > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > > >
> > > > This patch adds virtio-pmem driver for KVM guest.
> > > >
> > > > Guest reads the persistent memory range information from
> > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > creates a nd_region object with the persistent memory
> > > > range information so that existing 'nvdimm/pmem' driver
> > > > can reserve this into system memory map. This way
> > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > driver to register persistent memory compatible for DAX
> > > > capable filesystems.
> > > >
> > > > This also provides function to perform guest flush over
> > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > on DAX memory range.
> > > >
> > > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > > ---
> > > >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> > > >  drivers/virtio/Kconfig           |  10 +++
> > > >  drivers/virtio/Makefile          |   1 +
> > > >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> > > >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> > > >  include/uapi/linux/virtio_ids.h  |   1 +
> > > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > > >  7 files changed, 314 insertions(+)
> > > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > > >  create mode 100644 drivers/virtio/pmem.c
> > > >  create mode 100644 include/linux/virtio_pmem.h
> > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > >
> > > > diff --git a/drivers/nvdimm/virtio_pmem.c
> > > > b/drivers/nvdimm/virtio_pmem.c
> > > > new file mode 100644
> > > > index 000000000000..66b582f751a3
> > > > --- /dev/null
> > > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > > @@ -0,0 +1,114 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and provides a virtio based flushing
> > > > + * interface.
> > > > + */
> > > > +#include <linux/virtio_pmem.h>
> > > > +#include "nd.h"
> > > > +
> > > > + /* The interrupt handler */
> > > > +void host_ack(struct virtqueue *vq)
> > > > +{
> > > > +       unsigned int len;
> > > > +       unsigned long flags;
> > > > +       struct virtio_pmem_request *req, *req_buf;
> > > > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > > > +
> > > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > > +               req->done = true;
> > > > +               wake_up(&req->host_acked);
> > > > +
> > > > +               if (!list_empty(&vpmem->req_list)) {
> > > > +                       req_buf = list_first_entry(&vpmem->req_list,
> > > > +                                       struct virtio_pmem_request,
> > > > list);
> > > > +                       list_del(&vpmem->req_list);
> > > > +                       req_buf->wq_buf_avail = true;
> > > > +                       wake_up(&req_buf->wq_buf);
> > > > +               }
> > > > +       }
> > > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(host_ack);
> > > > +
> > > > + /* The request submission function */
> > > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > > +{
> > > > +       int err;
> > > > +       unsigned long flags;
> > > > +       struct scatterlist *sgs[2], sg, ret;
> > > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > > +       struct virtio_pmem *vpmem = vdev->priv;
> > > > +       struct virtio_pmem_request *req;
> > > > +
> > > > +       might_sleep();
> > > > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > > +       if (!req)
> > > > +               return -ENOMEM;
> > > > +
> > > > +       req->done = req->wq_buf_avail = false;
> > > > +       strcpy(req->name, "FLUSH");
> > > > +       init_waitqueue_head(&req->host_acked);
> > > > +       init_waitqueue_head(&req->wq_buf);
> > > > +       sg_init_one(&sg, req->name, strlen(req->name));
> > > > +       sgs[0] = &sg;
> > > > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > > +       sgs[1] = &ret;
> > > > +
> > > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req,
> > > > GFP_ATOMIC);
> > > > +       if (err) {
> > > > +               dev_err(&vdev->dev, "failed to send command to virtio
> > > > pmem
> > > > device\n");
> > > > +
> > > > +               list_add_tail(&vpmem->req_list, &req->list);
> > > > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +               /* When host has read buffer, this completes via
> > > > host_ack
> > > > */
> > > > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > > > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       }
> > > > +       err = virtqueue_kick(vpmem->req_vq);
> > > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +       if (!err) {
> > > > +               err = -EIO;
> > > > +               goto ret;
> > > > +       }
> > > > +       /* When host has read buffer, this completes via host_ack */
> > > > +       wait_event(req->host_acked, req->done);
> > > > +       err = req->ret;
> > > > +ret:
> > > > +       kfree(req);
> > > > +       return err;
> > > > +};
> > > > +
> > > > + /* The asynchronous flush callback function */
> > > > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > > +{
> > > > +       int rc = 0;
> > > > +
> > > > +       /* Create child bio for asynchronous flush and chain with
> > > > +        * parent bio. Otherwise directly call nd_region flush.
> > > > +        */
> > > > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > > > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > > > +
> > > > +               if (!child)
> > > > +                       return -ENOMEM;
> > > > +               bio_copy_dev(child, bio);
> > > > +               child->bi_opf = REQ_PREFLUSH;
> > > > +               child->bi_iter.bi_sector = -1;
> > > > +               bio_chain(child, bio);
> > > > +               submit_bio(child);
> > > > +       } else {
> > > > +               if (virtio_pmem_flush(nd_region))
> > > > +                       rc = -EIO;
> > > > +       }
> > > > +
> > > > +       return rc;
> > > > +};
> > > > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > > index 35897649c24f..9f634a2ed638 100644
> > > > --- a/drivers/virtio/Kconfig
> > > > +++ b/drivers/virtio/Kconfig
> > > > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> > > >
> > > >           If unsure, say Y.
> > > >
> > > > +config VIRTIO_PMEM
> > > > +       tristate "Support for virtio pmem driver"
> > > > +       depends on VIRTIO
> > > > +       depends on LIBNVDIMM
> > > > +       help
> > > > +       This driver provides support for virtio based flushing
> > > > interface
> > > > +       for persistent memory range.
> > > > +
> > > > +       If unsure, say M.
> > > > +
> > > >  config VIRTIO_BALLOON
> > > >         tristate "Virtio balloon driver"
> > > >         depends on VIRTIO
> > > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > > index 3a2b5c5dcf46..143ce91eabe9 100644
> > > > --- a/drivers/virtio/Makefile
> > > > +++ b/drivers/virtio/Makefile
> > > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > > new file mode 100644
> > > > index 000000000000..309788628e41
> > > > --- /dev/null
> > > > +++ b/drivers/virtio/pmem.c
> > >
> > > It's not clear to me why this driver is located in drivers/virtio/
> >
> > Like other VIRTIO drivers, I placed it initially in drivers/virtio
> > directory.
> >
> > >
> > > > @@ -0,0 +1,118 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and registers the virtual pmem device
> > > > + * with libnvdimm core.
> > > > + */
> > > > +#include <linux/virtio_pmem.h>
> > > > +#include <../../drivers/nvdimm/nd.h>
> > >
> > > ...especially because it seems to require nvdimm internals.
> > >
> > > However I don't see why that header is included.
> >
> > Removed.
> >
> > >
> > > In any event lets move this to drivers/nvdimm/virtio.c to live
> > > alongside the other generic bus provider drivers/nvdimm/e820.c.
> >
> > o.k. Makes sense.
> >
> > >
> > > > +
> > > > +static struct virtio_device_id id_table[] = {
> > > > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > > +       { 0 },
> > > > +};
> > > > +
> > > > + /* Initialize virt queue */
> > > > +static int init_vq(struct virtio_pmem *vpmem)
> > > > +{
> > > > +       /* single vq */
> > > > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > > > +                               host_ack, "flush_queue");
> > > > +       if (IS_ERR(vpmem->req_vq))
> > > > +               return PTR_ERR(vpmem->req_vq);
> > > > +
> > > > +       spin_lock_init(&vpmem->pmem_lock);
> > > > +       INIT_LIST_HEAD(&vpmem->req_list);
> > > > +
> > > > +       return 0;
> > > > +};
> > > > +
> > > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > > +{
> > > > +       int err = 0;
> > > > +       struct resource res;
> > > > +       struct virtio_pmem *vpmem;
> > > > +       struct nd_region_desc ndr_desc = {};
> > > > +       int nid = dev_to_node(&vdev->dev);
> > > > +       struct nd_region *nd_region;
> > > > +
> > > > +       if (!vdev->config->get) {
> > > > +               dev_err(&vdev->dev, "%s failure: config access
> > > > disabled\n",
> > > > +                       __func__);
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > > > +       if (!vpmem) {
> > > > +               err = -ENOMEM;
> > > > +               goto out_err;
> > > > +       }
> > > > +
> > > > +       vpmem->vdev = vdev;
> > > > +       vdev->priv = vpmem;
> > > > +       err = init_vq(vpmem);
> > > > +       if (err)
> > > > +               goto out_err;
> > > > +
> > > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +                       start, &vpmem->start);
> > > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +                       size, &vpmem->size);
> > > > +
> > > > +       res.start = vpmem->start;
> > > > +       res.end   = vpmem->start + vpmem->size-1;
> > > > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > > > +       vpmem->nd_desc.module = THIS_MODULE;
> > > > +
> > > > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > > +                                               &vpmem->nd_desc);
> > > > +       if (!vpmem->nvdimm_bus)
> > > > +               goto out_vq;
> > > > +
> > > > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > > +
> > > > +       ndr_desc.res = &res;
> > > > +       ndr_desc.numa_node = nid;
> > > > +       ndr_desc.flush = async_pmem_flush;
> > > > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > > > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > > > &ndr_desc);
> > > > +
> > > > +       if (!nd_region)
> > > > +               goto out_nd;
> > > > +       nd_region->provider_data =  dev_to_virtio
> > > > +
> > > > (nd_region->dev.parent->parent);
> > > > +       return 0;
> > > > +out_nd:
> > > > +       err = -ENXIO;
> > > > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > > +out_vq:
> > > > +       vdev->config->del_vqs(vdev);
> > > > +out_err:
> > > > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > > +       return err;
> > > > +}
> > > > +
> > > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > > +{
> > > > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > > +
> > > > +       nvdimm_bus_unregister(nvdimm_bus);
> > > > +       vdev->config->del_vqs(vdev);
> > > > +       vdev->config->reset(vdev);
> > > > +}
> > > > +
> > > > +static struct virtio_driver virtio_pmem_driver = {
> > > > +       .driver.name            = KBUILD_MODNAME,
> > > > +       .driver.owner           = THIS_MODULE,
> > > > +       .id_table               = id_table,
> > > > +       .probe                  = virtio_pmem_probe,
> > > > +       .remove                 = virtio_pmem_remove,
> > > > +};
> > > > +
> > > > +module_virtio_driver(virtio_pmem_driver);
> > > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > > > new file mode 100644
> > > > index 000000000000..ab1da877575d
> > > > --- /dev/null
> > > > +++ b/include/linux/virtio_pmem.h
> > >
> > > Why is this a global header?
> >
> > This is where other virtio driver headers are also placed.
> > I think this is to access uapi config file in :
> >
> > ./include/uapi/linux/virtio_pmem.h
> >
> > Is it okay if we keep 'virtio_pmem.h' in global header?
> 
> No, I don't think so. While virtio_console.h and virtio_net.h make
> sense as global headers because they are consumed from multiple
> drivers, there is no need for virtio_caif.h, for example, to be a
> global header. I see no practical reason that the private details of
> virtio_pmem.h need to be made available outside of the virtio_pmem.c
> consumer.

o.k. Will move it.

Best regards,
Pankaj
> 
> 


^ permalink raw reply	[flat|nested] 107+ messages in thread

* Re: [Qemu-devel] [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver
  2019-05-10 23:33           ` Dan Williams
                             ` (2 preceding siblings ...)
  (?)
@ 2019-05-11  1:26           ` Pankaj Gupta
  -1 siblings, 0 replies; 107+ messages in thread
From: Pankaj Gupta @ 2019-05-11  1:26 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jan Kara, KVM list, Michael S. Tsirkin, david, Qemu Developers,
	virtualization, Andreas Dilger, Ross Zwisler, Andrea Arcangeli,
	Dave Jiang, linux-nvdimm, Vishal L Verma, Matthew Wilcox,
	Christoph Hellwig, Linux ACPI, jmoyer, linux-ext4, Len Brown,
	kilobyte, Rik van Riel, yuval shaia, Stefan Hajnoczi,
	Paolo Bonzini, lcapitulino, Nitesh Narayan Lal


> >
> > Hi Dan,
> >
> > Thank you for the review. Please see my reply inline.
> >
> > >
> > > Hi Pankaj,
> > >
> > > Some minor file placement comments below.
> >
> > Sure.
> >
> > >
> > > On Thu, Apr 25, 2019 at 10:02 PM Pankaj Gupta <pagupta@redhat.com> wrote:
> > > >
> > > > This patch adds virtio-pmem driver for KVM guest.
> > > >
> > > > Guest reads the persistent memory range information from
> > > > Qemu over VIRTIO and registers it on nvdimm_bus. It also
> > > > creates a nd_region object with the persistent memory
> > > > range information so that existing 'nvdimm/pmem' driver
> > > > can reserve this into system memory map. This way
> > > > 'virtio-pmem' driver uses existing functionality of pmem
> > > > driver to register persistent memory compatible for DAX
> > > > capable filesystems.
> > > >
> > > > This also provides function to perform guest flush over
> > > > VIRTIO from 'pmem' driver when userspace performs flush
> > > > on DAX memory range.
> > > >
> > > > Signed-off-by: Pankaj Gupta <pagupta@redhat.com>
> > > > ---
> > > >  drivers/nvdimm/virtio_pmem.c     | 114 +++++++++++++++++++++++++++++
> > > >  drivers/virtio/Kconfig           |  10 +++
> > > >  drivers/virtio/Makefile          |   1 +
> > > >  drivers/virtio/pmem.c            | 118 +++++++++++++++++++++++++++++++
> > > >  include/linux/virtio_pmem.h      |  60 ++++++++++++++++
> > > >  include/uapi/linux/virtio_ids.h  |   1 +
> > > >  include/uapi/linux/virtio_pmem.h |  10 +++
> > > >  7 files changed, 314 insertions(+)
> > > >  create mode 100644 drivers/nvdimm/virtio_pmem.c
> > > >  create mode 100644 drivers/virtio/pmem.c
> > > >  create mode 100644 include/linux/virtio_pmem.h
> > > >  create mode 100644 include/uapi/linux/virtio_pmem.h
> > > >
> > > > diff --git a/drivers/nvdimm/virtio_pmem.c
> > > > b/drivers/nvdimm/virtio_pmem.c
> > > > new file mode 100644
> > > > index 000000000000..66b582f751a3
> > > > --- /dev/null
> > > > +++ b/drivers/nvdimm/virtio_pmem.c
> > > > @@ -0,0 +1,114 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and provides a virtio based flushing
> > > > + * interface.
> > > > + */
> > > > +#include <linux/virtio_pmem.h>
> > > > +#include "nd.h"
> > > > +
> > > > + /* The interrupt handler */
> > > > +void host_ack(struct virtqueue *vq)
> > > > +{
> > > > +       unsigned int len;
> > > > +       unsigned long flags;
> > > > +       struct virtio_pmem_request *req, *req_buf;
> > > > +       struct virtio_pmem *vpmem = vq->vdev->priv;
> > > > +
> > > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
> > > > +               req->done = true;
> > > > +               wake_up(&req->host_acked);
> > > > +
> > > > +               if (!list_empty(&vpmem->req_list)) {
> > > > +                       req_buf = list_first_entry(&vpmem->req_list,
> > > > +                                       struct virtio_pmem_request,
> > > > list);
> > > > +                       list_del(&vpmem->req_list);
> > > > +                       req_buf->wq_buf_avail = true;
> > > > +                       wake_up(&req_buf->wq_buf);
> > > > +               }
> > > > +       }
> > > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(host_ack);
> > > > +
> > > > + /* The request submission function */
> > > > +int virtio_pmem_flush(struct nd_region *nd_region)
> > > > +{
> > > > +       int err;
> > > > +       unsigned long flags;
> > > > +       struct scatterlist *sgs[2], sg, ret;
> > > > +       struct virtio_device *vdev = nd_region->provider_data;
> > > > +       struct virtio_pmem *vpmem = vdev->priv;
> > > > +       struct virtio_pmem_request *req;
> > > > +
> > > > +       might_sleep();
> > > > +       req = kmalloc(sizeof(*req), GFP_KERNEL);
> > > > +       if (!req)
> > > > +               return -ENOMEM;
> > > > +
> > > > +       req->done = req->wq_buf_avail = false;
> > > > +       strcpy(req->name, "FLUSH");
> > > > +       init_waitqueue_head(&req->host_acked);
> > > > +       init_waitqueue_head(&req->wq_buf);
> > > > +       sg_init_one(&sg, req->name, strlen(req->name));
> > > > +       sgs[0] = &sg;
> > > > +       sg_init_one(&ret, &req->ret, sizeof(req->ret));
> > > > +       sgs[1] = &ret;
> > > > +
> > > > +       spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       err = virtqueue_add_sgs(vpmem->req_vq, sgs, 1, 1, req,
> > > > GFP_ATOMIC);
> > > > +       if (err) {
> > > > +               dev_err(&vdev->dev, "failed to send command to virtio
> > > > pmem
> > > > device\n");
> > > > +
> > > > +               list_add_tail(&vpmem->req_list, &req->list);
> > > > +               spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +               /* When host has read buffer, this completes via
> > > > host_ack
> > > > */
> > > > +               wait_event(req->wq_buf, req->wq_buf_avail);
> > > > +               spin_lock_irqsave(&vpmem->pmem_lock, flags);
> > > > +       }
> > > > +       err = virtqueue_kick(vpmem->req_vq);
> > > > +       spin_unlock_irqrestore(&vpmem->pmem_lock, flags);
> > > > +
> > > > +       if (!err) {
> > > > +               err = -EIO;
> > > > +               goto ret;
> > > > +       }
> > > > +       /* When host has read buffer, this completes via host_ack */
> > > > +       wait_event(req->host_acked, req->done);
> > > > +       err = req->ret;
> > > > +ret:
> > > > +       kfree(req);
> > > > +       return err;
> > > > +};
> > > > +
> > > > + /* The asynchronous flush callback function */
> > > > +int async_pmem_flush(struct nd_region *nd_region, struct bio *bio)
> > > > +{
> > > > +       int rc = 0;
> > > > +
> > > > +       /* Create child bio for asynchronous flush and chain with
> > > > +        * parent bio. Otherwise directly call nd_region flush.
> > > > +        */
> > > > +       if (bio && bio->bi_iter.bi_sector != -1) {
> > > > +               struct bio *child = bio_alloc(GFP_ATOMIC, 0);
> > > > +
> > > > +               if (!child)
> > > > +                       return -ENOMEM;
> > > > +               bio_copy_dev(child, bio);
> > > > +               child->bi_opf = REQ_PREFLUSH;
> > > > +               child->bi_iter.bi_sector = -1;
> > > > +               bio_chain(child, bio);
> > > > +               submit_bio(child);
> > > > +       } else {
> > > > +               if (virtio_pmem_flush(nd_region))
> > > > +                       rc = -EIO;
> > > > +       }
> > > > +
> > > > +       return rc;
> > > > +};
> > > > +EXPORT_SYMBOL_GPL(async_pmem_flush);
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
> > > > index 35897649c24f..9f634a2ed638 100644
> > > > --- a/drivers/virtio/Kconfig
> > > > +++ b/drivers/virtio/Kconfig
> > > > @@ -42,6 +42,16 @@ config VIRTIO_PCI_LEGACY
> > > >
> > > >           If unsure, say Y.
> > > >
> > > > +config VIRTIO_PMEM
> > > > +       tristate "Support for virtio pmem driver"
> > > > +       depends on VIRTIO
> > > > +       depends on LIBNVDIMM
> > > > +       help
> > > > +       This driver provides support for virtio based flushing
> > > > interface
> > > > +       for persistent memory range.
> > > > +
> > > > +       If unsure, say M.
> > > > +
> > > >  config VIRTIO_BALLOON
> > > >         tristate "Virtio balloon driver"
> > > >         depends on VIRTIO
> > > > diff --git a/drivers/virtio/Makefile b/drivers/virtio/Makefile
> > > > index 3a2b5c5dcf46..143ce91eabe9 100644
> > > > --- a/drivers/virtio/Makefile
> > > > +++ b/drivers/virtio/Makefile
> > > > @@ -6,3 +6,4 @@ virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
> > > >  virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
> > > >  obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
> > > >  obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
> > > > +obj-$(CONFIG_VIRTIO_PMEM) += pmem.o ../nvdimm/virtio_pmem.o
> > > > diff --git a/drivers/virtio/pmem.c b/drivers/virtio/pmem.c
> > > > new file mode 100644
> > > > index 000000000000..309788628e41
> > > > --- /dev/null
> > > > +++ b/drivers/virtio/pmem.c
> > >
> > > It's not clear to me why this driver is located in drivers/virtio/
> >
> > Like other VIRTIO drivers, I placed it initially in drivers/virtio
> > directory.
> >
> > >
> > > > @@ -0,0 +1,118 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * virtio_pmem.c: Virtio pmem Driver
> > > > + *
> > > > + * Discovers persistent memory range information
> > > > + * from host and registers the virtual pmem device
> > > > + * with libnvdimm core.
> > > > + */
> > > > +#include <linux/virtio_pmem.h>
> > > > +#include <../../drivers/nvdimm/nd.h>
> > >
> > > ...especially because it seems to require nvdimm internals.
> > >
> > > However I don't see why that header is included.
> >
> > Removed.
> >
> > >
> > > In any event lets move this to drivers/nvdimm/virtio.c to live
> > > alongside the other generic bus provider drivers/nvdimm/e820.c.
> >
> > o.k. Makes sense.
> >
> > >
> > > > +
> > > > +static struct virtio_device_id id_table[] = {
> > > > +       { VIRTIO_ID_PMEM, VIRTIO_DEV_ANY_ID },
> > > > +       { 0 },
> > > > +};
> > > > +
> > > > + /* Initialize virt queue */
> > > > +static int init_vq(struct virtio_pmem *vpmem)
> > > > +{
> > > > +       /* single vq */
> > > > +       vpmem->req_vq = virtio_find_single_vq(vpmem->vdev,
> > > > +                               host_ack, "flush_queue");
> > > > +       if (IS_ERR(vpmem->req_vq))
> > > > +               return PTR_ERR(vpmem->req_vq);
> > > > +
> > > > +       spin_lock_init(&vpmem->pmem_lock);
> > > > +       INIT_LIST_HEAD(&vpmem->req_list);
> > > > +
> > > > +       return 0;
> > > > +};
> > > > +
> > > > +static int virtio_pmem_probe(struct virtio_device *vdev)
> > > > +{
> > > > +       int err = 0;
> > > > +       struct resource res;
> > > > +       struct virtio_pmem *vpmem;
> > > > +       struct nd_region_desc ndr_desc = {};
> > > > +       int nid = dev_to_node(&vdev->dev);
> > > > +       struct nd_region *nd_region;
> > > > +
> > > > +       if (!vdev->config->get) {
> > > > +               dev_err(&vdev->dev, "%s failure: config access
> > > > disabled\n",
> > > > +                       __func__);
> > > > +               return -EINVAL;
> > > > +       }
> > > > +
> > > > +       vpmem = devm_kzalloc(&vdev->dev, sizeof(*vpmem), GFP_KERNEL);
> > > > +       if (!vpmem) {
> > > > +               err = -ENOMEM;
> > > > +               goto out_err;
> > > > +       }
> > > > +
> > > > +       vpmem->vdev = vdev;
> > > > +       vdev->priv = vpmem;
> > > > +       err = init_vq(vpmem);
> > > > +       if (err)
> > > > +               goto out_err;
> > > > +
> > > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +                       start, &vpmem->start);
> > > > +       virtio_cread(vpmem->vdev, struct virtio_pmem_config,
> > > > +                       size, &vpmem->size);
> > > > +
> > > > +       res.start = vpmem->start;
> > > > +       res.end   = vpmem->start + vpmem->size-1;
> > > > +       vpmem->nd_desc.provider_name = "virtio-pmem";
> > > > +       vpmem->nd_desc.module = THIS_MODULE;
> > > > +
> > > > +       vpmem->nvdimm_bus = nvdimm_bus_register(&vdev->dev,
> > > > +                                               &vpmem->nd_desc);
> > > > +       if (!vpmem->nvdimm_bus)
> > > > +               goto out_vq;
> > > > +
> > > > +       dev_set_drvdata(&vdev->dev, vpmem->nvdimm_bus);
> > > > +
> > > > +       ndr_desc.res = &res;
> > > > +       ndr_desc.numa_node = nid;
> > > > +       ndr_desc.flush = async_pmem_flush;
> > > > +       set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);
> > > > +       set_bit(ND_REGION_ASYNC, &ndr_desc.flags);
> > > > +       nd_region = nvdimm_pmem_region_create(vpmem->nvdimm_bus,
> > > > &ndr_desc);
> > > > +
> > > > +       if (!nd_region)
> > > > +               goto out_nd;
> > > > +       nd_region->provider_data =  dev_to_virtio
> > > > +
> > > > (nd_region->dev.parent->parent);
> > > > +       return 0;
> > > > +out_nd:
> > > > +       err = -ENXIO;
> > > > +       nvdimm_bus_unregister(vpmem->nvdimm_bus);
> > > > +out_vq:
> > > > +       vdev->config->del_vqs(vdev);
> > > > +out_err:
> > > > +       dev_err(&vdev->dev, "failed to register virtio pmem memory\n");
> > > > +       return err;
> > > > +}
> > > > +
> > > > +static void virtio_pmem_remove(struct virtio_device *vdev)
> > > > +{
> > > > +       struct nvdimm_bus *nvdimm_bus = dev_get_drvdata(&vdev->dev);
> > > > +
> > > > +       nvdimm_bus_unregister(nvdimm_bus);
> > > > +       vdev->config->del_vqs(vdev);
> > > > +       vdev->config->reset(vdev);
> > > > +}
> > > > +
> > > > +static struct virtio_driver virtio_pmem_driver = {
> > > > +       .driver.name            = KBUILD_MODNAME,
> > > > +       .driver.owner           = THIS_MODULE,
> > > > +       .id_table               = id_table,
> > > > +       .probe                  = virtio_pmem_probe,
> > > > +       .remove                 = virtio_pmem_remove,
> > > > +};
> > > > +
> > > > +module_virtio_driver(virtio_pmem_driver);
> > > > +MODULE_DEVICE_TABLE(virtio, id_table);
> > > > +MODULE_DESCRIPTION("Virtio pmem driver");
> > > > +MODULE_LICENSE("GPL");
> > > > diff --git a/include/linux/virtio_pmem.h b/include/linux/virtio_pmem.h
> > > > new file mode 100644
> > > > index 000000000000..ab1da877575d
> > > > --- /dev/null
> > > > +++ b/include/linux/virtio_pmem.h
> > >
> > > Why is this a global header?
> >
> > This is where other virtio driver headers are also placed.
> > I think this is to access uapi config file in :
> >
> > ./include/uapi/linux/virtio_pmem.h
> >
> > Is it okay if we keep 'virtio_pmem.h' in global header?
> 
> No, I don't think so. While virtio_console.h and virtio_net.h make
> sense as global headers because they are consumed from multiple
> drivers, there is no need for virtio_caif.h, for example, to be a
> global header. I see no practical reason that the private details of
> virtio_pmem.h need to be made available outside of the virtio_pmem.c
> consumer.

o.k. Will move it.

Best regards,
Pankaj
> 
> 

^ permalink raw reply	[flat|nested] 107+ messages in thread

end of thread, other threads:[~2019-05-11  1:42 UTC | newest]

Thread overview: 107+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-26  5:00 [PATCH v7 0/6] virtio pmem driver Pankaj Gupta
2019-04-26  5:00 ` [Qemu-devel] " Pankaj Gupta
2019-04-26  5:00 ` Pankaj Gupta
2019-04-26  5:00 ` Pankaj Gupta
2019-04-26  5:00 ` [PATCH v7 1/6] libnvdimm: nd_region flush callback support Pankaj Gupta
2019-04-26  5:00   ` [Qemu-devel] " Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00 ` Pankaj Gupta
2019-04-26  5:00 ` [PATCH v7 2/6] virtio-pmem: Add virtio pmem driver Pankaj Gupta
2019-04-26  5:00 ` Pankaj Gupta
2019-04-26  5:00   ` [Qemu-devel] " Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
     [not found]   ` <20190426050039.17460-3-pagupta-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2019-04-30  5:53     ` [Qemu-devel] " Yuval Shaia
2019-04-30  5:53       ` Yuval Shaia
2019-04-30  5:53       ` Yuval Shaia
2019-04-30  6:06       ` Pankaj Gupta
2019-04-30  6:06       ` Pankaj Gupta
2019-04-30  6:06         ` Pankaj Gupta
2019-04-30  6:06         ` Pankaj Gupta
2019-04-30  5:53   ` Yuval Shaia
2019-05-07 15:35   ` Dan Williams
2019-05-07 15:35   ` Dan Williams
2019-05-07 15:35     ` [Qemu-devel] " Dan Williams
2019-05-07 15:35     ` Dan Williams
2019-05-07 15:35     ` Dan Williams
     [not found]     ` <CAPcyv4hdT5bbgv0Gy1r0Xb3RMfE_Zpe7DV10a=F1PFeTeEt+Fw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-05-08 11:19       ` [Qemu-devel] " Pankaj Gupta
2019-05-08 11:19         ` Pankaj Gupta
2019-05-08 11:19         ` Pankaj Gupta
2019-05-10 23:33         ` Dan Williams
2019-05-10 23:33           ` Dan Williams
2019-05-10 23:33           ` Dan Williams
2019-05-10 23:33           ` Dan Williams
2019-05-11  1:26           ` Pankaj Gupta
2019-05-11  1:26           ` Pankaj Gupta
2019-05-11  1:26             ` Pankaj Gupta
2019-05-11  1:26             ` Pankaj Gupta
2019-05-08 11:19     ` Pankaj Gupta
2019-05-07 20:25   ` Jakub Staroń
2019-05-07 20:25     ` Jakub Staroń via Qemu-devel
2019-05-08 11:12     ` Pankaj Gupta
2019-05-08 11:12       ` Pankaj Gupta
2019-05-08 11:12       ` Pankaj Gupta
2019-05-08 11:12       ` Pankaj Gupta
2019-05-08 15:23       ` Pankaj Gupta
2019-05-08 15:23       ` Pankaj Gupta
2019-05-08 15:23         ` Pankaj Gupta
2019-05-08 15:23         ` Pankaj Gupta
2019-05-08 19:05       ` Jakub Staroń
2019-05-08 19:05         ` Jakub Staroń via Qemu-devel
2019-05-08 19:05         ` Jakub Staroń
2019-05-08 19:05       ` Jakub Staroń via Virtualization
2019-05-07 20:25   ` Jakub Staroń via Virtualization
2019-04-26  5:00 ` [PATCH v7 3/6] libnvdimm: add dax_dev sync flag Pankaj Gupta
2019-04-26  5:00   ` [Qemu-devel] " Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-05-07 15:40   ` Dan Williams
2019-05-07 15:40     ` [Qemu-devel] " Dan Williams
2019-05-07 15:40     ` Dan Williams
2019-05-07 15:40     ` Dan Williams
2019-05-09 12:24     ` Pankaj Gupta
2019-05-09 12:24       ` [Qemu-devel] " Pankaj Gupta
2019-05-09 12:24       ` Pankaj Gupta
2019-05-09 12:24     ` Pankaj Gupta
2019-05-07 15:40   ` Dan Williams
2019-04-26  5:00 ` Pankaj Gupta
2019-04-26  5:00 ` [PATCH v7 4/6] dax: check synchronous mapping is supported Pankaj Gupta
2019-04-26  5:00 ` Pankaj Gupta
2019-04-26  5:00   ` [Qemu-devel] " Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-05-07 19:24   ` [Qemu-devel] " Jakub Staroń via Virtualization
2019-05-07 19:24   ` Jakub Staroń
2019-05-07 19:24     ` Jakub Staroń via Qemu-devel
2019-05-07 19:24     ` Jakub Staroń
2019-05-08  5:31     ` Pankaj Gupta
2019-05-08  5:31       ` Pankaj Gupta
2019-05-08  5:31       ` Pankaj Gupta
2019-05-08  5:31       ` Pankaj Gupta
2019-05-08  5:31     ` Pankaj Gupta
2019-04-26  5:00 ` [PATCH v7 5/6] ext4: disable map_sync for async flush Pankaj Gupta
2019-04-26  5:00   ` [Qemu-devel] " Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00 ` Pankaj Gupta
2019-04-26  5:00 ` [PATCH v7 6/6] xfs: " Pankaj Gupta
2019-04-26  5:00   ` [Qemu-devel] " Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-04-26  5:00   ` Pankaj Gupta
2019-05-07 15:37   ` Dan Williams
2019-05-07 15:37   ` Dan Williams
2019-05-07 15:37     ` [Qemu-devel] " Dan Williams
2019-05-07 15:37     ` Dan Williams
2019-05-07 15:37     ` Dan Williams
2019-05-07 16:17     ` Darrick J. Wong
2019-05-07 16:17       ` [Qemu-devel] " Darrick J. Wong
2019-05-07 16:17       ` Darrick J. Wong
2019-05-07 16:17       ` Darrick J. Wong
2019-05-08  5:49       ` [Qemu-devel] " Pankaj Gupta
2019-05-08  5:49       ` Pankaj Gupta
2019-05-08  5:49         ` Pankaj Gupta
2019-05-08  5:49         ` Pankaj Gupta
2019-05-08  5:49         ` Pankaj Gupta
2019-04-26  5:00 ` Pankaj Gupta

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.