NVDIMM Device and Persistent Memory development
 help / color / Atom feed
From: Shivaprasad G Bhat <sbhat@linux.ibm.com>
To: sbhat@linux.vnet.ibm.com, david@gibson.dropbear.id.au,
	groug@kaod.org, qemu-ppc@nongnu.org, ehabkost@redhat.com,
	marcel.apfelbaum@gmail.com, mst@redhat.com, imammedo@redhat.com,
	xiaoguangrong.eric@gmail.com
Cc: qemu-devel@nongnu.org, aneesh.kumar@linux.ibm.com,
	linux-nvdimm@lists.01.org, kvm-ppc@vger.kernel.org,
	shivaprasadbhat@gmail.com, bharata@linux.vnet.ibm.com
Subject: [PATCH v3 0/3] spapr: nvdimm: Enable sync-dax property for nvdimm
Date: Tue, 23 Mar 2021 09:47:10 -0400
Message-ID: <161650723087.2959.8703728357980727008.stgit@6532096d84d3> (raw)

The nvdimm devices are expected to ensure write persistence during power
failure kind of scenarios.

The libpmem has architecture specific instructions like dcbf on power
to flush the cache data to backend nvdimm device during normal writes.

Qemu - virtual nvdimm devices are memory mapped. The dcbf in the guest
doesn't traslate to actual flush to the backend file on the host in case
of file backed v-nvdimms. This is addressed by virtio-pmem in case of x86_64
by making explicit flushes translating to fdatasync at qemu.

On PAPR, the issue is addressed by adding a new hcall to
request for an explicit flush from the guest ndctl driver when the backend
nvdimm cannot ensure write persistence with dcbf alone. So, the approach
here is to convey when the hcall flush is required in a device tree
property. The guest makes the hcall when the property is found, instead
of relying on dcbf.

The first patch adds the necessary asynchronous hcall support infrastructure
code at the DRC level. Second patch implements the hcall using the
infrastructure.

Hcall number and semantics finalized, so dropping the RFC prefix.

A new device property sync-dax is added to the nvdimm device. When the 
sync-dax is off(default), device property "hcall-flush-required" is set,
and the guest makes hcall H_SCM_FLUSH requesting for an explicit flush. 

By default, sync-dax is "off" on all new pseries machines and prior to
5.2 its "on",

The below demonstration shows the map_sync behavior with sync-dax on & off.
(https://github.com/avocado-framework-tests/avocado-misc-tests/blob/master/memory/ndctl.py.data/map_sync.c)

The pmem0 is from nvdimm with With sync-dax=on, and pmem1 is from nvdimm with syn-dax=off, mounted as
/dev/pmem0 on /mnt1 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)
/dev/pmem1 on /mnt2 type xfs (rw,relatime,attr2,dax=always,inode64,logbufs=8,logbsize=32k,noquota)

[root@atest-guest ~]# ./mapsync /mnt1/newfile    ----> When sync-dax=off
[root@atest-guest ~]# ./mapsync /mnt2/newfile    ----> when sync-dax=on
Failed to mmap  with Operation not supported

The first patch does the header file cleanup necessary for the
subsequent ones. Second patch implements the hcall, adds the necessary
vmstate properties to spapr machine structure for carrying the hcall
status during save-restore. The nature of the hcall being asynchronus,
the patch uses aio utilities to offload the flush. The third patch adds
the 'sync-dax' device property and enables the device tree property
for the guest to utilise the hcall.

---
v2 - https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg07031.html
Changes from v2:
      - Using the thread pool based approach as suggested by Greg
      - Moved the async hcall handling code to spapr_nvdimm.c along
        with some simplifications
      - Added vmstate to preserve the hcall status during save-restore
        along with pre_save handler code to complete all ongoning flushes.
      - Added hw_compat magic for sync-dax 'on' on previous machines.
      - Miscellanious minor fixes.

v1 - https://lists.gnu.org/archive/html/qemu-devel/2020-11/msg06330.html
Changes from v1
      - Fixed a missed-out unlock
      - using QLIST_FOREACH instead of QLIST_FOREACH_SAFE while generating token

Shivaprasad G Bhat (3):
      spapr: nvdimm: Forward declare and move the definitions
      spapr: nvdimm: Impletment scm flush hcall
      spapr: nvdimm: Enable sync-dax device property for nvdimm


 hw/core/machine.c             |    1 
 hw/mem/nvdimm.c               |    1 
 hw/ppc/spapr.c                |    6 +
 hw/ppc/spapr_nvdimm.c         |  269 +++++++++++++++++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h       |   10 ++
 include/hw/ppc/spapr.h        |   12 ++
 include/hw/ppc/spapr_nvdimm.h |   34 +++--
 7 files changed, 317 insertions(+), 16 deletions(-)

--
Signature


_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

             reply index

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-23 13:47 Shivaprasad G Bhat [this message]
2021-03-23 13:47 ` [PATCH v3 1/3] spapr: nvdimm: Forward declare and move the definitions Shivaprasad G Bhat
2021-03-24  2:30   ` David Gibson
2021-03-23 13:47 ` [PATCH v3 2/3] spapr: nvdimm: Implement H_SCM_FLUSH hcall Shivaprasad G Bhat
2021-03-24  3:07   ` David Gibson
2021-03-24  4:04     ` Aneesh Kumar K.V
2021-03-25  1:51       ` David Gibson
2021-03-26 13:45         ` Shivaprasad G Bhat
2021-03-29  9:23     ` Shivaprasad G Bhat
2021-03-30 23:57       ` David Gibson
2021-03-23 13:47 ` [PATCH v3 3/3] spapr: nvdimm: Enable sync-dax device property for nvdimm Shivaprasad G Bhat
2021-03-24  3:09   ` David Gibson
2021-03-24  4:09     ` Aneesh Kumar K.V

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=161650723087.2959.8703728357980727008.stgit@6532096d84d3 \
    --to=sbhat@linux.ibm.com \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=bharata@linux.vnet.ibm.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=ehabkost@redhat.com \
    --cc=groug@kaod.org \
    --cc=imammedo@redhat.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-ppc@nongnu.org \
    --cc=sbhat@linux.vnet.ibm.com \
    --cc=shivaprasadbhat@gmail.com \
    --cc=xiaoguangrong.eric@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

NVDIMM Device and Persistent Memory development

Archives are clonable:
	git clone --mirror https://lore.kernel.org/nvdimm/0 nvdimm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 nvdimm nvdimm/ https://lore.kernel.org/nvdimm \
		nvdimm@lists.linux.dev
	public-inbox-index nvdimm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/dev.linux.lists.nvdimm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git