From: Stefan Hajnoczi <stefanha@gmail.com>
To: Xiao Guangrong <guangrong.xiao@linux.intel.com>
Cc: ehabkost@redhat.com, kvm@vger.kernel.org, mst@redhat.com,
	gleb@kernel.org, mtosatti@redhat.com, qemu-devel@nongnu.org,
	stefanha@redhat.com, imammedo@redhat.com, pbonzini@redhat.com,
	dan.j.williams@intel.com, rth@twiddle.net
Subject: Re: [PATCH v8 0/5] implement vNVDIMM
Date: Mon, 30 Nov 2015 16:51:14 +0800
Message-ID: <20151130085114.GE18454@stefanha-x1.localdomain>
In-Reply-To: <1447671063-92665-1-git-send-email-guangrong.xiao@linux.intel.com>

On Mon, Nov 16, 2015 at 06:50:58PM +0800, Xiao Guangrong wrote:
> This patchset can be found at:
>       https://github.com/xiaogr/qemu.git nvdimm-v8
> 
> It is based on the pci branch of Michael's tree, and the top commit is:
> commit e3a4e177d9 (migration/ram: fix build on 32 bit hosts).
> 
> Changelog in v8:
> The long patch series has been split into smaller parts. This is the
> first part, which enables NVDIMM without label data support.
> 
> The command line has changed because some simplifying patches are not
> included in this series. You must specify the file size exactly, using
> parameters such as:
>    -object memory-backend-file,id=mem1,share,mem-path=/tmp/nvdimm1,size=10G \
>    -device nvdimm,memdev=mem1,id=nv1
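
A minimal sketch of one way to satisfy the exact-size requirement above,
assuming a hypothetical helper nvdimm_args() that reads the backing file's
size and passes it to size= in bytes. The helper name and the byte-valued
size are illustrative assumptions, not part of the patchset; the option names
are the ones quoted above.

```python
import os


def nvdimm_args(path, mem_id="mem1", dev_id="nv1"):
    """Build the -object/-device arguments for one NVDIMM backing file.

    Hypothetical helper: the size is taken from the file itself so that
    size= always matches the backing file exactly.
    """
    size = os.path.getsize(path)  # size of the backing file in bytes
    return [
        "-object",
        f"memory-backend-file,id={mem_id},share,mem-path={path},size={size}",
        "-device",
        f"nvdimm,memdev={mem_id},id={dev_id}",
    ]
```

The returned list could be appended to an existing QEMU argument vector, which
avoids typing the size by hand and getting it out of sync with the file.
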
> 
> Changelog in v7:
> - changes from Vladimir Sementsov-Ogievskiy's comments:
>   1) make gethugepagesize() report the error if fstat fails instead of
>      returning the normal page size
>   2) rename open_file_path to open_ram_file_path
>   3) log a better error message by using error_setg_errno
>   4) update commit in the commit log to explain hugepage detection on
>      Windows
> 
> - changes from Eduardo Habkost's comments:
>   1) use 'Error**' to collect error message for qemu_file_get_page_size()
>   2) move gethugepagesize() replacement to the same patch to make it
>      better for review
>   3) introduce qemu_get_file_size to unify the code with raw_getlength()
> 
> - changes from Stefan's comments:
>   1) check that the memory region is large enough to contain the DSM
>      output buffer
> 
> - changes from Eric Blake's comments:
>   1) update the shell command in the commit log to generate the patch
>      which drops 'pc-dimm' prefix
>   
> - others:
>   pick up Reviewed-by from Stefan, Vladimir Sementsov-Ogievskiy, and
>   Eric Blake.
> 
> Changelog in v6:
> - changes from Stefan's comments:
>   1) fix struct naming to follow the CamelCase code style
>   2) fix offset + length overflow when reading/writing label data
>   3) compile hw/acpi/nvdimm.c per target so that TARGET_PAGE_SIZE can
>      be used to replace getpagesize()
> 
> Changelog in v5:
> - changes from Michael's comments:
>   1) prefix nvdimm_ to everything in NVDIMM source files
>   2) make the parsing of _DSM Arg3 clearer
>   3) comment style fix
>   5) drop definitions that are used only once
>   6) fix the dirty DSM buffer being lost when the memory write happens on
>      the host
>   7) check that the DSM buffer is big enough to contain the input data
>   8) use build_append_int_noprefix to store a single value to GArray
> 
> - changes from Michael's and Igor's comments:
>   1) introduce the 'nvdimm-support' parameter to control nvdimm
>      enablement; it is disabled for 2.4 and earlier versions to keep
>      live migration compatible
>   2) reserve only 1 RAM page and 4 bytes of IO port space for NVDIMM
>      ACPI virtualization
> 
> - changes from Stefan's comments:
>   1) do endian adjustment for the buffer length
> 
> - changes from Bharata B Rao's comments:
>   1) fix compile on ppc
> 
> - others:
>   1) the buffer length is now obtained directly from the IO read rather
>      than from DSM memory
>   2) fix dirty label data being lost when the memory write happens on
>      the host
> 
> Changelog in v4:
> - changes from Michael's comments:
>   1) show the message, "Memory is not allocated from HugeTlbfs", if file
>      based memory is not allocated from hugetlbfs.
>   2) introduce function, acpi_get_nvdimm_state(), to get NVDIMMState
>      from Machine.
>   3) statically define UUID and make its operation clearer
>   4) use GArray to build device structures to avoid potential buffer
>      overflow
>   5) improve comments in the code
>   6) improve code style
> 
> - changes from Igor's comments:
>   1) add NVDIMM ACPI spec document
>   2) use serialized method to avoid Mutex
>   3) move NVDIMM ACPI's code to hw/acpi/nvdimm.c
>   4) introduce a common ASL method used by _DSM for all devices to reduce
>      ACPI size
>   5) handle UUID in ACPI AML code. BTW, I'd keep handling the revision in
>      QEMU, as it is better to upgrade QEMU to support Rev2 in the future
> 
> - changes from Stefan's comments:
>   1) copy input data from DSM memory to local buffer to avoid potential
>      issues as DSM memory is visible to guest. Output data is handled
>      in a similar way
> 
> - changes from Dan's comments:
>   1) drop the static namespace, as Linux already supports label-less
>      nvdimm devices
> 
> - changes from Vladimir's comments:
>   1) print a better message, "failed to get file size for %s, can't create
>      backend on it", if any file operation failed to obtain the file size
> 
> - others:
>   create a git repo on github.com for better review/test
> 
> Also, thanks to Eric Blake for the review on the QAPI side.
> 
> Thanks to all of you for reviewing this patchset.
> 
> Changelog in v3:
> There are huge changes in this version. Thanks to Igor, Stefan, Paolo,
> Eduardo, and Michael for their valuable comments; the patchset is finally
> in better shape.
> - changes from Igor's comments:
>   1) abstract a dimm device type from pc-dimm and create the nvdimm device
>      based on dimm; it then uses a memory backend device as nvdimm's memory,
>      and NUMA support is easily implemented.
>   2) let the file-backend device support any kind of filesystem, not only
>      hugetlbfs, and let it work on a file, not only on a directory. This is
>      achieved by extending 'mem-path': if it's a directory it keeps the
>      current behavior, otherwise if it's a file memory is allocated directly
>      from it.
>   3) we found an unused memory hole below 4G, 0xFF00000 ~ 0xFFF00000.
>      This range is large enough for NVDIMM ACPI, and it is used because
>      building a 64-bit ACPI SSDT/DSDT table would break Windows XP.
>      BTW, setting only SSDT.rev = 2 does not work, since the integer width
>      depends only on DSDT.rev according to 19.6.28 DefinitionBlock (Declare
>      Definition Block) in the ACPI spec:
> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit 
> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
> | If the ComplianceRevision is less than 2, all integers are restricted to 32 
> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets 
> | the global integer width for all integers, including integers in SSDTs.
>   4) use the lowest ACPI spec version to document AML terms.
>   5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"
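
The ACPI note quoted in item 3 can be sketched as a small function. This is
only an illustration of the rule as quoted; the function names are
hypothetical and do not come from QEMU or the ACPI spec.

```python
def acpi_integer_width_bits(dsdt_revision):
    """Global AML integer width, decided by the DSDT's ComplianceRevision."""
    return 32 if dsdt_revision < 2 else 64


def effective_ssdt_width_bits(dsdt_revision, ssdt_revision):
    """Integer width seen by code in an SSDT.

    The SSDT's own revision is deliberately ignored: per the quoted note,
    the DSDT sets the width for all tables, including SSDTs.
    """
    return acpi_integer_width_bits(dsdt_revision)
```

With a revision 1 DSDT, effective_ssdt_width_bits(1, 2) is still 32, which is
why raising only SSDT.rev does not give 64-bit integers.
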
> 
> - changes from Stefan's comments:
>   1) do not do endian adjustment in-place since _DSM memory is visible to guest
>   2) use target platform's target page size instead of fixed PAGE_SIZE
>      definition
>   3) lots of code style improvement and typo fixes.
>   4) live migration fix
> - changes from Paolo's comments:
>   1) improve the name of memory region
>   
> - other changes:
>   1) return exact buffer size for _DSM method instead of the page size.
>   2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm
>      devices.
>   3) NUMA support
>   4) implement _FIT method
>   5) rename "configdata" to "reserve-label-data"
>   6) simplify _DSM arg3 determination
>   7) main changelog update to let it reflect v3.
> 
> Changelog in v2:
> - Use little endian for the DSM method, thanks to Stefan for the suggestion
> 
> - introduce a new parameter, @configdata. If it's false, QEMU will
>   build a static, read-only namespace in memory and use it to serve
>   DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
>   reserved region is needed at the end of the @file, which is good for
>   users who want to pass the whole nvdimm device and make its data
>   completely visible to the guest
> 
> - divide the source code into separate files and add maintainer info
> 
> BTW, PCOMMIT virtualization on the KVM side is a work in progress and will
> hopefully be posted next week
> 
> ====== Background ======
> NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be supported
> on Intel's platform. NVDIMMs are discovered via ACPI and configured by the
> _DSM method of the NVDIMM device in ACPI. Some supporting documents can be
> found at:
> ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> 
> The NVDIMM driver has already been merged into the upstream Linux kernel,
> and this patchset tries to enable it for virtualization.
> 
> ====== Design ======
> NVDIMM supports two access modes. One is PMEM, which maps the NVDIMM into
> the CPU's address space so that the CPU can access it directly as normal
> memory. The other is BLK, which is used as a block device to reduce the
> amount of CPU address space occupied.
> 
> BLK mode accesses the NVDIMM via a Command Register window and a Data
> Register window. BLK virtualization has high overhead since each sector
> access causes at least two VM-EXITs, so we currently only implement vPMEM
> in this patchset.
> 
> --- vPMEM design ---
> We introduce a new device named "nvdimm", which uses a memory backend device
> as NVDIMM memory. The file in the file-backend device can be a regular file
> or a block device. Any file can be used for testing or emulation; in the
> real world, however, the files passed to the guest are:
> - a regular file in a DAX-enabled filesystem created on an NVDIMM device
>   on the host
> - the raw PMEM device on the host, e.g. /dev/pmem0
> Memory accesses to addresses created by mmap on these kinds of files
> reach the NVDIMM device on the host directly.
> 
> --- vConfigure data area design ---
> Each NVDIMM device has a configuration data area which is used to store
> label namespace data. To emulate this area, we divide the file into two
> parts:
> - the first part is [0, size - 128K), which is used as PMEM
> - the last 128K of the file, which is used as the Label Data Area
> This way the label namespace data persists across power loss or system
> failure.
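
The split described above can be sketched as follows. This is a minimal
illustration, not the patchset's code: split_backing_file() and
LABEL_AREA_SIZE are hypothetical names, and only the 128K figure comes from
the text.

```python
LABEL_AREA_SIZE = 128 * 1024  # 128K reserved at the end of the file


def split_backing_file(file_size):
    """Return ((pmem_offset, pmem_len), (label_offset, label_len)).

    The PMEM part covers the start of the file up to the last 128K,
    which is kept for the Label Data Area.
    """
    if file_size <= LABEL_AREA_SIZE:
        raise ValueError("file too small to hold PMEM plus a 128K label area")
    pmem_len = file_size - LABEL_AREA_SIZE
    return (0, pmem_len), (pmem_len, LABEL_AREA_SIZE)
```

For a 10G file, the label area starts 128K before the end of the file and the
guest-visible PMEM is 128K smaller than the file.
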
> 
> We also support passing the whole file to the guest without reserving any
> region for the label data area. This is controlled by the
> "reserve-label-data" parameter: if it's false, QEMU builds a static,
> read-only namespace in memory that covers the whole file size. The
> parameter is false by default.
> 
> --- _DSM method design ---
> _DSM in ACPI is used to configure NVDIMM. Currently we only allow access to
> label namespace data, i.e. Get Namespace Label Size (Function Index 4),
> Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
> (Function Index 6).
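
As an illustration of the checks this implies, here is a minimal sketch that
filters on the three function indices listed above and validates a label
access range without computing offset + length (the overflow mentioned in the
v6 changelog). All names are hypothetical, and this is not the code from
hw/acpi/nvdimm.c.

```python
# Function indices named in the text above.
ALLOWED_FUNCTIONS = {
    4: "Get Namespace Label Size",
    5: "Get Namespace Label Data",
    6: "Set Namespace Label Data",
}


def is_allowed(function_index):
    """True if the _DSM function index is one of the label functions."""
    return function_index in ALLOWED_FUNCTIONS


def label_range_ok(offset, length, label_size):
    """True if [offset, offset + length) lies inside the label area.

    The comparison subtracts instead of adding, so the same logic stays
    safe in a fixed-width (e.g. 32-bit) implementation where
    offset + length could wrap around.
    """
    if offset < 0 or length < 0:
        return False
    return offset <= label_size and length <= label_size - offset
```

A guest-supplied offset near the 32-bit maximum is rejected by the first
comparison, before any arithmetic that could wrap.
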
> 
> _DSM uses two pages to transfer data between ACPI and QEMU. The first page
> is RAM-based; it holds the input of the _DSM method, and QEMU reuses it to
> store the output. The other page is MMIO-based; ACPI writes data to this
> page to transfer control to QEMU.
> 
> ====== Test ======
> On the host:
> 1) create a file to back the memory, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
> 2) append "-object memory-backend-file,share,id=mem1,
>    mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
>    id=nv1" to the QEMU command line
> 
> In the guest, download the latest upstream kernel (4.2 merge window) and
> enable ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
> 1) insmod drivers/nvdimm/libnvdimm.ko
> 2) insmod drivers/acpi/nfit.ko
> 3) insmod drivers/nvdimm/nd_btt.ko
> 4) insmod drivers/nvdimm/nd_pmem.ko
> You should see the whole nvdimm device used as a single namespace, and
> /dev/pmem0 appears. You can do anything with /dev/pmem0, including DAX access.
> 
> Currently the Linux NVDIMM driver does not support namespace operations on
> this kind of PMEM. Apply the change below to support dynamic namespaces:
> 
> @@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
>                         continue;
>                 }
>  
> -               if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> +               //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
> +               if (nfit_mem->memdev_pmem)
>                         flags |= NDD_ALIASING;
> 
> You can add another NVDIMM device to the guest and run:
> # cd /sys/bus/nd/devices/
> # cd namespace1.0/
> # echo `uuidgen` > uuid
> # echo `expr 1024 \* 1024 \* 128` > size
> then reload nd_pmem.ko
> 
> You should see /dev/pmem1 appear.
> 
> Xiao Guangrong (5):
>   nvdimm: implement NVDIMM device abstract
>   acpi: support specified oem table id for build_header
>   nvdimm acpi: build ACPI NFIT table
>   nvdimm acpi: build ACPI nvdimm devices
>   nvdimm: add maintain info
> 
>  MAINTAINERS                        |   7 +
>  default-configs/i386-softmmu.mak   |   2 +
>  default-configs/x86_64-softmmu.mak |   2 +
>  hw/acpi/Makefile.objs              |   1 +
>  hw/acpi/aml-build.c                |  15 +-
>  hw/acpi/ich9.c                     |  19 ++
>  hw/acpi/memory_hotplug.c           |   5 +
>  hw/acpi/nvdimm.c                   | 467 +++++++++++++++++++++++++++++++++++++
>  hw/acpi/piix4.c                    |   4 +
>  hw/arm/virt-acpi-build.c           |  13 +-
>  hw/i386/acpi-build.c               |  26 ++-
>  hw/mem/Makefile.objs               |   1 +
>  hw/mem/nvdimm.c                    |  46 ++++
>  include/hw/acpi/aml-build.h        |   3 +-
>  include/hw/acpi/ich9.h             |   3 +
>  include/hw/i386/pc.h               |  12 +-
>  include/hw/mem/nvdimm.h            |  41 ++++
>  17 files changed, 645 insertions(+), 22 deletions(-)
>  create mode 100644 hw/acpi/nvdimm.c
>  create mode 100644 hw/mem/nvdimm.c
>  create mode 100644 include/hw/mem/nvdimm.h

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
