* [PATCH v7 00/35] implement vNVDIMM
@ 2015-11-02  9:13 ` Xiao Guangrong
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

This patchset can be found at:
      https://github.com/xiaogr/qemu.git nvdimm-v7

It is based on the pci branch of Michael's tree; the top commit is:
commit 6f96a31a06c2a1 (tests: re-enable vhost-user-test).

Changelog in v7:
- changes from Vladimir Sementsov-Ogievskiy's comments:
  1) let gethugepagesize() detect an fstat failure instead of falling
     back to the normal page size
  2) rename open_file_path to open_ram_file_path
  3) improve the error message by using error_setg_errno
  4) update the commit log to explain hugepage detection on Windows

- changes from Eduardo Habkost's comments:
  1) use 'Error**' to collect the error message from
     qemu_file_get_page_size()
  2) move the gethugepagesize() replacement into the same patch to make
     it easier to review
  3) introduce qemu_get_file_size to unify the code with raw_getlength()

- changes from Stefan's comments:
  1) check that the memory region is large enough to contain the DSM
     output buffer

- changes from Eric Blake's comments:
  1) update the shell command in the commit log that generates the patch
     which drops the 'pc-dimm' prefix

- others:
  pick up Reviewed-by from Stefan, Vladimir Sementsov-Ogievskiy, and
  Eric Blake.

Changelog in v6:
- changes from Stefan's comments:
  1) fix struct naming to follow CamelCase style
  2) fix offset + length overflow when reading/writing label data
  3) compile hw/acpi/nvdimm.c per target so that TARGET_PAGE_SIZE can
     be used to replace getpagesize()

Changelog in v5:
- changes from Michael's comments:
  1) prefix nvdimm_ to everything in NVDIMM source files
  2) make parsing of _DSM Arg3 clearer
  3) comment style fix
  4) drop definitions that are used only once
  5) fix loss of dirty DSM buffer data caused by memory writes on the host
  6) check that the DSM buffer is big enough to contain the input data
  7) use build_append_int_noprefix to store a single value to GArray

- changes from Michael's and Igor's comments:
  1) introduce the 'nvdimm-support' parameter to control nvdimm
     enablement; it is disabled for 2.4 and earlier versions to keep
     live migration compatible
  2) only reserve 1 RAM page and 4 bytes of IO Port for NVDIMM ACPI
     virtualization

- changes from Stefan's comments:
  1) do endian adjustment for the buffer length

- changes from Bharata B Rao's comments:
  1) fix compile on ppc

- others:
  1) the buffer length is now obtained directly from the IO read rather
     than from DSM memory
  2) fix loss of dirty label data caused by memory writes on the host

Changelog in v4:
- changes from Michael's comments:
  1) show the message, "Memory is not allocated from HugeTlbfs", if
     file-based memory is not allocated from hugetlbfs.
  2) introduce the function acpi_get_nvdimm_state() to get NVDIMMState
     from Machine.
  3) statically define the UUID and make its operation clearer
  4) use GArray to build device structures to avoid potential buffer
     overflow
  5) improve comments in the code
  6) improve code style

- changes from Igor's comments:
  1) add NVDIMM ACPI spec document
  2) use serialized method to avoid Mutex
  3) move NVDIMM ACPI's code to hw/acpi/nvdimm.c
  4) introduce a common ASL method used by _DSM for all devices to reduce
     ACPI size
  5) handle the UUID in ACPI AML code. BTW, I'd keep handling the revision
     in QEMU, as that makes it easier to upgrade QEMU to support Rev2 in
     the future

- changes from Stefan's comments:
  1) copy input data from DSM memory to a local buffer to avoid potential
     issues, as DSM memory is visible to the guest. Output data is handled
     in a similar way
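
As a rough illustration of the copy-then-parse idea in 1), here is a minimal
sketch. The DsmInput layout (three little-endian 32-bit fields) and the names
dsm_read_input/load_le32 are assumptions made up for this example; they are
not taken from the patchset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical input layout, for illustration only. */
typedef struct {
    uint32_t handle;
    uint32_t revision;
    uint32_t function;
} DsmInput;

#define DSM_INPUT_RAW_SIZE 12

/* Decode a little-endian 32-bit value without modifying the source bytes. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Snapshot the guest-visible buffer first, then parse only the private
 * copy. The guest can still scribble on 'shared' afterwards, but it can
 * no longer change a value between the check and the use.
 */
static void dsm_read_input(const volatile uint8_t *shared, DsmInput *out)
{
    uint8_t local[DSM_INPUT_RAW_SIZE];
    size_t i;

    assert(shared != NULL && out != NULL);

    for (i = 0; i < sizeof(local); i++) {
        local[i] = shared[i];
    }

    out->handle = load_le32(local);
    out->revision = load_le32(local + 4);
    out->function = load_le32(local + 8);
}
```

The endian conversion is done on the private copy, so nothing is ever
byte-swapped in place inside memory the guest can see.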

- changes from Dan's comments:
  1) drop the static namespace, as Linux already supports label-less
     nvdimm devices

- changes from Vladimir's comments:
  1) print a better message, "failed to get file size for %s, can't create
     backend on it", if any file operation failed to obtain the file size

- others:
  create a git repo on github.com for easier review/testing

Also, thanks to Eric Blake for his review of the QAPI side.

Thank you all for reviewing this patchset.

Changelog in v3:
There are huge changes in this version. Thanks to Igor, Stefan, Paolo, Eduardo
and Michael for their valuable comments; the patchset is finally in better
shape.
- changes from Igor's comments:
  1) abstract a dimm device type from pc-dimm and create the nvdimm device
     based on dimm; it uses a memory backend device as the nvdimm's memory,
     so NUMA is easily implemented.
  2) let the file-backend device support any kind of filesystem, not only
     hugetlbfs, and let it work on a file, not only on a directory. This is
     achieved by extending 'mem-path': if it is a directory, it behaves as
     before; if it is a file, memory is allocated directly from it.
  3) we found an unused memory hole below 4G, 0xFF00000 ~ 0xFFF00000,
     which is large enough for NVDIMM ACPI. It is used because building a
     64-bit ACPI SSDT/DSDT table would break Windows XP.
     BTW, setting only SSDT.rev = 2 does not work, since the integer width
     depends only on DSDT.rev, per 19.6.28 DefinitionBlock (Declare
     Definition Block) in the ACPI spec:
| Note: For compatibility with ACPI versions before ACPI 2.0, the bit 
| width of Integer objects is dependent on the ComplianceRevision of the DSDT.
| If the ComplianceRevision is less than 2, all integers are restricted to 32 
| bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets 
| the global integer width for all integers, including integers in SSDTs.
  4) use the lowest ACPI spec version to document AML terms.
  5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"

- changes from Stefan's comments:
  1) do not do endian adjustment in place, since _DSM memory is visible to
     the guest
  2) use the target platform's page size instead of a fixed PAGE_SIZE
     definition
  3) lots of code style improvements and typo fixes.
  4) live migration fix

- changes from Paolo's comments:
  1) improve the name of the memory region

- other changes:
  1) return exact buffer size for _DSM method instead of the page size.
  2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm
     devices.
  3) NUMA support
  4) implement _FIT method
  5) rename "configdata" to "reserve-label-data"
  6) simplify _DSM arg3 determination
  7) update the main changelog to reflect v3.

Changelog in v2:
- use little endian for the DSM method, thanks to Stefan's suggestion

- introduce a new parameter, @configdata. If it is false, QEMU will
  build a static, read-only namespace in memory and use it to serve
  DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
  reserved region is needed at the end of the @file, which is good for
  users who want to pass the whole nvdimm device and make its data
  completely visible to the guest

- divide the source code into separate files and add maintainer info

BTW, PCOMMIT virtualization on the KVM side is a work in progress; hopefully
it will be posted next week

====== Background ======
NVDIMM (Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel's platform. NVDIMMs are discovered via ACPI and configured through
the _DSM method of the NVDIMM device in ACPI. Supporting documents can be
found at:
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

The NVDIMM driver has already been merged into the upstream Linux kernel, and
this patchset tries to enable it for virtualization

====== Design ======
NVDIMM supports two access modes. One is PMEM, which maps the NVDIMM into the
CPU's address space so that the CPU can access it directly as normal memory.
The other is BLK, which exposes it as a block device to reduce the amount of
CPU address space it occupies.

BLK mode accesses the NVDIMM via a Command Register window and a Data Register
window. Virtualizing BLK is expensive, since each sector access causes at
least two VM-EXITs, so this patchset currently implements only vPMEM.

--- vPMEM design ---
We introduce a new device named "nvdimm", which uses a memory backend device
as its NVDIMM memory. The file in the file-backend device can be a regular
file or a block device. Any file can be used for testing or emulation; in the
real world, however, the files passed to the guest are:
- a regular file in a DAX-enabled filesystem created on an NVDIMM device on
  the host
- the raw PMEM device on the host, e.g. /dev/pmem0
Memory accesses to addresses created by mmap on these kinds of files reach
the NVDIMM device on the host directly.
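
To illustrate the mmap-based access described above, here is a minimal sketch
of mapping a backing file shared and writable. The helper name
map_backing_file is made up for this example and is not QEMU code. It only
handles regular files: for a block device such as /dev/pmem0, fstat() reports
a size of 0, so the length has to be obtained another way.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map the whole regular file at 'path' with
 * MAP_SHARED so that stores go to the file itself (and, on a DAX
 * filesystem, to the NVDIMM behind it). Returns NULL on failure.
 */
static void *map_backing_file(const char *path, size_t *size_out)
{
    struct stat st;
    void *addr;
    int fd;

    assert(path != NULL && size_out != NULL);

    fd = open(path, O_RDWR);
    if (fd < 0) {
        return NULL;
    }

    if (fstat(fd, &st) < 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) {
        return NULL;
    }

    *size_out = (size_t)st.st_size;
    return addr;
}
```

The caller is expected to munmap() the returned region with the reported size
when it is done.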

--- vConfigure data area design ---
Each NVDIMM device has a configuration data area which is used to store label
namespace data. In order to emulate this area, we divide the file into two
parts:
- the first part is [0, size - 128K), which is used as PMEM
- the last 128K of the file, which is used as the Label Data Area
This way the label namespace data persists across power loss or system
failure.

We also support passing the whole file to the guest without reserving any
region for the label data area, which is controlled by the
"reserve-label-data" parameter: if it is false, QEMU builds a static,
read-only namespace in memory, and that namespace covers the whole file.
The parameter is false by default.

--- _DSM method design ---
_DSM in ACPI is used to configure NVDIMM. Currently we only allow access to
label namespace data, i.e. Get Namespace Label Size (Function Index 4),
Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
(Function Index 6).

_DSM uses two pages to transfer data between ACPI and QEMU. The first page
is RAM-based: it holds the input of the _DSM method, and QEMU reuses it to
store the output. The other page is MMIO-based: ACPI writes to this page to
transfer control to QEMU.
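
As a minimal sketch of the label-data functions listed above, the code below
names the three function indexes and shows an overflow-safe bounds check for
a Get Namespace Label Data request. Only the index values 4, 5 and 6 come
from this description; all identifiers and the function signatures are
assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Function indexes as listed above; the enum names are hypothetical. */
enum {
    DSM_FUNC_GET_LABEL_SIZE = 4,
    DSM_FUNC_GET_LABEL_DATA = 5,
    DSM_FUNC_SET_LABEL_DATA = 6,
};

static bool dsm_is_label_function(uint32_t function)
{
    return function >= DSM_FUNC_GET_LABEL_SIZE &&
           function <= DSM_FUNC_SET_LABEL_DATA;
}

/*
 * Check [offset, offset + length) against the label area without ever
 * computing offset + length, which could wrap around in 32 bits.
 */
static bool dsm_label_range_valid(uint32_t offset, uint32_t length,
                                  uint64_t label_size)
{
    return length <= label_size && offset <= label_size - length;
}

/* Copy 'length' bytes of label data into 'out' if the range is valid. */
static bool dsm_get_label_data(const uint8_t *label_area, uint64_t label_size,
                               uint32_t offset, uint32_t length, uint8_t *out)
{
    assert(label_area != NULL && out != NULL);

    if (!dsm_label_range_valid(offset, length, label_size)) {
        return false;
    }
    memcpy(out, label_area + offset, length);
    return true;
}
```

A Set Namespace Label Data handler would use the same range check before
writing into the label area.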

====== Test ======
On the host:
1) create a memory backing file, e.g. # dd if=/dev/zero of=/tmp/nvdimm bs=1G count=10
2) append "-object memory-backend-file,share,id=mem1,
   mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
   id=nv1" to the QEMU command line

In the guest, download the latest upstream kernel (4.2 merge window) and
enable ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
1) insmod drivers/nvdimm/libnvdimm.ko
2) insmod drivers/acpi/nfit.ko
3) insmod drivers/nvdimm/nd_btt.ko
4) insmod drivers/nvdimm/nd_pmem.ko
You can see the whole nvdimm device used as a single namespace, and /dev/pmem0
appears. You can do anything with /dev/pmem0, including DAX access.

Currently the Linux NVDIMM driver does not support namespace operations on
this kind of PMEM; apply the change below to support dynamic namespaces:

@@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
                        continue;
                }
 
-               if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+               //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+               if (nfit_mem->memdev_pmem)
                        flags |= NDD_ALIASING;

You can add another NVDIMM device to the guest and run:
# cd /sys/bus/nd/devices/
# cd namespace1.0/
# echo `uuidgen` > uuid
# echo `expr 1024 \* 1024 \* 128` > size
then reload nd_pmem.ko

/dev/pmem1 should then appear

Xiao Guangrong (35):
  acpi: add aml_derefof
  acpi: add aml_sizeof
  acpi: add aml_create_field
  acpi: add aml_concatenate
  acpi: add aml_object_type
  acpi: add aml_method_serialized
  util: introduce qemu_file_get_page_size()
  exec: allow memory to be allocated from any kind of path
  exec: allow file_ram_alloc to work on file
  hostmem-file: clean up memory allocation
  util: introduce qemu_file_getlength()
  util: let qemu_fd_getlength support block device
  hostmem-file: use whole file size if possible
  pc-dimm: remove DEFAULT_PC_DIMMSIZE
  pc-dimm: make pc_existing_dimms_capacity static and rename it
  pc-dimm: drop the prefix of pc-dimm
  stubs: rename qmp_pc_dimm_device_list.c
  pc-dimm: rename pc-dimm.c and pc-dimm.h
  dimm: abstract dimm device from pc-dimm
  dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  dimm: keep the state of the whole backend memory
  dimm: introduce realize callback
  nvdimm: implement NVDIMM device abstract
  docs: add NVDIMM ACPI documentation
  nvdimm acpi: init the resource used by NVDIMM ACPI
  nvdimm acpi: build ACPI NFIT table
  nvdimm acpi: build ACPI nvdimm devices
  nvdimm acpi: save arg3 for NVDIMM device _DSM method
  nvdimm acpi: support function 0
  nvdimm acpi: support Get Namespace Label Size function
  nvdimm acpi: support Get Namespace Label Data function
  nvdimm acpi: support Set Namespace Label Data function
  nvdimm: allow using whole backend memory as pmem
  nvdimm acpi: support _FIT method
  nvdimm: add maintain info

 MAINTAINERS                                        |    7 +
 backends/hostmem-file.c                            |   33 +-
 block/raw-posix.c                                  |    7 +-
 default-configs/i386-softmmu.mak                   |    3 +
 default-configs/mips-softmmu.mak                   |    1 +
 default-configs/mips64-softmmu.mak                 |    1 +
 default-configs/mips64el-softmmu.mak               |    1 +
 default-configs/mipsel-softmmu.mak                 |    1 +
 default-configs/ppc64-softmmu.mak                  |    1 +
 default-configs/x86_64-softmmu.mak                 |    3 +
 docs/specs/acpi_nvdimm.txt                         |  179 +++
 exec.c                                             |  109 +-
 hmp.c                                              |    2 +-
 hw/Makefile.objs                                   |    2 +-
 hw/acpi/Makefile.objs                              |    1 +
 hw/acpi/aml-build.c                                |   79 +-
 hw/acpi/ich9.c                                     |   32 +-
 hw/acpi/memory_hotplug.c                           |   26 +-
 hw/acpi/nvdimm.c                                   | 1134 ++++++++++++++++++++
 hw/acpi/piix4.c                                    |   35 +-
 hw/i386/acpi-build.c                               |    6 +
 hw/i386/pc.c                                       |   34 +-
 hw/mem/Makefile.objs                               |    2 +
 hw/mem/{pc-dimm.c => dimm.c}                       |  240 +++--
 hw/mem/nvdimm.c                                    |  145 +++
 hw/mem/pc-dimm.c                                   |  520 +--------
 hw/ppc/spapr.c                                     |   20 +-
 include/hw/acpi/aml-build.h                        |    7 +
 include/hw/acpi/ich9.h                             |    3 +
 include/hw/i386/pc.h                               |   12 +-
 include/hw/mem/dimm.h                              |   95 ++
 include/hw/mem/nvdimm.h                            |  133 +++
 include/hw/mem/pc-dimm.h                           |  104 +-
 include/hw/ppc/spapr.h                             |    2 +-
 include/qemu/osdep.h                               |    3 +
 numa.c                                             |    4 +-
 qapi-schema.json                                   |    8 +-
 qmp.c                                              |    4 +-
 stubs/Makefile.objs                                |    2 +-
 ...c_dimm_device_list.c => qmp_dimm_device_list.c} |    4 +-
 target-ppc/kvm.c                                   |   23 +-
 trace-events                                       |    8 +-
 util/osdep.c                                       |   51 +
 util/oslib-posix.c                                 |   37 +-
 util/oslib-win32.c                                 |    5 +
 45 files changed, 2284 insertions(+), 845 deletions(-)
 create mode 100644 docs/specs/acpi_nvdimm.txt
 create mode 100644 hw/acpi/nvdimm.c
 rename hw/mem/{pc-dimm.c => dimm.c} (65%)
 create mode 100644 hw/mem/nvdimm.c
 rewrite hw/mem/pc-dimm.c (91%)
 create mode 100644 include/hw/mem/dimm.h
 create mode 100644 include/hw/mem/nvdimm.h
 rewrite include/hw/mem/pc-dimm.h (97%)
 rename stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} (56%)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 200+ messages in thread

* [Qemu-devel] [PATCH v7 00/35] implement vNVDIMM
@ 2015-11-02  9:13 ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

This patchset can be found at:
      https://github.com/xiaogr/qemu.git nvdimm-v7

It is based on pci branch on Michael's tree and the top commit is:
commit 6f96a31a06c2a1 (tests: re-enable vhost-user-test).

Changelog in v7:
- changes from Vladimir Sementsov-Ogievskiy's comments:
  1) let gethugepagesize() realize if fstat is failed instead of get
     normal page size
  2) rename  open_file_path to open_ram_file_path
  3) better log the error message by using error_setg_errno
  4) update commit in the commit log to explain hugepage detection on
     Windows

- changes from Eduardo Habkost's comments:
  1) use 'Error**' to collect error message for qemu_file_get_page_size()
  2) move gethugepagesize() replacement to the same patch to make it
     better for review
  3) introduce qemu_get_file_size to unity the code with raw_getlength()

- changes from Stefan's comments:
  1) check the memory region is large enough to contain DSM output
     buffer

- changes from Eric Blake's comments:
  1) update the shell command in the commit log to generate the patch
     which drops 'pc-dimm' prefix
  
- others:
  pick up Reviewed-by from Stefan, Vladimir Sementsov-Ogievskiy, and
  Eric Blake.

Changelog in v6:
- changes from Stefan's comments:
  1) fix code style of struct naming by CamelCase way
  2) fix offset + length overflow when read/write label data
  3) compile hw/acpi/nvdimm.c for per target so that TARGET_PAGE_SIZE can
     be used to replace getpagesize()

Changelog in v5:
- changes from Michael's comments:
  1) prefix nvdimm_ to everything in NVDIMM source files
  2) make parsing _DSM Arg3 more clear
  3) comment style fix
  5) drop single used definition
  6) fix dirty dsm buffer lost due to memory write happened on host
  7) check dsm buffer if it is big enough to contain input data
  8) use build_append_int_noprefix to store single value to GArray

- changes from Michael's and Igor's comments:
  1) introduce 'nvdimm-support' parameter to control nvdimm
     enablement and it is disabled for 2.4 and its earlier versions
     to make live migration compatible
  2) only reserve 1 RAM page and 4 bytes IO Port for NVDIMM ACPI
     virtualization

- changes from Stefan's comments:
  1) do endian adjustment for the buffer length

- changes from Bharata B Rao's comments:
  1) fix compile on ppc

- others:
  1) the buffer length is directly got from IO read rather than got
     from dsm memory
  2) fix dirty label data lost due to memory write happened on host

Changelog in v4:
- changes from Michael's comments:
  1) show the message, "Memory is not allocated from HugeTlbfs", if file
     based memory is not allocated from hugetlbfs.
  2) introduce function, acpi_get_nvdimm_state(), to get NVDIMMState
     from Machine.
  3) statically define UUID and make its operation more clear
  4) use GArray to build device structures to avoid potential buffer
     overflow
  4) improve comments in the code
  5) improve code style

- changes from Igor's comments:
  1) add NVDIMM ACPI spec document
  2) use serialized method to avoid Mutex
  3) move NVDIMM ACPI's code to hw/acpi/nvdimm.c
  4) introduce a common ASL method used by _DSM for all devices to reduce
     ACPI size
  5) handle UUID in ACPI AML code. BTW, i'd keep handling revision in QEMU
     it's better to upgrade QEMU to support Rev2 in the future

- changes from Stefan's comments:
  1) copy input data from DSM memory to local buffer to avoid potential
     issues as DSM memory is visible to guest. Output data is handled
     in a similar way

- changes from Dan's comments:
  1) drop static namespace as Linux has already supported label-less
     nvdimm devices

- changes from Vladimir's comments:
  1) print better message, "failed to get file size for %s, can't create
     backend on it", if any file operation filed to obtain file size

- others:
  create a git repo on github.com for better review/test

Also, thanks for Eric Blake's review on QAPI's side.

Thank all of you to review this patchset.

Changelog in v3:
There is huge change in this version, thank Igor, Stefan, Paolo, Eduardo,
Michael for their valuable comments, the patchset finally gets better shape.
- changes from Igor's comments:
  1) abstract dimm device type from pc-dimm and create nvdimm device based on
     dimm, then it uses memory backend device as nvdimm's memory and NUMA has
     easily been implemented.
  2) let file-backend device support any kind of filesystem not only for
     hugetlbfs and let it work on file not only for directory which is
     achieved by extending 'mem-path' - if it's a directory then it works as
     current behavior, otherwise if it's file then directly allocates memory
     from it.
  3) we figure out a unused memory hole below 4G that is 0xFF00000 ~ 
     0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit
     ACPI SSDT/DSDT table will break windows XP.
     BTW, only make SSDT.rev = 2 can not work since the width is only depended
     on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
     in ACPI spec:
| Note: For compatibility with ACPI versions before ACPI 2.0, the bit 
| width of Integer objects is dependent on the ComplianceRevision of the DSDT.
| If the ComplianceRevision is less than 2, all integers are restricted to 32 
| bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets 
| the global integer width for all integers, including integers in SSDTs.
  4) use the lowest ACPI spec version to document AML terms.
  5) use "nvdimm" as nvdimm device name instead of "pc-nvdimm"

- changes from Stefan's comments:
  1) do not do endian adjustment in-place since _DSM memory is visible to guest
  2) use target platform's target page size instead of fixed PAGE_SIZE
     definition
  3) lots of code style improvement and typo fixes.
  4) live migration fix
- changes from Paolo's comments:
  1) improve the name of memory region
  
- other changes:
  1) return exact buffer size for _DSM method instead of the page size.
  2) introduce mutex in NVDIMM ACPI as the _DSM memory is shared by all nvdimm
     devices.
  3) NUMA support
  4) implement _FIT method
  5) rename "configdata" to "reserve-label-data"
  6) simplify _DSM arg3 determination
  7) main changelog update to let it reflect v3.

Changlog in v2:
- Use litten endian for DSM method, thanks for Stefan's suggestion

- introduce a new parameter, @configdata, if it's false, Qemu will
  build a static and readonly namespace in memory and use it serveing
  for DSM GET_CONFIG_SIZE/GET_CONFIG_DATA requests. In this case, no
  reserved region is needed at the end of the @file, it is good for
  the user who want to pass whole nvdimm device and make its data
  completely be visible to guest

- divide the source code into separated files and add maintain info

BTW, PCOMMIT virtualization on KVM side is work in progress, hopefully will
be posted on next week

====== Background ======
NVDIMM (A Non-Volatile Dual In-line Memory Module) is going to be supported
on Intel's platform. They are discovered via ACPI and configured by _DSM
method of NVDIMM device in ACPI. There has some supporting documents which
can be found at:
ACPI 6: http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
NVDIMM Namespace: http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
DSM Interface Example: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
Driver Writer's Guide: http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf

Currently, the NVDIMM driver has been merged into upstream Linux Kernel and
this patchset tries to enable it in virtualization field

====== Design ======
NVDIMM supports two mode accesses, one is PMEM which maps NVDIMM into CPU's
address space then CPU can directly access it as normal memory, another is
BLK which is used as block device to reduce the occupying of CPU address
space

BLK mode accesses NVDIMM via Command Register window and Data Register window.
BLK virtualization has high workload since each sector access will cause at
least two VM-EXIT. So we currently only imperilment vPMEM in this patchset

--- vPMEM design ---
We introduce a new device named "nvdimm", it uses memory backend device as
NVDIMM memory. The file in file-backend device can be a regular file and block 
device. We can use any file when we do test or emulation, however,
in the real word, the files passed to guest are:
- the regular file in the filesystem with DAX enabled created on NVDIMM device
  on host
- the raw PMEM device on host, e,g /dev/pmem0
Memory access on the address created by mmap on these kinds of files can
directly reach NVDIMM device on host.

--- vConfigure data area design ---
Each NVDIMM device has a configure data area which is used to store label
namespace data. In order to emulating this area, we divide the file into two
parts:
- first parts is (0, size - 128K], which is used as PMEM
- 128K at the end of the file, which is used as Label Data Area
So that the label namespace data can be persistent during power lose or system
failure.

We also support passing the whole file to guest without reserve any region for
label data area which is achieved by "reserve-label-data" parameter - if it's
false then QEMU will build static and readonly namespace in memory and that
namespace contains the whole file size. The parameter is false on default.

--- _DSM method design ---
_DSM in ACPI is used to configure NVDIMM, currently we only allow access of
label namespace data, i.e, Get Namespace Label Size (Function Index 4),
Get Namespace Label Data (Function Index 5) and Set Namespace Label Data
(Function Index 6)

_DSM uses two pages to transfer data between ACPI and Qemu, the first page
is RAM-based used to save the input info of _DSM method and Qemu reuse it
store output info and another page is MMIO-based, ACPI write data to this
page to transfer the control to Qemu

====== Test ======
In host
1) create memory backed file, e.g # dd if=zero of=/tmp/nvdimm bs=1G count=10
2) append "-object memory-backend-file,share,id=mem1,
   mem-path=/tmp/nvdimm -device nvdimm,memdev=mem1,reserve-label-data,
   id=nv1" in QEMU command line

In guest, download the latest upsteam kernel (4.2 merge window) and enable
ACPI_NFIT, LIBNVDIMM and BLK_DEV_PMEM.
1) insmod drivers/nvdimm/libnvdimm.ko
2) insmod drivers/acpi/nfit.ko
3) insmod drivers/nvdimm/nd_btt.ko
4) insmod drivers/nvdimm/nd_pmem.ko
You can see the whole nvdimm device used as a single namespace and /dev/pmem0
appears. You can do whatever on /dev/pmem0 including DAX access.

Currently Linux NVDIMM driver does not support namespace operation on this
kind of PMEM, apply below changes to support dynamical namespace:

@@ -798,7 +823,8 @@ static int acpi_nfit_register_dimms(struct acpi_nfit_desc *a
                        continue;
                }
 
-               if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+               //if (nfit_mem->bdw && nfit_mem->memdev_pmem)
+               if (nfit_mem->memdev_pmem)
                        flags |= NDD_ALIASING;

You can append another NVDIMM device in guest and do:                       
# cd /sys/bus/nd/devices/
# cd namespace1.0/
# echo `uuidgen` > uuid
# echo `expr 1024 \* 1024 \* 128` > size
then reload nd.pmem.ko

You can see /dev/pmem1 appears

Xiao Guangrong (35):
  acpi: add aml_derefof
  acpi: add aml_sizeof
  acpi: add aml_create_field
  acpi: add aml_concatenate
  acpi: add aml_object_type
  acpi: add aml_method_serialized
  util: introduce qemu_file_get_page_size()
  exec: allow memory to be allocated from any kind of path
  exec: allow file_ram_alloc to work on file
  hostmem-file: clean up memory allocation
  util: introduce qemu_file_getlength()
  util: let qemu_fd_getlength support block device
  hostmem-file: use whole file size if possible
  pc-dimm: remove DEFAULT_PC_DIMMSIZE
  pc-dimm: make pc_existing_dimms_capacity static and rename it
  pc-dimm: drop the prefix of pc-dimm
  stubs: rename qmp_pc_dimm_device_list.c
  pc-dimm: rename pc-dimm.c and pc-dimm.h
  dimm: abstract dimm device from pc-dimm
  dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  dimm: keep the state of the whole backend memory
  dimm: introduce realize callback
  nvdimm: implement NVDIMM device abstract
  docs: add NVDIMM ACPI documentation
  nvdimm acpi: init the resource used by NVDIMM ACPI
  nvdimm acpi: build ACPI NFIT table
  nvdimm acpi: build ACPI nvdimm devices
  nvdimm acpi: save arg3 for NVDIMM device _DSM method
  nvdimm acpi: support function 0
  nvdimm acpi: support Get Namespace Label Size function
  nvdimm acpi: support Get Namespace Label Data function
  nvdimm acpi: support Set Namespace Label Data function
  nvdimm: allow using whole backend memory as pmem
  nvdimm acpi: support _FIT method
  nvdimm: add maintain info

 MAINTAINERS                                        |    7 +
 backends/hostmem-file.c                            |   33 +-
 block/raw-posix.c                                  |    7 +-
 default-configs/i386-softmmu.mak                   |    3 +
 default-configs/mips-softmmu.mak                   |    1 +
 default-configs/mips64-softmmu.mak                 |    1 +
 default-configs/mips64el-softmmu.mak               |    1 +
 default-configs/mipsel-softmmu.mak                 |    1 +
 default-configs/ppc64-softmmu.mak                  |    1 +
 default-configs/x86_64-softmmu.mak                 |    3 +
 docs/specs/acpi_nvdimm.txt                         |  179 +++
 exec.c                                             |  109 +-
 hmp.c                                              |    2 +-
 hw/Makefile.objs                                   |    2 +-
 hw/acpi/Makefile.objs                              |    1 +
 hw/acpi/aml-build.c                                |   79 +-
 hw/acpi/ich9.c                                     |   32 +-
 hw/acpi/memory_hotplug.c                           |   26 +-
 hw/acpi/nvdimm.c                                   | 1134 ++++++++++++++++++++
 hw/acpi/piix4.c                                    |   35 +-
 hw/i386/acpi-build.c                               |    6 +
 hw/i386/pc.c                                       |   34 +-
 hw/mem/Makefile.objs                               |    2 +
 hw/mem/{pc-dimm.c => dimm.c}                       |  240 +++--
 hw/mem/nvdimm.c                                    |  145 +++
 hw/mem/pc-dimm.c                                   |  520 +--------
 hw/ppc/spapr.c                                     |   20 +-
 include/hw/acpi/aml-build.h                        |    7 +
 include/hw/acpi/ich9.h                             |    3 +
 include/hw/i386/pc.h                               |   12 +-
 include/hw/mem/dimm.h                              |   95 ++
 include/hw/mem/nvdimm.h                            |  133 +++
 include/hw/mem/pc-dimm.h                           |  104 +-
 include/hw/ppc/spapr.h                             |    2 +-
 include/qemu/osdep.h                               |    3 +
 numa.c                                             |    4 +-
 qapi-schema.json                                   |    8 +-
 qmp.c                                              |    4 +-
 stubs/Makefile.objs                                |    2 +-
 ...c_dimm_device_list.c => qmp_dimm_device_list.c} |    4 +-
 target-ppc/kvm.c                                   |   23 +-
 trace-events                                       |    8 +-
 util/osdep.c                                       |   51 +
 util/oslib-posix.c                                 |   37 +-
 util/oslib-win32.c                                 |    5 +
 45 files changed, 2284 insertions(+), 845 deletions(-)
 create mode 100644 docs/specs/acpi_nvdimm.txt
 create mode 100644 hw/acpi/nvdimm.c
 rename hw/mem/{pc-dimm.c => dimm.c} (65%)
 create mode 100644 hw/mem/nvdimm.c
 rewrite hw/mem/pc-dimm.c (91%)
 create mode 100644 include/hw/mem/dimm.h
 create mode 100644 include/hw/mem/nvdimm.h
 rewrite include/hw/mem/pc-dimm.h (97%)
 rename stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} (56%)

-- 
1.8.3.1


* [PATCH v7 01/35] acpi: add aml_derefof
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Implement the DerefOf term, which is used by the NVDIMM _DSM method in a
later patch.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/aml-build.c         | 8 ++++++++
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 0d4b324..cbd53f4 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1135,6 +1135,14 @@ Aml *aml_unicode(const char *str)
     return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefDerefOf */
+Aml *aml_derefof(Aml *arg)
+{
+    Aml *var = aml_opcode(0x83 /* DerefOfOp */);
+    aml_append(var, arg);
+    return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
              AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 1b632dc..5a03d33 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -274,6 +274,7 @@ Aml *aml_create_dword_field(Aml *srcbuf, Aml *index, const char *name);
 Aml *aml_varpackage(uint32_t num_elements);
 Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
+Aml *aml_derefof(Aml *arg);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1


* [PATCH v7 02/35] acpi: add aml_sizeof
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Implement the SizeOf term, which is used by the NVDIMM _DSM method in a
later patch.

Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/aml-build.c         | 8 ++++++++
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index cbd53f4..a72214d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1143,6 +1143,14 @@ Aml *aml_derefof(Aml *arg)
     return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefSizeOf */
+Aml *aml_sizeof(Aml *arg)
+{
+    Aml *var = aml_opcode(0x87 /* SizeOfOp */);
+    aml_append(var, arg);
+    return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
              AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 5a03d33..7296efb 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -275,6 +275,7 @@ Aml *aml_varpackage(uint32_t num_elements);
 Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
+Aml *aml_sizeof(Aml *arg);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1


* [PATCH v7 03/35] acpi: add aml_create_field
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Implement the CreateField term, which is used by the NVDIMM _DSM method in
a later patch.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/aml-build.c         | 13 +++++++++++++
 include/hw/acpi/aml-build.h |  1 +
 2 files changed, 14 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index a72214d..9fe5e7b 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1151,6 +1151,19 @@ Aml *aml_sizeof(Aml *arg)
     return var;
 }
 
+/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateField */
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
+{
+    Aml *var = aml_alloc();
+    build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
+    build_append_byte(var->buf, 0x13); /* CreateFieldOp */
+    aml_append(var, srcbuf);
+    aml_append(var, index);
+    aml_append(var, len);
+    build_append_namestring(var->buf, "%s", name);
+    return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
              AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 7296efb..7e1c43b 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -276,6 +276,7 @@ Aml *aml_touuid(const char *uuid);
 Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
+Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1
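
Illustrative note (not part of the patch): the byte layout that
aml_create_field() produces can be sketched outside QEMU as below. This is
a minimal sketch under stated assumptions: ByteBuf, buf_append(),
buf_append_byte() and emit_create_field_example() are hypothetical names
standing in for QEMU's GArray-based helpers, and the operands (Arg0, the
constants 0 and 8, and the name FLD0) are arbitrary values chosen for the
example. Only the 0x5B/0x13 opcode pair comes from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size byte buffer standing in for QEMU's GArray. */
typedef struct {
    uint8_t data[64];
    size_t len;
} ByteBuf;

/* Append n raw bytes, refusing to overflow the fixed buffer. */
static void buf_append(ByteBuf *b, const void *src, size_t n)
{
    assert(b->len + n <= sizeof(b->data));
    memcpy(b->data + b->len, src, n);
    b->len += n;
}

static void buf_append_byte(ByteBuf *b, uint8_t v)
{
    buf_append(b, &v, 1);
}

/*
 * Emit the AML for CreateField(Arg0, 0, 8, FLD0) with hand-encoded
 * operands. The operand order matches the patch: source buffer,
 * bit index, bit length, then the field name.
 */
static void emit_create_field_example(ByteBuf *b)
{
    static const uint8_t bit_len[] = { 0x0A, 0x08 }; /* BytePrefix, 8 */

    buf_append_byte(b, 0x5B);                /* ExtOpPrefix */
    buf_append_byte(b, 0x13);                /* CreateFieldOp */
    buf_append_byte(b, 0x68);                /* source buffer: Arg0Op */
    buf_append_byte(b, 0x00);                /* bit index: ZeroOp */
    buf_append(b, bit_len, sizeof(bit_len)); /* bit length */
    buf_append(b, "FLD0", 4);                /* name: one 4-char NameSeg */
}
```

Calling emit_create_field_example() on an empty ByteBuf appends ten bytes.
Unlike DerefOf or SizeOf, which are single-byte opcodes, CreateField is an
extended opcode and therefore needs the two-byte prefix sequence.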


* [PATCH v7 04/35] acpi: add aml_concatenate
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Implement the Concatenate term, which is used by the NVDIMM _DSM method
in a later patch.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/aml-build.c         | 14 ++++++++++++++
 include/hw/acpi/aml-build.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9fe5e7b..efc06ab 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1164,6 +1164,20 @@ Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
     return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefConcat */
+Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target)
+{
+    Aml *var = aml_opcode(0x73 /* ConcatOp */);
+    aml_append(var, source1);
+    aml_append(var, source2);
+
+    if (target) {
+        aml_append(var, target);
+    }
+
+    return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
              AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 7e1c43b..325782d 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -277,6 +277,7 @@ Aml *aml_unicode(const char *str);
 Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
 Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
+Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1


* [PATCH v7 05/35] acpi: add aml_object_type
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Implement the ObjectType term, which is used by the NVDIMM _DSM method in
a later patch.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/aml-build.c         | 8 ++++++++
 include/hw/acpi/aml-build.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index efc06ab..9f792ab 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1178,6 +1178,14 @@ Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target)
     return var;
 }
 
+/* ACPI 1.0b: 16.2.5.4 Type 2 Opcodes Encoding: DefObjectType */
+Aml *aml_object_type(Aml *object)
+{
+    Aml *var = aml_opcode(0x8E /* ObjectTypeOp */);
+    aml_append(var, object);
+    return var;
+}
+
 void
 build_header(GArray *linker, GArray *table_data,
              AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 325782d..5b8a118 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -278,6 +278,7 @@ Aml *aml_derefof(Aml *arg);
 Aml *aml_sizeof(Aml *arg);
 Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
 Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target);
+Aml *aml_object_type(Aml *object);
 
 void
 build_header(GArray *linker, GArray *table_data,
-- 
1.8.3.1


* [PATCH v7 06/35] acpi: add aml_method_serialized
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

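Add aml_method_serialized() to declare a Serialized method. A serialized
method avoids the need for an explicit Mutex; it will be used by the
NVDIMM ACPI code.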

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/aml-build.c         | 26 ++++++++++++++++++++++++--
 include/hw/acpi/aml-build.h |  1 +
 2 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9f792ab..8bee8b2 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -696,14 +696,36 @@ Aml *aml_while(Aml *predicate)
 }
 
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefMethod */
-Aml *aml_method(const char *name, int arg_count)
+static Aml *__aml_method(const char *name, int arg_count, bool serialized)
 {
     Aml *var = aml_bundle(0x14 /* MethodOp */, AML_PACKAGE);
+    int methodflags;
+
+    /*
+     * MethodFlags:
+     *   bit 0-2: ArgCount (0-7)
+     *   bit 3: SerializeFlag
+     *     0: NotSerialized
+     *     1: Serialized
+     *   bit 4-7: reserved (must be 0)
+     */
+    assert(!(arg_count & ~7));
+    methodflags = arg_count | (serialized << 3);
     build_append_namestring(var->buf, "%s", name);
-    build_append_byte(var->buf, arg_count); /* MethodFlags: ArgCount */
+    build_append_byte(var->buf, methodflags);
     return var;
 }
 
+Aml *aml_method(const char *name, int arg_count)
+{
+    return __aml_method(name, arg_count, false);
+}
+
+Aml *aml_method_serialized(const char *name, int arg_count)
+{
+    return __aml_method(name, arg_count, true);
+}
+
 /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefDevice */
 Aml *aml_device(const char *name_format, ...)
 {
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 5b8a118..00cf40e 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -263,6 +263,7 @@ Aml *aml_qword_memory(AmlDecode dec, AmlMinFixed min_fixed,
 Aml *aml_scope(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
 Aml *aml_device(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
 Aml *aml_method(const char *name, int arg_count);
+Aml *aml_method_serialized(const char *name, int arg_count);
 Aml *aml_if(Aml *predicate);
 Aml *aml_else(void);
 Aml *aml_while(Aml *predicate);
-- 
1.8.3.1
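
Illustrative note (not part of the patch): the MethodFlags packing
described in the comment above can be checked in isolation. The sketch
below assumes a hypothetical standalone helper, method_flags(), that
mirrors the arithmetic in __aml_method(); it does not use any QEMU API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the MethodFlags layout from the patch:
 *   bits 0-2: ArgCount (0-7)
 *   bit 3:    SerializeFlag (0 = NotSerialized, 1 = Serialized)
 *   bits 4-7: left as 0 here
 */
static uint8_t method_flags(int arg_count, bool serialized)
{
    assert(!(arg_count & ~7));  /* ArgCount must fit in three bits */
    return (uint8_t)(arg_count | (serialized << 3));
}
```

With this layout, method_flags(4, true) is 0x0C and method_flags(4, false)
is 0x04, so existing aml_method() callers keep emitting the same byte as
before the change.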


* [PATCH v7 07/35] util: introduce qemu_file_get_page_size()
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Three places use the same logic to get the page size of a file, given
either its path or its fd.

Windows does not support file-backed hugepages, so the Windows
implementation returns the normal page size. This interface has not been
used on Windows so far.

This patch introduces qemu_file_get_page_size() to unify the code.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 exec.c               | 33 ++++++---------------------------
 include/qemu/osdep.h |  1 +
 target-ppc/kvm.c     | 23 +++++------------------
 util/oslib-posix.c   | 37 +++++++++++++++++++++++++++++++++----
 util/oslib-win32.c   |  5 +++++
 5 files changed, 50 insertions(+), 49 deletions(-)

diff --git a/exec.c b/exec.c
index 8af2570..9de38be 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,32 +1174,6 @@ void qemu_mutex_unlock_ramlist(void)
 }
 
 #ifdef __linux__
-
-#include <sys/vfs.h>
-
-#define HUGETLBFS_MAGIC       0x958458f6
-
-static long gethugepagesize(const char *path, Error **errp)
-{
-    struct statfs fs;
-    int ret;
-
-    do {
-        ret = statfs(path, &fs);
-    } while (ret != 0 && errno == EINTR);
-
-    if (ret != 0) {
-        error_setg_errno(errp, errno, "failed to get page size of file %s",
-                         path);
-        return 0;
-    }
-
-    if (fs.f_type != HUGETLBFS_MAGIC)
-        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
-
-    return fs.f_bsize;
-}
-
 static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             const char *path,
@@ -1213,11 +1187,16 @@ static void *file_ram_alloc(RAMBlock *block,
     uint64_t hpagesize;
     Error *local_err = NULL;
 
-    hpagesize = gethugepagesize(path, &local_err);
+    hpagesize = qemu_file_get_page_size(path, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto error;
     }
+
+    if (hpagesize == getpagesize()) {
+        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+    }
+
     block->mr->align = hpagesize;
 
     if (memory < hpagesize) {
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b568424..dbc17dc 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -302,4 +302,5 @@ int qemu_read_password(char *buf, int buf_size);
  */
 pid_t qemu_fork(Error **errp);
 
+size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
 #endif
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index ac70f08..d8760ea 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -308,28 +308,15 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct kvm_ppc_smmu_info *info)
 
 static long gethugepagesize(const char *mem_path)
 {
-    struct statfs fs;
-    int ret;
-
-    do {
-        ret = statfs(mem_path, &fs);
-    } while (ret != 0 && errno == EINTR);
+    Error *local_err = NULL;
+    long size = qemu_file_get_page_size(mem_path, &local_err);
 
-    if (ret != 0) {
-        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
-                strerror(errno));
+    if (local_err) {
+        error_report_err(local_err);
         exit(1);
     }
 
-#define HUGETLBFS_MAGIC       0x958458f6
-
-    if (fs.f_type != HUGETLBFS_MAGIC) {
-        /* Explicit mempath, but it's ordinary pages */
-        return getpagesize();
-    }
-
-    /* It's hugepage, return the huge page size */
-    return fs.f_bsize;
+    return size;
 }
 
 static int find_max_supported_pagesize(Object *obj, void *opaque)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 914cef5..51437ff 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
     siglongjmp(sigjump, 1);
 }
 
-static size_t fd_getpagesize(int fd)
+static size_t fd_getpagesize(int fd, Error **errp)
 {
 #ifdef CONFIG_LINUX
     struct statfs fs;
@@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
             ret = fstatfs(fd, &fs);
         } while (ret != 0 && errno == EINTR);
 
-        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
+        if (ret) {
+            error_setg_errno(errp, errno, "fstatfs failed");
+            return 0;
+        }
+
+        if (fs.f_type == HUGETLBFS_MAGIC) {
             return fs.f_bsize;
         }
     }
@@ -360,6 +365,22 @@ static size_t fd_getpagesize(int fd)
     return getpagesize();
 }
 
+size_t qemu_file_get_page_size(const char *path, Error **errp)
+{
+    size_t size = 0;
+    int fd = qemu_open(path, O_RDONLY);
+
+    if (fd < 0) {
+        error_setg_file_open(errp, errno, path);
+        goto exit;
+    }
+
+    size = fd_getpagesize(fd, errp);
+    qemu_close(fd);
+exit:
+    return size;
+}
+
 void os_mem_prealloc(int fd, char *area, size_t memory)
 {
     int ret;
@@ -387,8 +408,16 @@ void os_mem_prealloc(int fd, char *area, size_t memory)
         exit(1);
     } else {
         int i;
-        size_t hpagesize = fd_getpagesize(fd);
-        size_t numpages = DIV_ROUND_UP(memory, hpagesize);
+        Error *local_err = NULL;
+        size_t hpagesize = fd_getpagesize(fd, &local_err);
+        size_t numpages;
+
+        if (local_err) {
+            error_report_err(local_err);
+            exit(1);
+        }
+
+        numpages = DIV_ROUND_UP(memory, hpagesize);
 
         /* MAP_POPULATE silently ignores failures */
         for (i = 0; i < numpages; i++) {
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 09f9e98..dada6b6 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -462,6 +462,11 @@ size_t getpagesize(void)
     return system_info.dwPageSize;
 }
 
+size_t qemu_file_get_page_size(const char *path, Error **errp)
+{
+    return getpagesize();
+}
+
 void os_mem_prealloc(int fd, char *area, size_t memory)
 {
     int i;
-- 
1.8.3.1
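
Illustrative note (not part of the patch): the detection logic that
qemu_file_get_page_size() unifies can be sketched as a standalone,
Linux-only function. This is a minimal sketch, assuming a hypothetical
helper file_page_size() that reports failure through a plain errno-style
out parameter instead of QEMU's Error API, and that uses open()/close()
rather than qemu_open()/qemu_close().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6
#endif

/*
 * Hypothetical standalone analogue of qemu_file_get_page_size().
 * Returns the hugetlbfs block size if the path is on hugetlbfs,
 * otherwise the normal page size. On failure it returns 0 and stores
 * the errno value in *err; on success *err is set to 0.
 */
static size_t file_page_size(const char *path, int *err)
{
    struct statfs fs;
    int ret;
    int fd;

    assert(path && err);
    *err = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        *err = errno;
        return 0;
    }

    /* Retry if the call is interrupted by a signal. */
    do {
        ret = fstatfs(fd, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0) {
        *err = errno;   /* save before close() can overwrite errno */
        close(fd);
        return 0;
    }
    close(fd);

    if (fs.f_type == HUGETLBFS_MAGIC) {
        return (size_t)fs.f_bsize;
    }
    return (size_t)sysconf(_SC_PAGESIZE);
}
```

A caller checks the out parameter rather than the return value alone,
which matches the patch's move from silently falling back to the normal
page size to reporting an fstatfs failure explicitly.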



* [Qemu-devel] [PATCH v7 07/35] util: introduce qemu_file_get_page_size()
@ 2015-11-02  9:13   ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

Three places use the same logic to get the page size of a file, given
either its path or its fd.

Windows does not support file-backed hugepages, so the Windows
implementation returns the normal page size. This interface has not been
used on Windows so far.

This patch introduces qemu_file_get_page_size() to unify the code.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 exec.c               | 33 ++++++---------------------------
 include/qemu/osdep.h |  1 +
 target-ppc/kvm.c     | 23 +++++------------------
 util/oslib-posix.c   | 37 +++++++++++++++++++++++++++++++++----
 util/oslib-win32.c   |  5 +++++
 5 files changed, 50 insertions(+), 49 deletions(-)

diff --git a/exec.c b/exec.c
index 8af2570..9de38be 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,32 +1174,6 @@ void qemu_mutex_unlock_ramlist(void)
 }
 
 #ifdef __linux__
-
-#include <sys/vfs.h>
-
-#define HUGETLBFS_MAGIC       0x958458f6
-
-static long gethugepagesize(const char *path, Error **errp)
-{
-    struct statfs fs;
-    int ret;
-
-    do {
-        ret = statfs(path, &fs);
-    } while (ret != 0 && errno == EINTR);
-
-    if (ret != 0) {
-        error_setg_errno(errp, errno, "failed to get page size of file %s",
-                         path);
-        return 0;
-    }
-
-    if (fs.f_type != HUGETLBFS_MAGIC)
-        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
-
-    return fs.f_bsize;
-}
-
 static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             const char *path,
@@ -1213,11 +1187,16 @@ static void *file_ram_alloc(RAMBlock *block,
     uint64_t hpagesize;
     Error *local_err = NULL;
 
-    hpagesize = gethugepagesize(path, &local_err);
+    hpagesize = qemu_file_get_page_size(path, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto error;
     }
+
+    if (hpagesize == getpagesize()) {
+        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+    }
+
     block->mr->align = hpagesize;
 
     if (memory < hpagesize) {
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index b568424..dbc17dc 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -302,4 +302,5 @@ int qemu_read_password(char *buf, int buf_size);
  */
 pid_t qemu_fork(Error **errp);
 
+size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
 #endif
diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index ac70f08..d8760ea 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -308,28 +308,15 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct kvm_ppc_smmu_info *info)
 
 static long gethugepagesize(const char *mem_path)
 {
-    struct statfs fs;
-    int ret;
-
-    do {
-        ret = statfs(mem_path, &fs);
-    } while (ret != 0 && errno == EINTR);
+    Error *local_err = NULL;
+    long size = qemu_file_get_page_size(mem_path, &local_err);
 
-    if (ret != 0) {
-        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
-                strerror(errno));
+    if (local_err) {
+        error_report_err(local_err);
         exit(1);
     }
 
-#define HUGETLBFS_MAGIC       0x958458f6
-
-    if (fs.f_type != HUGETLBFS_MAGIC) {
-        /* Explicit mempath, but it's ordinary pages */
-        return getpagesize();
-    }
-
-    /* It's hugepage, return the huge page size */
-    return fs.f_bsize;
+    return size;
 }
 
 static int find_max_supported_pagesize(Object *obj, void *opaque)
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index 914cef5..51437ff 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
     siglongjmp(sigjump, 1);
 }
 
-static size_t fd_getpagesize(int fd)
+static size_t fd_getpagesize(int fd, Error **errp)
 {
 #ifdef CONFIG_LINUX
     struct statfs fs;
@@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
             ret = fstatfs(fd, &fs);
         } while (ret != 0 && errno == EINTR);
 
-        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
+        if (ret) {
+            error_setg_errno(errp, errno, "fstatfs failed");
+            return 0;
+        }
+
+        if (fs.f_type == HUGETLBFS_MAGIC) {
             return fs.f_bsize;
         }
     }
@@ -360,6 +365,22 @@ static size_t fd_getpagesize(int fd)
     return getpagesize();
 }
 
+size_t qemu_file_get_page_size(const char *path, Error **errp)
+{
+    size_t size = 0;
+    int fd = qemu_open(path, O_RDONLY);
+
+    if (fd < 0) {
+        error_setg_file_open(errp, errno, path);
+        goto exit;
+    }
+
+    size = fd_getpagesize(fd, errp);
+    qemu_close(fd);
+exit:
+    return size;
+}
+
 void os_mem_prealloc(int fd, char *area, size_t memory)
 {
     int ret;
@@ -387,8 +408,16 @@ void os_mem_prealloc(int fd, char *area, size_t memory)
         exit(1);
     } else {
         int i;
-        size_t hpagesize = fd_getpagesize(fd);
-        size_t numpages = DIV_ROUND_UP(memory, hpagesize);
+        Error *local_err = NULL;
+        size_t hpagesize = fd_getpagesize(fd, &local_err);
+        size_t numpages;
+
+        if (local_err) {
+            error_report_err(local_err);
+            exit(1);
+        }
+
+        numpages = DIV_ROUND_UP(memory, hpagesize);
 
         /* MAP_POPULATE silently ignores failures */
         for (i = 0; i < numpages; i++) {
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 09f9e98..dada6b6 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -462,6 +462,11 @@ size_t getpagesize(void)
     return system_info.dwPageSize;
 }
 
+size_t qemu_file_get_page_size(const char *path, Error **errp)
+{
+    return getpagesize();
+}
+
 void os_mem_prealloc(int fd, char *area, size_t memory)
 {
     int i;
-- 
1.8.3.1


* [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Currently file_ram_alloc() is designed for hugetlbfs. However, the memory
of an NVDIMM can come from either a raw pmem device (e.g. /dev/pmem) or a
file located on a DAX-enabled filesystem.

So this patch lets it work on any kind of path.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 exec.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/exec.c b/exec.c
index 9de38be..9075f4d 100644
--- a/exec.c
+++ b/exec.c
@@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
     char *c;
     void *area;
     int fd;
-    uint64_t hpagesize;
+    uint64_t pagesize;
     Error *local_err = NULL;
 
-    hpagesize = qemu_file_get_page_size(path, &local_err);
+    pagesize = qemu_file_get_page_size(path, &local_err);
     if (local_err) {
         error_propagate(errp, local_err);
         goto error;
     }
 
-    if (hpagesize == getpagesize()) {
-        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
+    if (pagesize == getpagesize()) {
+        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
     }
 
-    block->mr->align = hpagesize;
+    block->mr->align = pagesize;
 
-    if (memory < hpagesize) {
+    if (memory < pagesize) {
         error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
-                   "or larger than huge page size 0x%" PRIx64,
-                   memory, hpagesize);
+                   "or larger than page size 0x%" PRIx64,
+                   memory, pagesize);
         goto error;
     }
 
@@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,
     fd = mkstemp(filename);
     if (fd < 0) {
         error_setg_errno(errp, errno,
-                         "unable to create backing store for hugepages");
+                         "unable to create backing store for path %s", path);
         g_free(filename);
         goto error;
     }
     unlink(filename);
     g_free(filename);
 
-    memory = ROUND_UP(memory, hpagesize);
+    memory = ROUND_UP(memory, pagesize);
 
     /*
      * ftruncate is not supported by hugetlbfs in older
@@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
         perror("ftruncate");
     }
 
-    area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
+    area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
     if (area == MAP_FAILED) {
         error_setg_errno(errp, errno,
-                         "unable to map backing store for hugepages");
+                         "unable to map backing store for path %s", path);
         close(fd);
         goto error;
     }
-- 
1.8.3.1


* [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Currently, file_ram_alloc() only works on a directory: it creates a file
under @path and mmaps it.

This patch allows it to work on a file directly. If @path is a directory
it works as before; otherwise it treats @path as the target file and
allocates memory from it directly.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
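Note for readers (not part of the commit): the directory-versus-file
dispatch described above can be sketched outside QEMU roughly as follows.
The names example_path_is_dir() and example_open_backing() and the
"example_back_mem" file prefix are hypothetical. Unlike the patch, this
sketch does not pass O_EXCL and does not derive the temporary file name
from a memory region name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

bool example_path_is_dir(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Directory: create an unlinked temporary file of the requested size.
 * Anything else: open the path itself and leave its size alone.
 * Returns a file descriptor, or -1 with errno set.
 */
int example_open_backing(const char *path, off_t size, bool shared)
{
    char name[4096];
    int len;
    int fd;

    assert(path != NULL);
    if (!example_path_is_dir(path)) {
        /* A shared mapping needs write access to the file itself. */
        return open(path, shared ? O_RDWR : O_RDONLY);
    }

    len = snprintf(name, sizeof(name), "%s/example_back_mem.XXXXXX", path);
    if (len < 0 || (size_t)len >= sizeof(name)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = mkstemp(name);
    if (fd >= 0) {
        unlink(name);               /* the file lives only as long as the fd */
        if (ftruncate(fd, size) != 0) {
            perror("ftruncate");    /* tolerated, as in the patch */
        }
    }
    return fd;
}
```

Keeping the two cases in one helper means the caller only has to check the
returned descriptor before handing it to mmap().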
 exec.c | 80 ++++++++++++++++++++++++++++++++++++++++++------------------------
 1 file changed, 51 insertions(+), 29 deletions(-)

diff --git a/exec.c b/exec.c
index 9075f4d..db0fdaf 100644
--- a/exec.c
+++ b/exec.c
@@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
 }
 
 #ifdef __linux__
+static bool path_is_dir(const char *path)
+{
+    struct stat fs;
+
+    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
+}
+
+static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
+{
+    char *filename;
+    char *sanitized_name;
+    char *c;
+    int fd;
+
+    if (!path_is_dir(path)) {
+        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
+
+        flags |= O_EXCL;
+        return open(path, flags);
+    }
+
+    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
+    sanitized_name = g_strdup(memory_region_name(block->mr));
+    for (c = sanitized_name; *c != '\0'; c++) {
+        if (*c == '/') {
+            *c = '_';
+        }
+    }
+    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
+                               sanitized_name);
+    g_free(sanitized_name);
+    fd = mkstemp(filename);
+    if (fd >= 0) {
+        unlink(filename);
+        /*
+         * ftruncate is not supported by hugetlbfs in older
+         * hosts, so don't bother bailing out on errors.
+         * If anything goes wrong with it under other filesystems,
+         * mmap will fail.
+         */
+        if (ftruncate(fd, size)) {
+            perror("ftruncate");
+        }
+    }
+    g_free(filename);
+
+    return fd;
+}
+
 static void *file_ram_alloc(RAMBlock *block,
                             ram_addr_t memory,
                             const char *path,
                             Error **errp)
 {
-    char *filename;
-    char *sanitized_name;
-    char *c;
     void *area;
     int fd;
     uint64_t pagesize;
@@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
         goto error;
     }
 
-    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
-    sanitized_name = g_strdup(memory_region_name(block->mr));
-    for (c = sanitized_name; *c != '\0'; c++) {
-        if (*c == '/')
-            *c = '_';
-    }
-
-    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
-                               sanitized_name);
-    g_free(sanitized_name);
+    memory = ROUND_UP(memory, pagesize);
 
-    fd = mkstemp(filename);
+    fd = open_ram_file_path(block, path, memory);
     if (fd < 0) {
         error_setg_errno(errp, errno,
                          "unable to create backing store for path %s", path);
-        g_free(filename);
         goto error;
     }
-    unlink(filename);
-    g_free(filename);
-
-    memory = ROUND_UP(memory, pagesize);
-
-    /*
-     * ftruncate is not supported by hugetlbfs in older
-     * hosts, so don't bother bailing out on errors.
-     * If anything goes wrong with it under other filesystems,
-     * mmap will fail.
-     */
-    if (ftruncate(fd, memory)) {
-        perror("ftruncate");
-    }
 
     area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
     if (area == MAP_FAILED) {
-- 
1.8.3.1


* [PATCH v7 10/35] hostmem-file: clean up memory allocation
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

- hostmem-file.c is compiled only if CONFIG_LINUX is enabled, so it is
  unnecessary to repeat the same check in the source file

- the interface, HostMemoryBackendClass->alloc(), is not called multiple
  times, so there is no need to check whether the memory region is
  already initialized

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 backends/hostmem-file.c | 11 +++--------
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index e9b6d21..9097a57 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -46,17 +46,12 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
         error_setg(errp, "mem-path property not set");
         return;
     }
-#ifndef CONFIG_LINUX
-    error_setg(errp, "-mem-path not supported on this host");
-#else
-    if (!memory_region_size(&backend->mr)) {
-        backend->force_prealloc = mem_prealloc;
-        memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
+
+    backend->force_prealloc = mem_prealloc;
+    memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                  object_get_canonical_path(OBJECT(backend)),
                                  backend->size, fb->share,
                                  fb->mem_path, errp);
-    }
-#endif
 }
 
 static void
-- 
1.8.3.1


* [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

qemu_file_getlength() returns the size of the specified file.
qemu_fd_getlength() is also introduced to unify the code with
raw_getlength() in block/raw-posix.c.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
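Note for readers (not part of the commit): the two helpers can be sketched
as a standalone pair like this. The names example_fd_getlength() and
example_file_getlength() are hypothetical. As an assumption for the sake
of a self-contained example, the path-based variant returns a negative
errno value instead of using size_t plus Error ** as the patch does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Size of the file behind fd, or a negative errno value.
 * Note that this moves the file offset to the end of the file.
 */
int64_t example_fd_getlength(int fd)
{
    off_t size = lseek(fd, 0, SEEK_END);

    if (size < 0) {
        return -errno;
    }
    return (int64_t)size;
}

/* Path-based wrapper: open read-only, measure, close. */
int64_t example_file_getlength(const char *path)
{
    int64_t size;
    int fd;

    assert(path != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -errno;
    }
    size = example_fd_getlength(fd);
    close(fd);
    return size;
}
```

Returning a signed value from both functions keeps "empty file" (0)
distinguishable from "error" (negative) without a separate error object.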
 block/raw-posix.c    |  7 +------
 include/qemu/osdep.h |  2 ++
 util/osdep.c         | 31 +++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 918c756..734e6dd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1592,18 +1592,13 @@ static int64_t raw_getlength(BlockDriverState *bs)
 {
     BDRVRawState *s = bs->opaque;
     int ret;
-    int64_t size;
 
     ret = fd_open(bs);
     if (ret < 0) {
         return ret;
     }
 
-    size = lseek(s->fd, 0, SEEK_END);
-    if (size < 0) {
-        return -errno;
-    }
-    return size;
+    return qemu_fd_getlength(s->fd);
 }
 #endif
 
diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index dbc17dc..ca4c3fa 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -303,4 +303,6 @@ int qemu_read_password(char *buf, int buf_size);
 pid_t qemu_fork(Error **errp);
 
 size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
+int64_t qemu_fd_getlength(int fd);
+size_t qemu_file_getlength(const char *file, Error **errp);
 #endif
diff --git a/util/osdep.c b/util/osdep.c
index 0092bb6..5a61e19 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -428,3 +428,34 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
     return readv_writev(fd, iov, iov_cnt, true);
 }
 #endif
+
+int64_t qemu_fd_getlength(int fd)
+{
+    int64_t size;
+
+    size = lseek(fd, 0, SEEK_END);
+    if (size < 0) {
+        return -errno;
+    }
+    return size;
+}
+
+size_t qemu_file_getlength(const char *file, Error **errp)
+{
+    int64_t size;
+    int fd = qemu_open(file, O_RDONLY);
+
+    if (fd < 0) {
+        error_setg_file_open(errp, errno, file);
+        return 0;
+    }
+
+    size = qemu_fd_getlength(fd);
+    if (size < 0) {
+        error_setg_errno(errp, -size, "can't get size of file %s", file);
+        size = 0;
+    }
+
+    qemu_close(fd);
+    return size;
+}
-- 
1.8.3.1


* [PATCH v7 12/35] util: let qemu_fd_getlength support block device
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

lseek() does not work on all block devices; as the man page says:
| Some devices are incapable of seeking and POSIX does not specify
| which devices must support lseek().

This patch adds support on Linux by using the BLKGETSIZE64 ioctl.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
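Note for readers (not part of the commit): the block-device branch can be
looked at in isolation with a sketch like the one below. The function name
example_blkdev_size() is hypothetical. Two deliberate differences from the
patch: the size is stored in a uint64_t, and non-block files are rejected
with -ENOTBLK instead of falling back to lseek().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */

/*
 * Store the size in bytes of the block device at path in *size_out.
 * Returns 0 on success, -ENOTBLK if path is not a block device,
 * or another negative errno value.
 */
int example_blkdev_size(const char *path, uint64_t *size_out)
{
    struct stat st;
    uint64_t bytes = 0;
    int ret = 0;
    int fd;

    assert(path != NULL && size_out != NULL);
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -errno;
    }

    if (fstat(fd, &st) < 0) {
        ret = -errno;
    } else if (!S_ISBLK(st.st_mode)) {
        ret = -ENOTBLK;
    } else if (ioctl(fd, BLKGETSIZE64, &bytes) < 0) {
        ret = -errno;
    } else {
        *size_out = bytes;          /* only written on success */
    }

    close(fd);
    return ret;
}
```

Checking S_ISBLK() first matters because BLKGETSIZE64 is only meaningful
for block devices; regular files are still best measured with lseek().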
 util/osdep.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/util/osdep.c b/util/osdep.c
index 5a61e19..b20c793 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -45,6 +45,11 @@
 extern int madvise(caddr_t, size_t, int);
 #endif
 
+#ifdef CONFIG_LINUX
+#include <sys/ioctl.h>
+#include <linux/fs.h>
+#endif
+
 #include "qemu-common.h"
 #include "qemu/sockets.h"
 #include "qemu/error-report.h"
@@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
 {
     int64_t size;
 
+#ifdef CONFIG_LINUX
+    struct stat stat_buf;
+    if (fstat(fd, &stat_buf) < 0) {
+        return -errno;
+    }
+
+    if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
+        /* The size of block device is larger than max int64_t? */
+        if (size < 0) {
+            return -EOVERFLOW;
+        }
+        return size;
+    }
+#endif
+
     size = lseek(fd, 0, SEEK_END);
     if (size < 0) {
         return -errno;
-- 
1.8.3.1


* [PATCH v7 13/35] hostmem-file: use whole file size if possible
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Use the whole file size if @size is not specified, which is useful
when we want to pass a file directly to the guest.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 backends/hostmem-file.c | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
index 9097a57..ea355c1 100644
--- a/backends/hostmem-file.c
+++ b/backends/hostmem-file.c
@@ -38,15 +38,29 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
 {
     HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
 
-    if (!backend->size) {
-        error_setg(errp, "can't create backend with size 0");
-        return;
-    }
     if (!fb->mem_path) {
         error_setg(errp, "mem-path property not set");
         return;
     }
 
+    if (!backend->size) {
+        Error *local_err = NULL;
+
+        /*
+         * use the whole file size if @size is not specified.
+         */
+        backend->size = qemu_file_getlength(fb->mem_path, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
+    if (!backend->size) {
+        error_setg(errp, "can't create backend on the file whose size is 0");
+        return;
+    }
+
     backend->force_prealloc = mem_prealloc;
     memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
                                  object_get_canonical_path(OBJECT(backend)),
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 14/35] pc-dimm: remove DEFAULT_PC_DIMMSIZE
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

DEFAULT_PC_DIMMSIZE is no longer used, so remove it.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 include/hw/mem/pc-dimm.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index d83bf30..11a8937 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -20,8 +20,6 @@
 #include "sysemu/hostmem.h"
 #include "hw/qdev.h"
 
-#define DEFAULT_PC_DIMMSIZE (1024*1024*1024)
-
 #define TYPE_PC_DIMM "pc-dimm"
 #define PC_DIMM(obj) \
     OBJECT_CHECK(PCDIMMDevice, (obj), TYPE_PC_DIMM)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 15/35] pc-dimm: make pc_existing_dimms_capacity static and rename it
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

pc_existing_dimms_capacity() can be static since it is not used outside
pc-dimm.c. Also drop the pc_ prefix, in preparation for the work which
abstracts the dimm device type from pc-dimm.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/mem/pc-dimm.c         | 73 ++++++++++++++++++++++++------------------------
 include/hw/mem/pc-dimm.h |  1 -
 2 files changed, 36 insertions(+), 38 deletions(-)

diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 80f424b..2dcbbcd 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -32,6 +32,38 @@ typedef struct pc_dimms_capacity {
      Error    **errp;
 } pc_dimms_capacity;
 
+static int existing_dimms_capacity_internal(Object *obj, void *opaque)
+{
+    pc_dimms_capacity *cap = opaque;
+    uint64_t *size = &cap->size;
+
+    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+        DeviceState *dev = DEVICE(obj);
+
+        if (dev->realized) {
+            (*size) += object_property_get_int(obj, PC_DIMM_SIZE_PROP,
+                cap->errp);
+        }
+
+        if (cap->errp && *cap->errp) {
+            return 1;
+        }
+    }
+    object_child_foreach(obj, existing_dimms_capacity_internal, opaque);
+    return 0;
+}
+
+static uint64_t existing_dimms_capacity(Error **errp)
+{
+    pc_dimms_capacity cap;
+
+    cap.size = 0;
+    cap.errp = errp;
+
+    existing_dimms_capacity_internal(qdev_get_machine(), &cap);
+    return cap.size;
+}
+
 void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
                          MemoryRegion *mr, uint64_t align, Error **errp)
 {
@@ -39,7 +71,7 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
     MachineState *machine = MACHINE(qdev_get_machine());
     PCDIMMDevice *dimm = PC_DIMM(dev);
     Error *local_err = NULL;
-    uint64_t existing_dimms_capacity = 0;
+    uint64_t dimms_capacity = 0;
     uint64_t addr;
 
     addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
@@ -55,17 +87,16 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
         goto out;
     }
 
-    existing_dimms_capacity = pc_existing_dimms_capacity(&local_err);
+    dimms_capacity = existing_dimms_capacity(&local_err);
     if (local_err) {
         goto out;
     }
 
-    if (existing_dimms_capacity + memory_region_size(mr) >
+    if (dimms_capacity + memory_region_size(mr) >
         machine->maxram_size - machine->ram_size) {
         error_setg(&local_err, "not enough space, currently 0x%" PRIx64
                    " in use of total hot pluggable 0x" RAM_ADDR_FMT,
-                   existing_dimms_capacity,
-                   machine->maxram_size - machine->ram_size);
+                   dimms_capacity, machine->maxram_size - machine->ram_size);
         goto out;
     }
 
@@ -120,38 +151,6 @@ void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
     vmstate_unregister_ram(mr, dev);
 }
 
-static int pc_existing_dimms_capacity_internal(Object *obj, void *opaque)
-{
-    pc_dimms_capacity *cap = opaque;
-    uint64_t *size = &cap->size;
-
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
-        DeviceState *dev = DEVICE(obj);
-
-        if (dev->realized) {
-            (*size) += object_property_get_int(obj, PC_DIMM_SIZE_PROP,
-                cap->errp);
-        }
-
-        if (cap->errp && *cap->errp) {
-            return 1;
-        }
-    }
-    object_child_foreach(obj, pc_existing_dimms_capacity_internal, opaque);
-    return 0;
-}
-
-uint64_t pc_existing_dimms_capacity(Error **errp)
-{
-    pc_dimms_capacity cap;
-
-    cap.size = 0;
-    cap.errp = errp;
-
-    pc_existing_dimms_capacity_internal(qdev_get_machine(), &cap);
-    return cap.size;
-}
-
 int qmp_pc_dimm_device_list(Object *obj, void *opaque)
 {
     MemoryDeviceInfoList ***prev = opaque;
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 11a8937..8a43548 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -87,7 +87,6 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
 int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
 
 int qmp_pc_dimm_device_list(Object *obj, void *opaque);
-uint64_t pc_existing_dimms_capacity(Error **errp);
 void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
                          MemoryRegion *mr, uint64_t align, Error **errp);
 void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
-- 
1.8.3.1
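
The helper moved by this patch walks the object tree and accumulates the
size of every realized DIMM through an opaque pointer. A self-contained
sketch of that pattern follows. The Node and Capacity types and both
function names are made up for illustration and are not QEMU API, and
the error reporting through Error ** is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for QEMU's object tree; none of this is QEMU API. */
typedef struct Node {
    bool is_dimm;              /* plays the role of the type check */
    bool realized;
    uint64_t size;
    struct Node **children;
    size_t nr_children;
} Node;

typedef struct {
    uint64_t total;
} Capacity;

/* Visit node and all of its descendants, adding up realized DIMM sizes. */
static void capacity_visit(const Node *node, void *opaque)
{
    Capacity *cap = opaque;
    size_t i;

    assert(node != NULL && cap != NULL);

    if (node->is_dimm && node->realized) {
        cap->total += node->size;
    }
    for (i = 0; i < node->nr_children; i++) {
        capacity_visit(node->children[i], opaque);
    }
}

/* File-local entry point, analogous to the now-static helper in the patch. */
static uint64_t existing_capacity(const Node *root)
{
    Capacity cap = { .total = 0 };

    capacity_visit(root, &cap);
    return cap.total;
}
```

Because both functions are static, the only way to reach the traversal is
through callers in the same file, which is the property the patch relies
on when it removes the prototype from the header.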


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 16/35] pc-dimm: drop the prefix of pc-dimm
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

This patch was generated by the following commands:

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/PC_DIMM/DIMM/g"

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/PCDIMM/DIMM/g"

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/pc_dimm/dimm/g"

find ./ -name "trace-events" | xargs sed -i "s/pc-dimm/dimm/g"

It prepares for the work which abstracts a dimm device type shared by
both pc-dimm and nvdimm.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hmp.c                           |   2 +-
 hw/acpi/ich9.c                  |   6 +-
 hw/acpi/memory_hotplug.c        |  16 ++---
 hw/acpi/piix4.c                 |   6 +-
 hw/i386/pc.c                    |  32 ++++-----
 hw/mem/pc-dimm.c                | 148 ++++++++++++++++++++--------------------
 hw/ppc/spapr.c                  |  18 ++---
 include/hw/mem/pc-dimm.h        |  62 ++++++++---------
 numa.c                          |   2 +-
 qapi-schema.json                |   8 +--
 qmp.c                           |   2 +-
 stubs/qmp_pc_dimm_device_list.c |   2 +-
 trace-events                    |   8 +--
 13 files changed, 156 insertions(+), 156 deletions(-)

diff --git a/hmp.c b/hmp.c
index 5048eee..5c617d2 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1952,7 +1952,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
     MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err);
     MemoryDeviceInfoList *info;
     MemoryDeviceInfo *value;
-    PCDIMMDeviceInfo *di;
+    DIMMDeviceInfo *di;
 
     for (info = info_list; info; info = info->next) {
         value = info->value;
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1c7fcfa..b0d6a67 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -440,7 +440,7 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, Error **errp)
 void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp)
 {
     if (pm->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_plug_cb(&pm->acpi_regs, pm->irq, &pm->acpi_memory_hotplug,
                             dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
@@ -455,7 +455,7 @@ void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
                                       Error **errp)
 {
     if (pm->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_unplug_request_cb(&pm->acpi_regs, pm->irq,
                                       &pm->acpi_memory_hotplug, dev, errp);
     } else {
@@ -468,7 +468,7 @@ void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
                               Error **errp)
 {
     if (pm->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_unplug_cb(&pm->acpi_memory_hotplug, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug for not supported device"
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index ce428df..e687852 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -54,23 +54,23 @@ static uint64_t acpi_memory_hotplug_read(void *opaque, hwaddr addr,
     o = OBJECT(mdev->dimm);
     switch (addr) {
     case 0x0: /* Lo part of phys address where DIMM is mapped */
-        val = o ? object_property_get_int(o, PC_DIMM_ADDR_PROP, NULL) : 0;
+        val = o ? object_property_get_int(o, DIMM_ADDR_PROP, NULL) : 0;
         trace_mhp_acpi_read_addr_lo(mem_st->selector, val);
         break;
     case 0x4: /* Hi part of phys address where DIMM is mapped */
-        val = o ? object_property_get_int(o, PC_DIMM_ADDR_PROP, NULL) >> 32 : 0;
+        val = o ? object_property_get_int(o, DIMM_ADDR_PROP, NULL) >> 32 : 0;
         trace_mhp_acpi_read_addr_hi(mem_st->selector, val);
         break;
     case 0x8: /* Lo part of DIMM size */
-        val = o ? object_property_get_int(o, PC_DIMM_SIZE_PROP, NULL) : 0;
+        val = o ? object_property_get_int(o, DIMM_SIZE_PROP, NULL) : 0;
         trace_mhp_acpi_read_size_lo(mem_st->selector, val);
         break;
     case 0xc: /* Hi part of DIMM size */
-        val = o ? object_property_get_int(o, PC_DIMM_SIZE_PROP, NULL) >> 32 : 0;
+        val = o ? object_property_get_int(o, DIMM_SIZE_PROP, NULL) >> 32 : 0;
         trace_mhp_acpi_read_size_hi(mem_st->selector, val);
         break;
     case 0x10: /* node proximity for _PXM method */
-        val = o ? object_property_get_int(o, PC_DIMM_NODE_PROP, NULL) : 0;
+        val = o ? object_property_get_int(o, DIMM_NODE_PROP, NULL) : 0;
         trace_mhp_acpi_read_pxm(mem_st->selector, val);
         break;
     case 0x14: /* pack and return is_* fields */
@@ -151,13 +151,13 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
             /* call pc-dimm unplug cb */
             hotplug_handler_unplug(hotplug_ctrl, dev, &local_err);
             if (local_err) {
-                trace_mhp_acpi_pc_dimm_delete_failed(mem_st->selector);
+                trace_mhp_acpi_dimm_delete_failed(mem_st->selector);
                 qapi_event_send_mem_unplug_error(dev->id,
                                                  error_get_pretty(local_err),
                                                  &error_abort);
                 break;
             }
-            trace_mhp_acpi_pc_dimm_deleted(mem_st->selector);
+            trace_mhp_acpi_dimm_deleted(mem_st->selector);
         }
         break;
     default:
@@ -206,7 +206,7 @@ acpi_memory_slot_status(MemHotplugState *mem_st,
                         DeviceState *dev, Error **errp)
 {
     Error *local_err = NULL;
-    int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
+    int slot = object_property_get_int(OBJECT(dev), DIMM_SLOT_PROP,
                                        &local_err);
 
     if (local_err) {
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 2cd2fee..0b2cb6e 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -344,7 +344,7 @@ static void piix4_device_plug_cb(HotplugHandler *hotplug_dev,
     PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
     if (s->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_plug_cb(&s->ar, s->irq, &s->acpi_memory_hotplug, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         acpi_pcihp_device_plug_cb(&s->ar, s->irq, &s->acpi_pci_hotplug, dev,
@@ -363,7 +363,7 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
     PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
     if (s->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_unplug_request_cb(&s->ar, s->irq, &s->acpi_memory_hotplug,
                                       dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
@@ -381,7 +381,7 @@ static void piix4_device_unplug_cb(HotplugHandler *hotplug_dev,
     PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
     if (s->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_unplug_cb(&s->acpi_memory_hotplug, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug for not supported device"
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 0cb8afd..67ecc4f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1610,14 +1610,14 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
     }
 }
 
-static void pc_dimm_plug(HotplugHandler *hotplug_dev,
+static void dimm_plug(HotplugHandler *hotplug_dev,
                          DeviceState *dev, Error **errp)
 {
     HotplugHandlerClass *hhc;
     Error *local_err = NULL;
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
-    PCDIMMDevice *dimm = PC_DIMM(dev);
-    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    DIMMDevice *dimm = DIMM(dev);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm);
     uint64_t align = TARGET_PAGE_SIZE;
 
@@ -1631,7 +1631,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
         goto out;
     }
 
-    pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err);
+    dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err);
     if (local_err) {
         goto out;
     }
@@ -1642,7 +1642,7 @@ out:
     error_propagate(errp, local_err);
 }
 
-static void pc_dimm_unplug_request(HotplugHandler *hotplug_dev,
+static void dimm_unplug_request(HotplugHandler *hotplug_dev,
                                    DeviceState *dev, Error **errp)
 {
     HotplugHandlerClass *hhc;
@@ -1662,12 +1662,12 @@ out:
     error_propagate(errp, local_err);
 }
 
-static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
+static void dimm_unplug(HotplugHandler *hotplug_dev,
                            DeviceState *dev, Error **errp)
 {
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
-    PCDIMMDevice *dimm = PC_DIMM(dev);
-    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    DIMMDevice *dimm = DIMM(dev);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm);
     HotplugHandlerClass *hhc;
     Error *local_err = NULL;
@@ -1679,7 +1679,7 @@ static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
         goto out;
     }
 
-    pc_dimm_memory_unplug(dev, &pcms->hotplug_memory, mr);
+    dimm_memory_unplug(dev, &pcms->hotplug_memory, mr);
     object_unparent(OBJECT(dev));
 
  out:
@@ -1718,8 +1718,8 @@ out:
 static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        pc_dimm_plug(hotplug_dev, dev, errp);
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
+        dimm_plug(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         pc_cpu_plug(hotplug_dev, dev, errp);
     }
@@ -1728,8 +1728,8 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
                                                 DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        pc_dimm_unplug_request(hotplug_dev, dev, errp);
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
+        dimm_unplug_request(hotplug_dev, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug request for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
@@ -1739,8 +1739,8 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
                                         DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        pc_dimm_unplug(hotplug_dev, dev, errp);
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
+        dimm_unplug(hotplug_dev, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
@@ -1752,7 +1752,7 @@ static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
 {
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
 
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM) ||
         object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         return HOTPLUG_HANDLER(machine);
     }
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 2dcbbcd..67afc53 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -27,21 +27,21 @@
 #include "trace.h"
 #include "hw/virtio/vhost.h"
 
-typedef struct pc_dimms_capacity {
+typedef struct dimms_capacity {
      uint64_t size;
      Error    **errp;
-} pc_dimms_capacity;
+} dimms_capacity;
 
 static int existing_dimms_capacity_internal(Object *obj, void *opaque)
 {
-    pc_dimms_capacity *cap = opaque;
+    dimms_capacity *cap = opaque;
     uint64_t *size = &cap->size;
 
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(obj, TYPE_DIMM)) {
         DeviceState *dev = DEVICE(obj);
 
         if (dev->realized) {
-            (*size) += object_property_get_int(obj, PC_DIMM_SIZE_PROP,
+            (*size) += object_property_get_int(obj, DIMM_SIZE_PROP,
                 cap->errp);
         }
 
@@ -55,7 +55,7 @@ static int existing_dimms_capacity_internal(Object *obj, void *opaque)
 
 static uint64_t existing_dimms_capacity(Error **errp)
 {
-    pc_dimms_capacity cap;
+    dimms_capacity cap;
 
     cap.size = 0;
     cap.errp = errp;
@@ -64,22 +64,22 @@ static uint64_t existing_dimms_capacity(Error **errp)
     return cap.size;
 }
 
-void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
+void dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
                          MemoryRegion *mr, uint64_t align, Error **errp)
 {
     int slot;
     MachineState *machine = MACHINE(qdev_get_machine());
-    PCDIMMDevice *dimm = PC_DIMM(dev);
+    DIMMDevice *dimm = DIMM(dev);
     Error *local_err = NULL;
     uint64_t dimms_capacity = 0;
     uint64_t addr;
 
-    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
+    addr = object_property_get_int(OBJECT(dimm), DIMM_ADDR_PROP, &local_err);
     if (local_err) {
         goto out;
     }
 
-    addr = pc_dimm_get_free_addr(hpms->base,
+    addr = dimm_get_free_addr(hpms->base,
                                  memory_region_size(&hpms->mr),
                                  !addr ? NULL : &addr, align,
                                  memory_region_size(mr), &local_err);
@@ -100,27 +100,27 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
         goto out;
     }
 
-    object_property_set_int(OBJECT(dev), addr, PC_DIMM_ADDR_PROP, &local_err);
+    object_property_set_int(OBJECT(dev), addr, DIMM_ADDR_PROP, &local_err);
     if (local_err) {
         goto out;
     }
-    trace_mhp_pc_dimm_assigned_address(addr);
+    trace_mhp_dimm_assigned_address(addr);
 
-    slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP, &local_err);
+    slot = object_property_get_int(OBJECT(dev), DIMM_SLOT_PROP, &local_err);
     if (local_err) {
         goto out;
     }
 
-    slot = pc_dimm_get_free_slot(slot == PC_DIMM_UNASSIGNED_SLOT ? NULL : &slot,
+    slot = dimm_get_free_slot(slot == DIMM_UNASSIGNED_SLOT ? NULL : &slot,
                                  machine->ram_slots, &local_err);
     if (local_err) {
         goto out;
     }
-    object_property_set_int(OBJECT(dev), slot, PC_DIMM_SLOT_PROP, &local_err);
+    object_property_set_int(OBJECT(dev), slot, DIMM_SLOT_PROP, &local_err);
     if (local_err) {
         goto out;
     }
-    trace_mhp_pc_dimm_assigned_slot(slot);
+    trace_mhp_dimm_assigned_slot(slot);
 
     if (kvm_enabled() && !kvm_has_free_slot(machine)) {
         error_setg(&local_err, "hypervisor has no free memory slots left");
@@ -141,29 +141,29 @@ out:
     error_propagate(errp, local_err);
 }
 
-void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
+void dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
                            MemoryRegion *mr)
 {
-    PCDIMMDevice *dimm = PC_DIMM(dev);
+    DIMMDevice *dimm = DIMM(dev);
 
     numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
     memory_region_del_subregion(&hpms->mr, mr);
     vmstate_unregister_ram(mr, dev);
 }
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque)
+int qmp_dimm_device_list(Object *obj, void *opaque)
 {
     MemoryDeviceInfoList ***prev = opaque;
 
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(obj, TYPE_DIMM)) {
         DeviceState *dev = DEVICE(obj);
 
         if (dev->realized) {
             MemoryDeviceInfoList *elem = g_new0(MemoryDeviceInfoList, 1);
             MemoryDeviceInfo *info = g_new0(MemoryDeviceInfo, 1);
-            PCDIMMDeviceInfo *di = g_new0(PCDIMMDeviceInfo, 1);
+            DIMMDeviceInfo *di = g_new0(DIMMDeviceInfo, 1);
             DeviceClass *dc = DEVICE_GET_CLASS(obj);
-            PCDIMMDevice *dimm = PC_DIMM(obj);
+            DIMMDevice *dimm = DIMM(obj);
 
             if (dev->id) {
                 di->has_id = true;
@@ -174,7 +174,7 @@ int qmp_pc_dimm_device_list(Object *obj, void *opaque)
             di->addr = dimm->addr;
             di->slot = dimm->slot;
             di->node = dimm->node;
-            di->size = object_property_get_int(OBJECT(dimm), PC_DIMM_SIZE_PROP,
+            di->size = object_property_get_int(OBJECT(dimm), DIMM_SIZE_PROP,
                                                NULL);
             di->memdev = object_get_canonical_path(OBJECT(dimm->hostmem));
 
@@ -186,7 +186,7 @@ int qmp_pc_dimm_device_list(Object *obj, void *opaque)
         }
     }
 
-    object_child_foreach(obj, qmp_pc_dimm_device_list, opaque);
+    object_child_foreach(obj, qmp_dimm_device_list, opaque);
     return 0;
 }
 
@@ -197,7 +197,7 @@ ram_addr_t get_current_ram_size(void)
     MemoryDeviceInfoList *info;
     ram_addr_t size = ram_size;
 
-    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
+    qmp_dimm_device_list(qdev_get_machine(), &prev);
     for (info = info_list; info; info = info->next) {
         MemoryDeviceInfo *value = info->value;
 
@@ -216,28 +216,28 @@ ram_addr_t get_current_ram_size(void)
     return size;
 }
 
-static int pc_dimm_slot2bitmap(Object *obj, void *opaque)
+static int dimm_slot2bitmap(Object *obj, void *opaque)
 {
     unsigned long *bitmap = opaque;
 
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(obj, TYPE_DIMM)) {
         DeviceState *dev = DEVICE(obj);
         if (dev->realized) { /* count only realized DIMMs */
-            PCDIMMDevice *d = PC_DIMM(obj);
+            DIMMDevice *d = DIMM(obj);
             set_bit(d->slot, bitmap);
         }
     }
 
-    object_child_foreach(obj, pc_dimm_slot2bitmap, opaque);
+    object_child_foreach(obj, dimm_slot2bitmap, opaque);
     return 0;
 }
 
-int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp)
+int dimm_get_free_slot(const int *hint, int max_slots, Error **errp)
 {
     unsigned long *bitmap = bitmap_new(max_slots);
     int slot = 0;
 
-    object_child_foreach(qdev_get_machine(), pc_dimm_slot2bitmap, bitmap);
+    object_child_foreach(qdev_get_machine(), dimm_slot2bitmap, bitmap);
 
     /* check if requested slot is not occupied */
     if (hint) {
@@ -262,10 +262,10 @@ out:
     return slot;
 }
 
-static gint pc_dimm_addr_sort(gconstpointer a, gconstpointer b)
+static gint dimm_addr_sort(gconstpointer a, gconstpointer b)
 {
-    PCDIMMDevice *x = PC_DIMM(a);
-    PCDIMMDevice *y = PC_DIMM(b);
+    DIMMDevice *x = DIMM(a);
+    DIMMDevice *y = DIMM(b);
     Int128 diff = int128_sub(int128_make64(x->addr), int128_make64(y->addr));
 
     if (int128_lt(diff, int128_zero())) {
@@ -276,22 +276,22 @@ static gint pc_dimm_addr_sort(gconstpointer a, gconstpointer b)
     return 0;
 }
 
-static int pc_dimm_built_list(Object *obj, void *opaque)
+static int dimm_built_list(Object *obj, void *opaque)
 {
     GSList **list = opaque;
 
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(obj, TYPE_DIMM)) {
         DeviceState *dev = DEVICE(obj);
         if (dev->realized) { /* only realized DIMMs matter */
-            *list = g_slist_insert_sorted(*list, dev, pc_dimm_addr_sort);
+            *list = g_slist_insert_sorted(*list, dev, dimm_addr_sort);
         }
     }
 
-    object_child_foreach(obj, pc_dimm_built_list, opaque);
+    object_child_foreach(obj, dimm_built_list, opaque);
     return 0;
 }
 
-uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
+uint64_t dimm_get_free_addr(uint64_t address_space_start,
                                uint64_t address_space_size,
                                uint64_t *hint, uint64_t align, uint64_t size,
                                Error **errp)
@@ -321,7 +321,7 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
     }
 
     assert(address_space_end > address_space_start);
-    object_child_foreach(qdev_get_machine(), pc_dimm_built_list, &list);
+    object_child_foreach(qdev_get_machine(), dimm_built_list, &list);
 
     if (hint) {
         new_addr = *hint;
@@ -331,9 +331,9 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
 
     /* find address range that will fit new DIMM */
     for (item = list; item; item = g_slist_next(item)) {
-        PCDIMMDevice *dimm = item->data;
+        DIMMDevice *dimm = item->data;
         uint64_t dimm_size = object_property_get_int(OBJECT(dimm),
-                                                     PC_DIMM_SIZE_PROP,
+                                                     DIMM_SIZE_PROP,
                                                      errp);
         if (errp && *errp) {
             goto out;
@@ -363,20 +363,20 @@ out:
     return ret;
 }
 
-static Property pc_dimm_properties[] = {
-    DEFINE_PROP_UINT64(PC_DIMM_ADDR_PROP, PCDIMMDevice, addr, 0),
-    DEFINE_PROP_UINT32(PC_DIMM_NODE_PROP, PCDIMMDevice, node, 0),
-    DEFINE_PROP_INT32(PC_DIMM_SLOT_PROP, PCDIMMDevice, slot,
-                      PC_DIMM_UNASSIGNED_SLOT),
+static Property dimm_properties[] = {
+    DEFINE_PROP_UINT64(DIMM_ADDR_PROP, DIMMDevice, addr, 0),
+    DEFINE_PROP_UINT32(DIMM_NODE_PROP, DIMMDevice, node, 0),
+    DEFINE_PROP_INT32(DIMM_SLOT_PROP, DIMMDevice, slot,
+                      DIMM_UNASSIGNED_SLOT),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static void pc_dimm_get_size(Object *obj, Visitor *v, void *opaque,
+static void dimm_get_size(Object *obj, Visitor *v, void *opaque,
                           const char *name, Error **errp)
 {
     int64_t value;
     MemoryRegion *mr;
-    PCDIMMDevice *dimm = PC_DIMM(obj);
+    DIMMDevice *dimm = DIMM(obj);
 
     mr = host_memory_backend_get_memory(dimm->hostmem, errp);
     value = memory_region_size(mr);
@@ -384,7 +384,7 @@ static void pc_dimm_get_size(Object *obj, Visitor *v, void *opaque,
     visit_type_int(v, &value, name, errp);
 }
 
-static void pc_dimm_check_memdev_is_busy(Object *obj, const char *name,
+static void dimm_check_memdev_is_busy(Object *obj, const char *name,
                                       Object *val, Error **errp)
 {
     MemoryRegion *mr;
@@ -399,65 +399,65 @@ static void pc_dimm_check_memdev_is_busy(Object *obj, const char *name,
     }
 }
 
-static void pc_dimm_init(Object *obj)
+static void dimm_init(Object *obj)
 {
-    PCDIMMDevice *dimm = PC_DIMM(obj);
+    DIMMDevice *dimm = DIMM(obj);
 
-    object_property_add(obj, PC_DIMM_SIZE_PROP, "int", pc_dimm_get_size,
+    object_property_add(obj, DIMM_SIZE_PROP, "int", dimm_get_size,
                         NULL, NULL, NULL, &error_abort);
-    object_property_add_link(obj, PC_DIMM_MEMDEV_PROP, TYPE_MEMORY_BACKEND,
+    object_property_add_link(obj, DIMM_MEMDEV_PROP, TYPE_MEMORY_BACKEND,
                              (Object **)&dimm->hostmem,
-                             pc_dimm_check_memdev_is_busy,
+                             dimm_check_memdev_is_busy,
                              OBJ_PROP_LINK_UNREF_ON_RELEASE,
                              &error_abort);
 }
 
-static void pc_dimm_realize(DeviceState *dev, Error **errp)
+static void dimm_realize(DeviceState *dev, Error **errp)
 {
-    PCDIMMDevice *dimm = PC_DIMM(dev);
+    DIMMDevice *dimm = DIMM(dev);
 
     if (!dimm->hostmem) {
-        error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
+        error_setg(errp, "'" DIMM_MEMDEV_PROP "' property is not set");
         return;
     }
     if (((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) ||
         (!nb_numa_nodes && dimm->node)) {
-        error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
+        error_setg(errp, "'DIMM property " DIMM_NODE_PROP " has value %"
                    PRIu32 "' which exceeds the number of numa nodes: %d",
                    dimm->node, nb_numa_nodes ? nb_numa_nodes : 1);
         return;
     }
 }
 
-static MemoryRegion *pc_dimm_get_memory_region(PCDIMMDevice *dimm)
+static MemoryRegion *dimm_get_memory_region(DIMMDevice *dimm)
 {
     return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
 }
 
-static void pc_dimm_class_init(ObjectClass *oc, void *data)
+static void dimm_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
-    PCDIMMDeviceClass *ddc = PC_DIMM_CLASS(oc);
+    DIMMDeviceClass *ddc = DIMM_CLASS(oc);
 
-    dc->realize = pc_dimm_realize;
-    dc->props = pc_dimm_properties;
+    dc->realize = dimm_realize;
+    dc->props = dimm_properties;
     dc->desc = "DIMM memory module";
 
-    ddc->get_memory_region = pc_dimm_get_memory_region;
+    ddc->get_memory_region = dimm_get_memory_region;
 }
 
-static TypeInfo pc_dimm_info = {
-    .name          = TYPE_PC_DIMM,
+static TypeInfo dimm_info = {
+    .name          = TYPE_DIMM,
     .parent        = TYPE_DEVICE,
-    .instance_size = sizeof(PCDIMMDevice),
-    .instance_init = pc_dimm_init,
-    .class_init    = pc_dimm_class_init,
-    .class_size    = sizeof(PCDIMMDeviceClass),
+    .instance_size = sizeof(DIMMDevice),
+    .instance_init = dimm_init,
+    .class_init    = dimm_class_init,
+    .class_size    = sizeof(DIMMDeviceClass),
 };
 
-static void pc_dimm_register_types(void)
+static void dimm_register_types(void)
 {
-    type_register_static(&pc_dimm_info);
+    type_register_static(&dimm_info);
 }
 
-type_init(pc_dimm_register_types)
+type_init(dimm_register_types)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 288b57e..ab6eb83 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2144,8 +2144,8 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 {
     Error *local_err = NULL;
     sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
-    PCDIMMDevice *dimm = PC_DIMM(dev);
-    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    DIMMDevice *dimm = DIMM(dev);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm);
     uint64_t align = memory_region_get_alignment(mr);
     uint64_t size = memory_region_size(mr);
@@ -2157,14 +2157,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
-    pc_dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
+    dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
     if (local_err) {
         goto out;
     }
 
-    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
+    addr = object_property_get_int(OBJECT(dimm), DIMM_ADDR_PROP, &local_err);
     if (local_err) {
-        pc_dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
+        dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
         goto out;
     }
 
@@ -2179,14 +2179,14 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 {
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
 
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         int node;
 
         if (!smc->dr_lmb_enabled) {
             error_setg(errp, "Memory hotplug not supported for this machine");
             return;
         }
-        node = object_property_get_int(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
+        node = object_property_get_int(OBJECT(dev), DIMM_NODE_PROP, errp);
         if (*errp) {
             return;
         }
@@ -2220,7 +2220,7 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         error_setg(errp, "Memory hot unplug not supported by sPAPR");
     }
 }
@@ -2228,7 +2228,7 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
 static HotplugHandler *spapr_get_hotpug_handler(MachineState *machine,
                                              DeviceState *dev)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         return HOTPLUG_HANDLER(machine);
     }
     return NULL;
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 8a43548..ece8786 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -13,39 +13,39 @@
  *
  */
 
-#ifndef QEMU_PC_DIMM_H
-#define QEMU_PC_DIMM_H
+#ifndef QEMU_DIMM_H
+#define QEMU_DIMM_H
 
 #include "exec/memory.h"
 #include "sysemu/hostmem.h"
 #include "hw/qdev.h"
 
-#define TYPE_PC_DIMM "pc-dimm"
-#define PC_DIMM(obj) \
-    OBJECT_CHECK(PCDIMMDevice, (obj), TYPE_PC_DIMM)
-#define PC_DIMM_CLASS(oc) \
-    OBJECT_CLASS_CHECK(PCDIMMDeviceClass, (oc), TYPE_PC_DIMM)
-#define PC_DIMM_GET_CLASS(obj) \
-    OBJECT_GET_CLASS(PCDIMMDeviceClass, (obj), TYPE_PC_DIMM)
+#define TYPE_DIMM "pc-dimm"
+#define DIMM(obj) \
+    OBJECT_CHECK(DIMMDevice, (obj), TYPE_DIMM)
+#define DIMM_CLASS(oc) \
+    OBJECT_CLASS_CHECK(DIMMDeviceClass, (oc), TYPE_DIMM)
+#define DIMM_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(DIMMDeviceClass, (obj), TYPE_DIMM)
 
-#define PC_DIMM_ADDR_PROP "addr"
-#define PC_DIMM_SLOT_PROP "slot"
-#define PC_DIMM_NODE_PROP "node"
-#define PC_DIMM_SIZE_PROP "size"
-#define PC_DIMM_MEMDEV_PROP "memdev"
+#define DIMM_ADDR_PROP "addr"
+#define DIMM_SLOT_PROP "slot"
+#define DIMM_NODE_PROP "node"
+#define DIMM_SIZE_PROP "size"
+#define DIMM_MEMDEV_PROP "memdev"
 
-#define PC_DIMM_UNASSIGNED_SLOT -1
+#define DIMM_UNASSIGNED_SLOT -1
 
 /**
- * PCDIMMDevice:
- * @addr: starting guest physical address, where @PCDIMMDevice is mapped.
+ * DIMMDevice:
+ * @addr: starting guest physical address, where @DIMMDevice is mapped.
  *         Default value: 0, means that address is auto-allocated.
- * @node: numa node to which @PCDIMMDevice is attached.
- * @slot: slot number into which @PCDIMMDevice is plugged in.
+ * @node: numa node to which @DIMMDevice is attached.
+ * @slot: slot number into which @DIMMDevice is plugged in.
  *        Default value: -1, means that slot is auto-allocated.
- * @hostmem: host memory backend providing memory for @PCDIMMDevice
+ * @hostmem: host memory backend providing memory for @DIMMDevice
  */
-typedef struct PCDIMMDevice {
+typedef struct DIMMDevice {
     /* private */
     DeviceState parent_obj;
 
@@ -54,19 +54,19 @@ typedef struct PCDIMMDevice {
     uint32_t node;
     int32_t slot;
     HostMemoryBackend *hostmem;
-} PCDIMMDevice;
+} DIMMDevice;
 
 /**
- * PCDIMMDeviceClass:
+ * DIMMDeviceClass:
  * @get_memory_region: returns #MemoryRegion associated with @dimm
  */
-typedef struct PCDIMMDeviceClass {
+typedef struct DIMMDeviceClass {
     /* private */
     DeviceClass parent_class;
 
     /* public */
-    MemoryRegion *(*get_memory_region)(PCDIMMDevice *dimm);
-} PCDIMMDeviceClass;
+    MemoryRegion *(*get_memory_region)(DIMMDevice *dimm);
+} DIMMDeviceClass;
 
 /**
  * MemoryHotplugState:
@@ -79,16 +79,16 @@ typedef struct MemoryHotplugState {
     MemoryRegion mr;
 } MemoryHotplugState;
 
-uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
+uint64_t dimm_get_free_addr(uint64_t address_space_start,
                                uint64_t address_space_size,
                                uint64_t *hint, uint64_t align, uint64_t size,
                                Error **errp);
 
-int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
+int dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque);
-void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
+int qmp_dimm_device_list(Object *obj, void *opaque);
+void dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
                          MemoryRegion *mr, uint64_t align, Error **errp);
-void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
+void dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
                            MemoryRegion *mr);
 #endif
diff --git a/numa.c b/numa.c
index e9b18f5..cb69965 100644
--- a/numa.c
+++ b/numa.c
@@ -482,7 +482,7 @@ static void numa_stat_memory_devices(uint64_t node_mem[])
     MemoryDeviceInfoList **prev = &info_list;
     MemoryDeviceInfoList *info;
 
-    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
+    qmp_dimm_device_list(qdev_get_machine(), &prev);
     for (info = info_list; info; info = info->next) {
         MemoryDeviceInfo *value = info->value;
 
diff --git a/qapi-schema.json b/qapi-schema.json
index f60be29..aeae833 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3717,9 +3717,9 @@
 { 'command': 'query-memdev', 'returns': ['Memdev'] }
 
 ##
-# @PCDIMMDeviceInfo:
+# @DIMMDeviceInfo:
 #
-# PCDIMMDevice state information
+# DIMMDevice state information
 #
 # @id: #optional device's ID
 #
@@ -3739,7 +3739,7 @@
 #
 # Since: 2.1
 ##
-{ 'struct': 'PCDIMMDeviceInfo',
+{ 'struct': 'DIMMDeviceInfo',
   'data': { '*id': 'str',
             'addr': 'int',
             'size': 'int',
@@ -3758,7 +3758,7 @@
 #
 # Since: 2.1
 ##
-{ 'union': 'MemoryDeviceInfo', 'data': {'dimm': 'PCDIMMDeviceInfo'} }
+{ 'union': 'MemoryDeviceInfo', 'data': {'dimm': 'DIMMDeviceInfo'} }
 
 ##
 # @query-memory-devices
diff --git a/qmp.c b/qmp.c
index ff54e5a..169b981 100644
--- a/qmp.c
+++ b/qmp.c
@@ -712,7 +712,7 @@ MemoryDeviceInfoList *qmp_query_memory_devices(Error **errp)
     MemoryDeviceInfoList *head = NULL;
     MemoryDeviceInfoList **prev = &head;
 
-    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
+    qmp_dimm_device_list(qdev_get_machine(), &prev);
 
     return head;
 }
diff --git a/stubs/qmp_pc_dimm_device_list.c b/stubs/qmp_pc_dimm_device_list.c
index b584bd8..b2704c6 100644
--- a/stubs/qmp_pc_dimm_device_list.c
+++ b/stubs/qmp_pc_dimm_device_list.c
@@ -1,7 +1,7 @@
 #include "qom/object.h"
 #include "hw/mem/pc-dimm.h"
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque)
+int qmp_dimm_device_list(Object *obj, void *opaque)
 {
    return 0;
 }
diff --git a/trace-events b/trace-events
index bdfe79f..d1f258d 100644
--- a/trace-events
+++ b/trace-events
@@ -1652,12 +1652,12 @@ mhp_acpi_write_ost_ev(uint32_t slot, uint32_t ev) "slot[0x%"PRIx32"] OST EVENT:
 mhp_acpi_write_ost_status(uint32_t slot, uint32_t st) "slot[0x%"PRIx32"] OST STATUS: 0x%"PRIx32
 mhp_acpi_clear_insert_evt(uint32_t slot) "slot[0x%"PRIx32"] clear insert event"
 mhp_acpi_clear_remove_evt(uint32_t slot) "slot[0x%"PRIx32"] clear remove event"
-mhp_acpi_pc_dimm_deleted(uint32_t slot) "slot[0x%"PRIx32"] pc-dimm deleted"
-mhp_acpi_pc_dimm_delete_failed(uint32_t slot) "slot[0x%"PRIx32"] pc-dimm delete failed"
+mhp_acpi_dimm_deleted(uint32_t slot) "slot[0x%"PRIx32"] dimm deleted"
+mhp_acpi_dimm_delete_failed(uint32_t slot) "slot[0x%"PRIx32"] dimm delete failed"
 
 # hw/i386/pc.c
-mhp_pc_dimm_assigned_slot(int slot) "0x%d"
-mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
+mhp_dimm_assigned_slot(int slot) "0x%d"
+mhp_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
 
 # target-s390x/kvm.c
 kvm_enable_cmma(int rc) "CMMA: enabling with result code %d"
-- 
1.8.3.1



* [Qemu-devel] [PATCH v7 16/35] pc-dimm: drop the prefix of pc-dimm
@ 2015-11-02  9:13   ` Xiao Guangrong
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

This patch was generated by the following script:

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/PC_DIMM/DIMM/g"

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/PCDIMM/DIMM/g"

find ./ -name "*.[ch]" -o -name "*.json" -o -name "trace-events" \
| xargs sed -i "s/pc_dimm/dimm/g"

find ./ -name "trace-events" | xargs sed -i "s/pc-dimm/dimm/g"

It prepares for the follow-up work that abstracts a common dimm device type
for both pc-dimm and nvdimm.
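
For readers who want to see what the substitutions do before running them on
a real tree, here is a minimal sketch that replays them on a throwaway
directory. It is not the script used to produce this patch: the scratch
directory, the sample file contents, the merged "sed -e" passes and the
-print0/-0 handling are all assumptions made for illustration, and it relies
on GNU find, xargs and sed.

```shell
#!/bin/sh
# Sketch only: replay the rename on a scratch tree (assumes GNU find/xargs/sed).
set -eu

scratch=$(mktemp -d)   # hypothetical scratch directory, not the QEMU tree
trap 'rm -rf "$scratch"' EXIT

# Two sample files that mimic the identifiers touched by the patch.
printf '%s\n' '#define TYPE_PC_DIMM "pc-dimm"' \
              'PCDIMMDevice *dimm = PC_DIMM(dev);' \
              'pc_dimm_memory_plug(dev);' > "$scratch/sample.h"
printf '%s\n' 'mhp_acpi_pc_dimm_deleted(uint32_t slot) "pc-dimm deleted"' \
    > "$scratch/trace-events"

# Passes 1-3: identifier prefixes in C sources, headers, JSON and trace-events.
# The parentheses group the -name tests; -print0/-0 tolerate odd file names.
find "$scratch" \( -name '*.[ch]' -o -name '*.json' -o -name 'trace-events' \) -print0 \
| xargs -0 sed -i -e 's/PC_DIMM/DIMM/g' -e 's/PCDIMM/DIMM/g' -e 's/pc_dimm/dimm/g'

# Pass 4: the hyphenated spelling, in trace-events only.
find "$scratch" -name 'trace-events' -print0 | xargs -0 sed -i 's/pc-dimm/dimm/g'

# Show the result of the rename.
cat "$scratch/sample.h" "$scratch/trace-events"
```

Because the hyphenated pass is restricted to trace-events, the string literal
in the sample header stays "pc-dimm" while the macro name becomes TYPE_DIMM,
which matches the pc-dimm.h hunk in this patch.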

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hmp.c                           |   2 +-
 hw/acpi/ich9.c                  |   6 +-
 hw/acpi/memory_hotplug.c        |  16 ++---
 hw/acpi/piix4.c                 |   6 +-
 hw/i386/pc.c                    |  32 ++++-----
 hw/mem/pc-dimm.c                | 148 ++++++++++++++++++++--------------------
 hw/ppc/spapr.c                  |  18 ++---
 include/hw/mem/pc-dimm.h        |  62 ++++++++---------
 numa.c                          |   2 +-
 qapi-schema.json                |   8 +--
 qmp.c                           |   2 +-
 stubs/qmp_pc_dimm_device_list.c |   2 +-
 trace-events                    |   8 +--
 13 files changed, 156 insertions(+), 156 deletions(-)

diff --git a/hmp.c b/hmp.c
index 5048eee..5c617d2 100644
--- a/hmp.c
+++ b/hmp.c
@@ -1952,7 +1952,7 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
     MemoryDeviceInfoList *info_list = qmp_query_memory_devices(&err);
     MemoryDeviceInfoList *info;
     MemoryDeviceInfo *value;
-    PCDIMMDeviceInfo *di;
+    DIMMDeviceInfo *di;
 
     for (info = info_list; info; info = info->next) {
         value = info->value;
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1c7fcfa..b0d6a67 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -440,7 +440,7 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, Error **errp)
 void ich9_pm_device_plug_cb(ICH9LPCPMRegs *pm, DeviceState *dev, Error **errp)
 {
     if (pm->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_plug_cb(&pm->acpi_regs, pm->irq, &pm->acpi_memory_hotplug,
                             dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
@@ -455,7 +455,7 @@ void ich9_pm_device_unplug_request_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
                                       Error **errp)
 {
     if (pm->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_unplug_request_cb(&pm->acpi_regs, pm->irq,
                                       &pm->acpi_memory_hotplug, dev, errp);
     } else {
@@ -468,7 +468,7 @@ void ich9_pm_device_unplug_cb(ICH9LPCPMRegs *pm, DeviceState *dev,
                               Error **errp)
 {
     if (pm->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_unplug_cb(&pm->acpi_memory_hotplug, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug for not supported device"
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index ce428df..e687852 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -54,23 +54,23 @@ static uint64_t acpi_memory_hotplug_read(void *opaque, hwaddr addr,
     o = OBJECT(mdev->dimm);
     switch (addr) {
     case 0x0: /* Lo part of phys address where DIMM is mapped */
-        val = o ? object_property_get_int(o, PC_DIMM_ADDR_PROP, NULL) : 0;
+        val = o ? object_property_get_int(o, DIMM_ADDR_PROP, NULL) : 0;
         trace_mhp_acpi_read_addr_lo(mem_st->selector, val);
         break;
     case 0x4: /* Hi part of phys address where DIMM is mapped */
-        val = o ? object_property_get_int(o, PC_DIMM_ADDR_PROP, NULL) >> 32 : 0;
+        val = o ? object_property_get_int(o, DIMM_ADDR_PROP, NULL) >> 32 : 0;
         trace_mhp_acpi_read_addr_hi(mem_st->selector, val);
         break;
     case 0x8: /* Lo part of DIMM size */
-        val = o ? object_property_get_int(o, PC_DIMM_SIZE_PROP, NULL) : 0;
+        val = o ? object_property_get_int(o, DIMM_SIZE_PROP, NULL) : 0;
         trace_mhp_acpi_read_size_lo(mem_st->selector, val);
         break;
     case 0xc: /* Hi part of DIMM size */
-        val = o ? object_property_get_int(o, PC_DIMM_SIZE_PROP, NULL) >> 32 : 0;
+        val = o ? object_property_get_int(o, DIMM_SIZE_PROP, NULL) >> 32 : 0;
         trace_mhp_acpi_read_size_hi(mem_st->selector, val);
         break;
     case 0x10: /* node proximity for _PXM method */
-        val = o ? object_property_get_int(o, PC_DIMM_NODE_PROP, NULL) : 0;
+        val = o ? object_property_get_int(o, DIMM_NODE_PROP, NULL) : 0;
         trace_mhp_acpi_read_pxm(mem_st->selector, val);
         break;
     case 0x14: /* pack and return is_* fields */
@@ -151,13 +151,13 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
             /* call pc-dimm unplug cb */
             hotplug_handler_unplug(hotplug_ctrl, dev, &local_err);
             if (local_err) {
-                trace_mhp_acpi_pc_dimm_delete_failed(mem_st->selector);
+                trace_mhp_acpi_dimm_delete_failed(mem_st->selector);
                 qapi_event_send_mem_unplug_error(dev->id,
                                                  error_get_pretty(local_err),
                                                  &error_abort);
                 break;
             }
-            trace_mhp_acpi_pc_dimm_deleted(mem_st->selector);
+            trace_mhp_acpi_dimm_deleted(mem_st->selector);
         }
         break;
     default:
@@ -206,7 +206,7 @@ acpi_memory_slot_status(MemHotplugState *mem_st,
                         DeviceState *dev, Error **errp)
 {
     Error *local_err = NULL;
-    int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
+    int slot = object_property_get_int(OBJECT(dev), DIMM_SLOT_PROP,
                                        &local_err);
 
     if (local_err) {
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 2cd2fee..0b2cb6e 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -344,7 +344,7 @@ static void piix4_device_plug_cb(HotplugHandler *hotplug_dev,
     PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
     if (s->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_plug_cb(&s->ar, s->irq, &s->acpi_memory_hotplug, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
         acpi_pcihp_device_plug_cb(&s->ar, s->irq, &s->acpi_pci_hotplug, dev,
@@ -363,7 +363,7 @@ static void piix4_device_unplug_request_cb(HotplugHandler *hotplug_dev,
     PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
     if (s->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_unplug_request_cb(&s->ar, s->irq, &s->acpi_memory_hotplug,
                                       dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
@@ -381,7 +381,7 @@ static void piix4_device_unplug_cb(HotplugHandler *hotplug_dev,
     PIIX4PMState *s = PIIX4_PM(hotplug_dev);
 
     if (s->acpi_memory_hotplug.is_enabled &&
-        object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         acpi_memory_unplug_cb(&s->acpi_memory_hotplug, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug for not supported device"
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 0cb8afd..67ecc4f 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1610,14 +1610,14 @@ void ioapic_init_gsi(GSIState *gsi_state, const char *parent_name)
     }
 }
 
-static void pc_dimm_plug(HotplugHandler *hotplug_dev,
+static void dimm_plug(HotplugHandler *hotplug_dev,
                          DeviceState *dev, Error **errp)
 {
     HotplugHandlerClass *hhc;
     Error *local_err = NULL;
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
-    PCDIMMDevice *dimm = PC_DIMM(dev);
-    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    DIMMDevice *dimm = DIMM(dev);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm);
     uint64_t align = TARGET_PAGE_SIZE;
 
@@ -1631,7 +1631,7 @@ static void pc_dimm_plug(HotplugHandler *hotplug_dev,
         goto out;
     }
 
-    pc_dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err);
+    dimm_memory_plug(dev, &pcms->hotplug_memory, mr, align, &local_err);
     if (local_err) {
         goto out;
     }
@@ -1642,7 +1642,7 @@ out:
     error_propagate(errp, local_err);
 }
 
-static void pc_dimm_unplug_request(HotplugHandler *hotplug_dev,
+static void dimm_unplug_request(HotplugHandler *hotplug_dev,
                                    DeviceState *dev, Error **errp)
 {
     HotplugHandlerClass *hhc;
@@ -1662,12 +1662,12 @@ out:
     error_propagate(errp, local_err);
 }
 
-static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
+static void dimm_unplug(HotplugHandler *hotplug_dev,
                            DeviceState *dev, Error **errp)
 {
     PCMachineState *pcms = PC_MACHINE(hotplug_dev);
-    PCDIMMDevice *dimm = PC_DIMM(dev);
-    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    DIMMDevice *dimm = DIMM(dev);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm);
     HotplugHandlerClass *hhc;
     Error *local_err = NULL;
@@ -1679,7 +1679,7 @@ static void pc_dimm_unplug(HotplugHandler *hotplug_dev,
         goto out;
     }
 
-    pc_dimm_memory_unplug(dev, &pcms->hotplug_memory, mr);
+    dimm_memory_unplug(dev, &pcms->hotplug_memory, mr);
     object_unparent(OBJECT(dev));
 
  out:
@@ -1718,8 +1718,8 @@ out:
 static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        pc_dimm_plug(hotplug_dev, dev, errp);
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
+        dimm_plug(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         pc_cpu_plug(hotplug_dev, dev, errp);
     }
@@ -1728,8 +1728,8 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
                                                 DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        pc_dimm_unplug_request(hotplug_dev, dev, errp);
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
+        dimm_unplug_request(hotplug_dev, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug request for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
@@ -1739,8 +1739,8 @@ static void pc_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
 static void pc_machine_device_unplug_cb(HotplugHandler *hotplug_dev,
                                         DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
-        pc_dimm_unplug(hotplug_dev, dev, errp);
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
+        dimm_unplug(hotplug_dev, dev, errp);
     } else {
         error_setg(errp, "acpi: device unplug for not supported device"
                    " type: %s", object_get_typename(OBJECT(dev)));
@@ -1752,7 +1752,7 @@ static HotplugHandler *pc_get_hotpug_handler(MachineState *machine,
 {
     PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(machine);
 
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM) ||
         object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
         return HOTPLUG_HANDLER(machine);
     }
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 2dcbbcd..67afc53 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -27,21 +27,21 @@
 #include "trace.h"
 #include "hw/virtio/vhost.h"
 
-typedef struct pc_dimms_capacity {
+typedef struct dimms_capacity {
      uint64_t size;
      Error    **errp;
-} pc_dimms_capacity;
+} dimms_capacity;
 
 static int existing_dimms_capacity_internal(Object *obj, void *opaque)
 {
-    pc_dimms_capacity *cap = opaque;
+    dimms_capacity *cap = opaque;
     uint64_t *size = &cap->size;
 
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(obj, TYPE_DIMM)) {
         DeviceState *dev = DEVICE(obj);
 
         if (dev->realized) {
-            (*size) += object_property_get_int(obj, PC_DIMM_SIZE_PROP,
+            (*size) += object_property_get_int(obj, DIMM_SIZE_PROP,
                 cap->errp);
         }
 
@@ -55,7 +55,7 @@ static int existing_dimms_capacity_internal(Object *obj, void *opaque)
 
 static uint64_t existing_dimms_capacity(Error **errp)
 {
-    pc_dimms_capacity cap;
+    dimms_capacity cap;
 
     cap.size = 0;
     cap.errp = errp;
@@ -64,22 +64,22 @@ static uint64_t existing_dimms_capacity(Error **errp)
     return cap.size;
 }
 
-void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
+void dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
                          MemoryRegion *mr, uint64_t align, Error **errp)
 {
     int slot;
     MachineState *machine = MACHINE(qdev_get_machine());
-    PCDIMMDevice *dimm = PC_DIMM(dev);
+    DIMMDevice *dimm = DIMM(dev);
     Error *local_err = NULL;
     uint64_t dimms_capacity = 0;
     uint64_t addr;
 
-    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
+    addr = object_property_get_int(OBJECT(dimm), DIMM_ADDR_PROP, &local_err);
     if (local_err) {
         goto out;
     }
 
-    addr = pc_dimm_get_free_addr(hpms->base,
+    addr = dimm_get_free_addr(hpms->base,
                                  memory_region_size(&hpms->mr),
                                  !addr ? NULL : &addr, align,
                                  memory_region_size(mr), &local_err);
@@ -100,27 +100,27 @@ void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
         goto out;
     }
 
-    object_property_set_int(OBJECT(dev), addr, PC_DIMM_ADDR_PROP, &local_err);
+    object_property_set_int(OBJECT(dev), addr, DIMM_ADDR_PROP, &local_err);
     if (local_err) {
         goto out;
     }
-    trace_mhp_pc_dimm_assigned_address(addr);
+    trace_mhp_dimm_assigned_address(addr);
 
-    slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP, &local_err);
+    slot = object_property_get_int(OBJECT(dev), DIMM_SLOT_PROP, &local_err);
     if (local_err) {
         goto out;
     }
 
-    slot = pc_dimm_get_free_slot(slot == PC_DIMM_UNASSIGNED_SLOT ? NULL : &slot,
+    slot = dimm_get_free_slot(slot == DIMM_UNASSIGNED_SLOT ? NULL : &slot,
                                  machine->ram_slots, &local_err);
     if (local_err) {
         goto out;
     }
-    object_property_set_int(OBJECT(dev), slot, PC_DIMM_SLOT_PROP, &local_err);
+    object_property_set_int(OBJECT(dev), slot, DIMM_SLOT_PROP, &local_err);
     if (local_err) {
         goto out;
     }
-    trace_mhp_pc_dimm_assigned_slot(slot);
+    trace_mhp_dimm_assigned_slot(slot);
 
     if (kvm_enabled() && !kvm_has_free_slot(machine)) {
         error_setg(&local_err, "hypervisor has no free memory slots left");
@@ -141,29 +141,29 @@ out:
     error_propagate(errp, local_err);
 }
 
-void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
+void dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
                            MemoryRegion *mr)
 {
-    PCDIMMDevice *dimm = PC_DIMM(dev);
+    DIMMDevice *dimm = DIMM(dev);
 
     numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
     memory_region_del_subregion(&hpms->mr, mr);
     vmstate_unregister_ram(mr, dev);
 }
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque)
+int qmp_dimm_device_list(Object *obj, void *opaque)
 {
     MemoryDeviceInfoList ***prev = opaque;
 
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(obj, TYPE_DIMM)) {
         DeviceState *dev = DEVICE(obj);
 
         if (dev->realized) {
             MemoryDeviceInfoList *elem = g_new0(MemoryDeviceInfoList, 1);
             MemoryDeviceInfo *info = g_new0(MemoryDeviceInfo, 1);
-            PCDIMMDeviceInfo *di = g_new0(PCDIMMDeviceInfo, 1);
+            DIMMDeviceInfo *di = g_new0(DIMMDeviceInfo, 1);
             DeviceClass *dc = DEVICE_GET_CLASS(obj);
-            PCDIMMDevice *dimm = PC_DIMM(obj);
+            DIMMDevice *dimm = DIMM(obj);
 
             if (dev->id) {
                 di->has_id = true;
@@ -174,7 +174,7 @@ int qmp_pc_dimm_device_list(Object *obj, void *opaque)
             di->addr = dimm->addr;
             di->slot = dimm->slot;
             di->node = dimm->node;
-            di->size = object_property_get_int(OBJECT(dimm), PC_DIMM_SIZE_PROP,
+            di->size = object_property_get_int(OBJECT(dimm), DIMM_SIZE_PROP,
                                                NULL);
             di->memdev = object_get_canonical_path(OBJECT(dimm->hostmem));
 
@@ -186,7 +186,7 @@ int qmp_pc_dimm_device_list(Object *obj, void *opaque)
         }
     }
 
-    object_child_foreach(obj, qmp_pc_dimm_device_list, opaque);
+    object_child_foreach(obj, qmp_dimm_device_list, opaque);
     return 0;
 }
 
@@ -197,7 +197,7 @@ ram_addr_t get_current_ram_size(void)
     MemoryDeviceInfoList *info;
     ram_addr_t size = ram_size;
 
-    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
+    qmp_dimm_device_list(qdev_get_machine(), &prev);
     for (info = info_list; info; info = info->next) {
         MemoryDeviceInfo *value = info->value;
 
@@ -216,28 +216,28 @@ ram_addr_t get_current_ram_size(void)
     return size;
 }
 
-static int pc_dimm_slot2bitmap(Object *obj, void *opaque)
+static int dimm_slot2bitmap(Object *obj, void *opaque)
 {
     unsigned long *bitmap = opaque;
 
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(obj, TYPE_DIMM)) {
         DeviceState *dev = DEVICE(obj);
         if (dev->realized) { /* count only realized DIMMs */
-            PCDIMMDevice *d = PC_DIMM(obj);
+            DIMMDevice *d = DIMM(obj);
             set_bit(d->slot, bitmap);
         }
     }
 
-    object_child_foreach(obj, pc_dimm_slot2bitmap, opaque);
+    object_child_foreach(obj, dimm_slot2bitmap, opaque);
     return 0;
 }
 
-int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp)
+int dimm_get_free_slot(const int *hint, int max_slots, Error **errp)
 {
     unsigned long *bitmap = bitmap_new(max_slots);
     int slot = 0;
 
-    object_child_foreach(qdev_get_machine(), pc_dimm_slot2bitmap, bitmap);
+    object_child_foreach(qdev_get_machine(), dimm_slot2bitmap, bitmap);
 
     /* check if requested slot is not occupied */
     if (hint) {
@@ -262,10 +262,10 @@ out:
     return slot;
 }
 
-static gint pc_dimm_addr_sort(gconstpointer a, gconstpointer b)
+static gint dimm_addr_sort(gconstpointer a, gconstpointer b)
 {
-    PCDIMMDevice *x = PC_DIMM(a);
-    PCDIMMDevice *y = PC_DIMM(b);
+    DIMMDevice *x = DIMM(a);
+    DIMMDevice *y = DIMM(b);
     Int128 diff = int128_sub(int128_make64(x->addr), int128_make64(y->addr));
 
     if (int128_lt(diff, int128_zero())) {
@@ -276,22 +276,22 @@ static gint pc_dimm_addr_sort(gconstpointer a, gconstpointer b)
     return 0;
 }
 
-static int pc_dimm_built_list(Object *obj, void *opaque)
+static int dimm_built_list(Object *obj, void *opaque)
 {
     GSList **list = opaque;
 
-    if (object_dynamic_cast(obj, TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(obj, TYPE_DIMM)) {
         DeviceState *dev = DEVICE(obj);
         if (dev->realized) { /* only realized DIMMs matter */
-            *list = g_slist_insert_sorted(*list, dev, pc_dimm_addr_sort);
+            *list = g_slist_insert_sorted(*list, dev, dimm_addr_sort);
         }
     }
 
-    object_child_foreach(obj, pc_dimm_built_list, opaque);
+    object_child_foreach(obj, dimm_built_list, opaque);
     return 0;
 }
 
-uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
+uint64_t dimm_get_free_addr(uint64_t address_space_start,
                                uint64_t address_space_size,
                                uint64_t *hint, uint64_t align, uint64_t size,
                                Error **errp)
@@ -321,7 +321,7 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
     }
 
     assert(address_space_end > address_space_start);
-    object_child_foreach(qdev_get_machine(), pc_dimm_built_list, &list);
+    object_child_foreach(qdev_get_machine(), dimm_built_list, &list);
 
     if (hint) {
         new_addr = *hint;
@@ -331,9 +331,9 @@ uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
 
     /* find address range that will fit new DIMM */
     for (item = list; item; item = g_slist_next(item)) {
-        PCDIMMDevice *dimm = item->data;
+        DIMMDevice *dimm = item->data;
         uint64_t dimm_size = object_property_get_int(OBJECT(dimm),
-                                                     PC_DIMM_SIZE_PROP,
+                                                     DIMM_SIZE_PROP,
                                                      errp);
         if (errp && *errp) {
             goto out;
@@ -363,20 +363,20 @@ out:
     return ret;
 }
 
-static Property pc_dimm_properties[] = {
-    DEFINE_PROP_UINT64(PC_DIMM_ADDR_PROP, PCDIMMDevice, addr, 0),
-    DEFINE_PROP_UINT32(PC_DIMM_NODE_PROP, PCDIMMDevice, node, 0),
-    DEFINE_PROP_INT32(PC_DIMM_SLOT_PROP, PCDIMMDevice, slot,
-                      PC_DIMM_UNASSIGNED_SLOT),
+static Property dimm_properties[] = {
+    DEFINE_PROP_UINT64(DIMM_ADDR_PROP, DIMMDevice, addr, 0),
+    DEFINE_PROP_UINT32(DIMM_NODE_PROP, DIMMDevice, node, 0),
+    DEFINE_PROP_INT32(DIMM_SLOT_PROP, DIMMDevice, slot,
+                      DIMM_UNASSIGNED_SLOT),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-static void pc_dimm_get_size(Object *obj, Visitor *v, void *opaque,
+static void dimm_get_size(Object *obj, Visitor *v, void *opaque,
                           const char *name, Error **errp)
 {
     int64_t value;
     MemoryRegion *mr;
-    PCDIMMDevice *dimm = PC_DIMM(obj);
+    DIMMDevice *dimm = DIMM(obj);
 
     mr = host_memory_backend_get_memory(dimm->hostmem, errp);
     value = memory_region_size(mr);
@@ -384,7 +384,7 @@ static void pc_dimm_get_size(Object *obj, Visitor *v, void *opaque,
     visit_type_int(v, &value, name, errp);
 }
 
-static void pc_dimm_check_memdev_is_busy(Object *obj, const char *name,
+static void dimm_check_memdev_is_busy(Object *obj, const char *name,
                                       Object *val, Error **errp)
 {
     MemoryRegion *mr;
@@ -399,65 +399,65 @@ static void pc_dimm_check_memdev_is_busy(Object *obj, const char *name,
     }
 }
 
-static void pc_dimm_init(Object *obj)
+static void dimm_init(Object *obj)
 {
-    PCDIMMDevice *dimm = PC_DIMM(obj);
+    DIMMDevice *dimm = DIMM(obj);
 
-    object_property_add(obj, PC_DIMM_SIZE_PROP, "int", pc_dimm_get_size,
+    object_property_add(obj, DIMM_SIZE_PROP, "int", dimm_get_size,
                         NULL, NULL, NULL, &error_abort);
-    object_property_add_link(obj, PC_DIMM_MEMDEV_PROP, TYPE_MEMORY_BACKEND,
+    object_property_add_link(obj, DIMM_MEMDEV_PROP, TYPE_MEMORY_BACKEND,
                              (Object **)&dimm->hostmem,
-                             pc_dimm_check_memdev_is_busy,
+                             dimm_check_memdev_is_busy,
                              OBJ_PROP_LINK_UNREF_ON_RELEASE,
                              &error_abort);
 }
 
-static void pc_dimm_realize(DeviceState *dev, Error **errp)
+static void dimm_realize(DeviceState *dev, Error **errp)
 {
-    PCDIMMDevice *dimm = PC_DIMM(dev);
+    DIMMDevice *dimm = DIMM(dev);
 
     if (!dimm->hostmem) {
-        error_setg(errp, "'" PC_DIMM_MEMDEV_PROP "' property is not set");
+        error_setg(errp, "'" DIMM_MEMDEV_PROP "' property is not set");
         return;
     }
     if (((nb_numa_nodes > 0) && (dimm->node >= nb_numa_nodes)) ||
         (!nb_numa_nodes && dimm->node)) {
-        error_setg(errp, "'DIMM property " PC_DIMM_NODE_PROP " has value %"
+        error_setg(errp, "'DIMM property " DIMM_NODE_PROP " has value %"
                    PRIu32 "' which exceeds the number of numa nodes: %d",
                    dimm->node, nb_numa_nodes ? nb_numa_nodes : 1);
         return;
     }
 }
 
-static MemoryRegion *pc_dimm_get_memory_region(PCDIMMDevice *dimm)
+static MemoryRegion *dimm_get_memory_region(DIMMDevice *dimm)
 {
     return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
 }
 
-static void pc_dimm_class_init(ObjectClass *oc, void *data)
+static void dimm_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
-    PCDIMMDeviceClass *ddc = PC_DIMM_CLASS(oc);
+    DIMMDeviceClass *ddc = DIMM_CLASS(oc);
 
-    dc->realize = pc_dimm_realize;
-    dc->props = pc_dimm_properties;
+    dc->realize = dimm_realize;
+    dc->props = dimm_properties;
     dc->desc = "DIMM memory module";
 
-    ddc->get_memory_region = pc_dimm_get_memory_region;
+    ddc->get_memory_region = dimm_get_memory_region;
 }
 
-static TypeInfo pc_dimm_info = {
-    .name          = TYPE_PC_DIMM,
+static TypeInfo dimm_info = {
+    .name          = TYPE_DIMM,
     .parent        = TYPE_DEVICE,
-    .instance_size = sizeof(PCDIMMDevice),
-    .instance_init = pc_dimm_init,
-    .class_init    = pc_dimm_class_init,
-    .class_size    = sizeof(PCDIMMDeviceClass),
+    .instance_size = sizeof(DIMMDevice),
+    .instance_init = dimm_init,
+    .class_init    = dimm_class_init,
+    .class_size    = sizeof(DIMMDeviceClass),
 };
 
-static void pc_dimm_register_types(void)
+static void dimm_register_types(void)
 {
-    type_register_static(&pc_dimm_info);
+    type_register_static(&dimm_info);
 }
 
-type_init(pc_dimm_register_types)
+type_init(dimm_register_types)
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 288b57e..ab6eb83 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2144,8 +2144,8 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
 {
     Error *local_err = NULL;
     sPAPRMachineState *ms = SPAPR_MACHINE(hotplug_dev);
-    PCDIMMDevice *dimm = PC_DIMM(dev);
-    PCDIMMDeviceClass *ddc = PC_DIMM_GET_CLASS(dimm);
+    DIMMDevice *dimm = DIMM(dev);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
     MemoryRegion *mr = ddc->get_memory_region(dimm);
     uint64_t align = memory_region_get_alignment(mr);
     uint64_t size = memory_region_size(mr);
@@ -2157,14 +2157,14 @@ static void spapr_memory_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
         goto out;
     }
 
-    pc_dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
+    dimm_memory_plug(dev, &ms->hotplug_memory, mr, align, &local_err);
     if (local_err) {
         goto out;
     }
 
-    addr = object_property_get_int(OBJECT(dimm), PC_DIMM_ADDR_PROP, &local_err);
+    addr = object_property_get_int(OBJECT(dimm), DIMM_ADDR_PROP, &local_err);
     if (local_err) {
-        pc_dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
+        dimm_memory_unplug(dev, &ms->hotplug_memory, mr);
         goto out;
     }
 
@@ -2179,14 +2179,14 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 {
     sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(qdev_get_machine());
 
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         int node;
 
         if (!smc->dr_lmb_enabled) {
             error_setg(errp, "Memory hotplug not supported for this machine");
             return;
         }
-        node = object_property_get_int(OBJECT(dev), PC_DIMM_NODE_PROP, errp);
+        node = object_property_get_int(OBJECT(dev), DIMM_NODE_PROP, errp);
         if (*errp) {
             return;
         }
@@ -2220,7 +2220,7 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
 static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         error_setg(errp, "Memory hot unplug not supported by sPAPR");
     }
 }
@@ -2228,7 +2228,7 @@ static void spapr_machine_device_unplug(HotplugHandler *hotplug_dev,
 static HotplugHandler *spapr_get_hotpug_handler(MachineState *machine,
                                              DeviceState *dev)
 {
-    if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
+    if (object_dynamic_cast(OBJECT(dev), TYPE_DIMM)) {
         return HOTPLUG_HANDLER(machine);
     }
     return NULL;
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
index 8a43548..ece8786 100644
--- a/include/hw/mem/pc-dimm.h
+++ b/include/hw/mem/pc-dimm.h
@@ -13,39 +13,39 @@
  *
  */
 
-#ifndef QEMU_PC_DIMM_H
-#define QEMU_PC_DIMM_H
+#ifndef QEMU_DIMM_H
+#define QEMU_DIMM_H
 
 #include "exec/memory.h"
 #include "sysemu/hostmem.h"
 #include "hw/qdev.h"
 
-#define TYPE_PC_DIMM "pc-dimm"
-#define PC_DIMM(obj) \
-    OBJECT_CHECK(PCDIMMDevice, (obj), TYPE_PC_DIMM)
-#define PC_DIMM_CLASS(oc) \
-    OBJECT_CLASS_CHECK(PCDIMMDeviceClass, (oc), TYPE_PC_DIMM)
-#define PC_DIMM_GET_CLASS(obj) \
-    OBJECT_GET_CLASS(PCDIMMDeviceClass, (obj), TYPE_PC_DIMM)
+#define TYPE_DIMM "pc-dimm"
+#define DIMM(obj) \
+    OBJECT_CHECK(DIMMDevice, (obj), TYPE_DIMM)
+#define DIMM_CLASS(oc) \
+    OBJECT_CLASS_CHECK(DIMMDeviceClass, (oc), TYPE_DIMM)
+#define DIMM_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(DIMMDeviceClass, (obj), TYPE_DIMM)
 
-#define PC_DIMM_ADDR_PROP "addr"
-#define PC_DIMM_SLOT_PROP "slot"
-#define PC_DIMM_NODE_PROP "node"
-#define PC_DIMM_SIZE_PROP "size"
-#define PC_DIMM_MEMDEV_PROP "memdev"
+#define DIMM_ADDR_PROP "addr"
+#define DIMM_SLOT_PROP "slot"
+#define DIMM_NODE_PROP "node"
+#define DIMM_SIZE_PROP "size"
+#define DIMM_MEMDEV_PROP "memdev"
 
-#define PC_DIMM_UNASSIGNED_SLOT -1
+#define DIMM_UNASSIGNED_SLOT -1
 
 /**
- * PCDIMMDevice:
- * @addr: starting guest physical address, where @PCDIMMDevice is mapped.
+ * DIMMDevice:
+ * @addr: starting guest physical address, where @DIMMDevice is mapped.
  *         Default value: 0, means that address is auto-allocated.
- * @node: numa node to which @PCDIMMDevice is attached.
- * @slot: slot number into which @PCDIMMDevice is plugged in.
+ * @node: numa node to which @DIMMDevice is attached.
+ * @slot: slot number into which @DIMMDevice is plugged in.
  *        Default value: -1, means that slot is auto-allocated.
- * @hostmem: host memory backend providing memory for @PCDIMMDevice
+ * @hostmem: host memory backend providing memory for @DIMMDevice
  */
-typedef struct PCDIMMDevice {
+typedef struct DIMMDevice {
     /* private */
     DeviceState parent_obj;
 
@@ -54,19 +54,19 @@ typedef struct PCDIMMDevice {
     uint32_t node;
     int32_t slot;
     HostMemoryBackend *hostmem;
-} PCDIMMDevice;
+} DIMMDevice;
 
 /**
- * PCDIMMDeviceClass:
+ * DIMMDeviceClass:
  * @get_memory_region: returns #MemoryRegion associated with @dimm
  */
-typedef struct PCDIMMDeviceClass {
+typedef struct DIMMDeviceClass {
     /* private */
     DeviceClass parent_class;
 
     /* public */
-    MemoryRegion *(*get_memory_region)(PCDIMMDevice *dimm);
-} PCDIMMDeviceClass;
+    MemoryRegion *(*get_memory_region)(DIMMDevice *dimm);
+} DIMMDeviceClass;
 
 /**
  * MemoryHotplugState:
@@ -79,16 +79,16 @@ typedef struct MemoryHotplugState {
     MemoryRegion mr;
 } MemoryHotplugState;
 
-uint64_t pc_dimm_get_free_addr(uint64_t address_space_start,
+uint64_t dimm_get_free_addr(uint64_t address_space_start,
                                uint64_t address_space_size,
                                uint64_t *hint, uint64_t align, uint64_t size,
                                Error **errp);
 
-int pc_dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
+int dimm_get_free_slot(const int *hint, int max_slots, Error **errp);
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque);
-void pc_dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
+int qmp_dimm_device_list(Object *obj, void *opaque);
+void dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
                          MemoryRegion *mr, uint64_t align, Error **errp);
-void pc_dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
+void dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
                            MemoryRegion *mr);
 #endif
diff --git a/numa.c b/numa.c
index e9b18f5..cb69965 100644
--- a/numa.c
+++ b/numa.c
@@ -482,7 +482,7 @@ static void numa_stat_memory_devices(uint64_t node_mem[])
     MemoryDeviceInfoList **prev = &info_list;
     MemoryDeviceInfoList *info;
 
-    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
+    qmp_dimm_device_list(qdev_get_machine(), &prev);
     for (info = info_list; info; info = info->next) {
         MemoryDeviceInfo *value = info->value;
 
diff --git a/qapi-schema.json b/qapi-schema.json
index f60be29..aeae833 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -3717,9 +3717,9 @@
 { 'command': 'query-memdev', 'returns': ['Memdev'] }
 
 ##
-# @PCDIMMDeviceInfo:
+# @DIMMDeviceInfo:
 #
-# PCDIMMDevice state information
+# DIMMDevice state information
 #
 # @id: #optional device's ID
 #
@@ -3739,7 +3739,7 @@
 #
 # Since: 2.1
 ##
-{ 'struct': 'PCDIMMDeviceInfo',
+{ 'struct': 'DIMMDeviceInfo',
   'data': { '*id': 'str',
             'addr': 'int',
             'size': 'int',
@@ -3758,7 +3758,7 @@
 #
 # Since: 2.1
 ##
-{ 'union': 'MemoryDeviceInfo', 'data': {'dimm': 'PCDIMMDeviceInfo'} }
+{ 'union': 'MemoryDeviceInfo', 'data': {'dimm': 'DIMMDeviceInfo'} }
 
 ##
 # @query-memory-devices
diff --git a/qmp.c b/qmp.c
index ff54e5a..169b981 100644
--- a/qmp.c
+++ b/qmp.c
@@ -712,7 +712,7 @@ MemoryDeviceInfoList *qmp_query_memory_devices(Error **errp)
     MemoryDeviceInfoList *head = NULL;
     MemoryDeviceInfoList **prev = &head;
 
-    qmp_pc_dimm_device_list(qdev_get_machine(), &prev);
+    qmp_dimm_device_list(qdev_get_machine(), &prev);
 
     return head;
 }
diff --git a/stubs/qmp_pc_dimm_device_list.c b/stubs/qmp_pc_dimm_device_list.c
index b584bd8..b2704c6 100644
--- a/stubs/qmp_pc_dimm_device_list.c
+++ b/stubs/qmp_pc_dimm_device_list.c
@@ -1,7 +1,7 @@
 #include "qom/object.h"
 #include "hw/mem/pc-dimm.h"
 
-int qmp_pc_dimm_device_list(Object *obj, void *opaque)
+int qmp_dimm_device_list(Object *obj, void *opaque)
 {
    return 0;
 }
diff --git a/trace-events b/trace-events
index bdfe79f..d1f258d 100644
--- a/trace-events
+++ b/trace-events
@@ -1652,12 +1652,12 @@ mhp_acpi_write_ost_ev(uint32_t slot, uint32_t ev) "slot[0x%"PRIx32"] OST EVENT:
 mhp_acpi_write_ost_status(uint32_t slot, uint32_t st) "slot[0x%"PRIx32"] OST STATUS: 0x%"PRIx32
 mhp_acpi_clear_insert_evt(uint32_t slot) "slot[0x%"PRIx32"] clear insert event"
 mhp_acpi_clear_remove_evt(uint32_t slot) "slot[0x%"PRIx32"] clear remove event"
-mhp_acpi_pc_dimm_deleted(uint32_t slot) "slot[0x%"PRIx32"] pc-dimm deleted"
-mhp_acpi_pc_dimm_delete_failed(uint32_t slot) "slot[0x%"PRIx32"] pc-dimm delete failed"
+mhp_acpi_dimm_deleted(uint32_t slot) "slot[0x%"PRIx32"] dimm deleted"
+mhp_acpi_dimm_delete_failed(uint32_t slot) "slot[0x%"PRIx32"] dimm delete failed"
 
 # hw/i386/pc.c
-mhp_pc_dimm_assigned_slot(int slot) "0x%d"
-mhp_pc_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
+mhp_dimm_assigned_slot(int slot) "0x%d"
+mhp_dimm_assigned_address(uint64_t addr) "0x%"PRIx64
 
 # target-s390x/kvm.c
 kvm_enable_cmma(int rc) "CMMA: enabling with result code %d"
-- 
1.8.3.1
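
A note on the rename above: the header hunk changes the macro identifier from TYPE_PC_DIMM to TYPE_DIMM but keeps the string "pc-dimm", so the user-visible type name should not change. The following is a minimal standalone sketch of that point, not QEMU code. type_name_matches() and check_rename_is_transparent() are hypothetical helpers invented for illustration, and the plain strcmp() only stands in for the real object_dynamic_cast() check.

```c
#include <assert.h>
#include <string.h>

/* Identifier before the patch. */
#define TYPE_PC_DIMM "pc-dimm"
/* Identifier after the patch: new macro name, same type-name string. */
#define TYPE_DIMM "pc-dimm"

/*
 * Hypothetical stand-in for a type check: it compares type-name
 * strings only and knows nothing about QOM inheritance.
 */
int type_name_matches(const char *type_name, const char *wanted)
{
    return strcmp(type_name, wanted) == 0;
}

/*
 * Hypothetical self-check: a device requested as "pc-dimm" matches
 * under both the old and the new macro, because only the C identifier
 * was renamed.
 */
void check_rename_is_transparent(void)
{
    const char *requested = "pc-dimm";

    assert(type_name_matches(requested, TYPE_PC_DIMM));
    assert(type_name_matches(requested, TYPE_DIMM));
    assert(strcmp(TYPE_PC_DIMM, TYPE_DIMM) == 0);
}
```

Calling check_rename_is_transparent() from a main() should pass silently. Under this reading, existing command lines that name the "pc-dimm" device are unaffected by the rename.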


* [PATCH v7 17/35] stubs: rename qmp_pc_dimm_device_list.c
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Rename qmp_pc_dimm_device_list.c to qmp_dimm_device_list.c

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 stubs/Makefile.objs                                         | 2 +-
 stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename stubs/{qmp_pc_dimm_device_list.c => qmp_dimm_device_list.c} (100%)

diff --git a/stubs/Makefile.objs b/stubs/Makefile.objs
index 251443b..d5c862a 100644
--- a/stubs/Makefile.objs
+++ b/stubs/Makefile.objs
@@ -32,6 +32,6 @@ stub-obj-y += vmstate.o
 stub-obj-$(CONFIG_WIN32) += fd-register.o
 stub-obj-y += cpus.o
 stub-obj-y += kvm.o
-stub-obj-y += qmp_pc_dimm_device_list.o
+stub-obj-y += qmp_dimm_device_list.o
 stub-obj-y += target-monitor-defs.o
 stub-obj-y += vhost.o
diff --git a/stubs/qmp_pc_dimm_device_list.c b/stubs/qmp_dimm_device_list.c
similarity index 100%
rename from stubs/qmp_pc_dimm_device_list.c
rename to stubs/qmp_dimm_device_list.c
-- 
1.8.3.1



* [PATCH v7 18/35] pc-dimm: rename pc-dimm.c and pc-dimm.h
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Rename:
   pc-dimm.c => dimm.c
   pc-dimm.h => dimm.h

This prepares for abstracting a common dimm device type that both pc-dimm
and nvdimm can share.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/Makefile.objs                     | 2 +-
 hw/acpi/ich9.c                       | 2 +-
 hw/acpi/memory_hotplug.c             | 4 ++--
 hw/acpi/piix4.c                      | 2 +-
 hw/i386/pc.c                         | 2 +-
 hw/mem/Makefile.objs                 | 2 +-
 hw/mem/{pc-dimm.c => dimm.c}         | 2 +-
 hw/ppc/spapr.c                       | 2 +-
 include/hw/i386/pc.h                 | 2 +-
 include/hw/mem/{pc-dimm.h => dimm.h} | 0
 include/hw/ppc/spapr.h               | 2 +-
 numa.c                               | 2 +-
 qmp.c                                | 2 +-
 stubs/qmp_dimm_device_list.c         | 2 +-
 14 files changed, 14 insertions(+), 14 deletions(-)
 rename hw/mem/{pc-dimm.c => dimm.c} (99%)
 rename include/hw/mem/{pc-dimm.h => dimm.h} (100%)

diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 7e7c241..12ecda9 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -30,8 +30,8 @@ devices-dirs-$(CONFIG_SOFTMMU) += vfio/
 devices-dirs-$(CONFIG_VIRTIO) += virtio/
 devices-dirs-$(CONFIG_SOFTMMU) += watchdog/
 devices-dirs-$(CONFIG_SOFTMMU) += xen/
-devices-dirs-$(CONFIG_MEM_HOTPLUG) += mem/
 devices-dirs-$(CONFIG_SMBIOS) += smbios/
+devices-dirs-y += mem/
 devices-dirs-y += core/
 common-obj-y += $(devices-dirs-y)
 obj-y += $(devices-dirs-y)
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index b0d6a67..1e9ae20 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -35,7 +35,7 @@
 #include "exec/address-spaces.h"
 
 #include "hw/i386/ich9.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 
 //#define DEBUG
 
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index e687852..20d3093 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -1,6 +1,6 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/pc-hotplug.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "hw/boards.h"
 #include "hw/qdev-core.h"
 #include "trace.h"
@@ -148,7 +148,7 @@ static void acpi_memory_hotplug_write(void *opaque, hwaddr addr, uint64_t data,
 
             dev = DEVICE(mdev->dimm);
             hotplug_ctrl = qdev_get_hotplug_handler(dev);
-            /* call pc-dimm unplug cb */
+            /* call dimm unplug cb */
             hotplug_handler_unplug(hotplug_ctrl, dev, &local_err);
             if (local_err) {
                 trace_mhp_acpi_dimm_delete_failed(mem_st->selector);
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 0b2cb6e..b2f5b2c 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -33,7 +33,7 @@
 #include "hw/acpi/pcihp.h"
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/hotplug.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/acpi_dev_interface.h"
 #include "hw/xen/xen.h"
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 67ecc4f..6bf569a 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -62,7 +62,7 @@
 #include "hw/boards.h"
 #include "hw/pci/pci_host.h"
 #include "acpi-build.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "qapi/visitor.h"
 #include "qapi-visit.h"
 
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index b000fb4..7563ef5 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1 @@
-common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_MEM_HOTPLUG) += dimm.o
diff --git a/hw/mem/pc-dimm.c b/hw/mem/dimm.c
similarity index 99%
rename from hw/mem/pc-dimm.c
rename to hw/mem/dimm.c
index 67afc53..9f55cee 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/dimm.c
@@ -18,7 +18,7 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>
  */
 
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "qemu/config-file.h"
 #include "qapi/visitor.h"
 #include "qemu/range.h"
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ab6eb83..9ff24fd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2199,7 +2199,7 @@ static void spapr_machine_device_plug(HotplugHandler *hotplug_dev,
          *
          * - Memory gets hotplugged to a different node than what the user
          *   specified.
-         * - Since pc-dimm subsystem in QEMU still thinks that memory belongs
+         * - Since dimm subsystem in QEMU still thinks that memory belongs
          *   to memory-less node, a reboot will set things accordingly
          *   and the previously hotplugged memory now ends in the right node.
          *   This appears as if some memory moved from one node to another.
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 606dbc2..62e8fb5 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -16,7 +16,7 @@
 #include "hw/pci/pci.h"
 #include "hw/boards.h"
 #include "hw/compat.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 
 #define HPET_INTCAP "hpet-intcap"
 
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/dimm.h
similarity index 100%
rename from include/hw/mem/pc-dimm.h
rename to include/hw/mem/dimm.h
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 5baa906..0ae3abe 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -5,7 +5,7 @@
 #include "hw/boards.h"
 #include "hw/ppc/xics.h"
 #include "hw/ppc/spapr_drc.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 
 struct VIOsPAPRBus;
 struct sPAPRPHBState;
diff --git a/numa.c b/numa.c
index cb69965..34c6d6b 100644
--- a/numa.c
+++ b/numa.c
@@ -34,7 +34,7 @@
 #include "hw/boards.h"
 #include "sysemu/hostmem.h"
 #include "qmp-commands.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "qemu/option.h"
 #include "qemu/config-file.h"
 
diff --git a/qmp.c b/qmp.c
index 169b981..5ae37f7 100644
--- a/qmp.c
+++ b/qmp.c
@@ -31,7 +31,7 @@
 #include "qapi/qmp-input-visitor.h"
 #include "hw/boards.h"
 #include "qom/object_interfaces.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 #include "hw/acpi/acpi_dev_interface.h"
 
 NameInfo *qmp_query_name(Error **errp)
diff --git a/stubs/qmp_dimm_device_list.c b/stubs/qmp_dimm_device_list.c
index b2704c6..fb66400 100644
--- a/stubs/qmp_dimm_device_list.c
+++ b/stubs/qmp_dimm_device_list.c
@@ -1,5 +1,5 @@
 #include "qom/object.h"
-#include "hw/mem/pc-dimm.h"
+#include "hw/mem/dimm.h"
 
 int qmp_dimm_device_list(Object *obj, void *opaque)
 {
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 19/35] dimm: abstract dimm device from pc-dimm
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Abstract a base device, dimm, from pc-dimm, so that the nvdimm device
can be built on top of dimm in a later patch.
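
For readers unfamiliar with the pattern, the split can be sketched outside
of QEMU as an abstract base class plus one concrete subtype. This is only
an illustration: Region, Dimm, DimmClass and dimm_instantiate() are
hypothetical stand-ins, not QEMU's QOM types or API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not QEMU's real QOM types. */
typedef struct Region { size_t size; } Region;

typedef struct Dimm Dimm;

typedef struct DimmClass {
    const char *name;
    bool abstract;                          /* true: cannot be instantiated */
    Region *(*get_memory_region)(Dimm *dimm);
} DimmClass;

struct Dimm {
    const DimmClass *klass;
    Region backend;                         /* models the hostmem backend */
};

/* Concrete subtype behaviour: expose the whole backend region. */
static Region *pc_dimm_get_memory_region(Dimm *dimm)
{
    return &dimm->backend;
}

/* The base type provides no get_memory_region and is marked abstract. */
static const DimmClass dimm_class = { "dimm", true, NULL };

static const DimmClass pc_dimm_class = {
    "pc-dimm", false, pc_dimm_get_memory_region
};

/* Returns false and leaves *dimm untouched when the class is abstract. */
static bool dimm_instantiate(Dimm *dimm, const DimmClass *klass, size_t size)
{
    if (klass->abstract) {
        return false;
    }
    dimm->klass = klass;
    dimm->backend.size = size;
    return true;
}
```

In this sketch only "pc-dimm" can be instantiated; "dimm" exists purely so
that another subtype can share the common code and supply its own
get_memory_region.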

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 default-configs/i386-softmmu.mak   |  1 +
 default-configs/ppc64-softmmu.mak  |  1 +
 default-configs/x86_64-softmmu.mak |  1 +
 hw/mem/Makefile.objs               |  3 ++-
 hw/mem/dimm.c                      | 11 ++-------
 hw/mem/pc-dimm.c                   | 46 ++++++++++++++++++++++++++++++++++++++
 include/hw/mem/dimm.h              |  4 ++--
 include/hw/mem/pc-dimm.h           |  7 ++++++
 8 files changed, 62 insertions(+), 12 deletions(-)
 create mode 100644 hw/mem/pc-dimm.c
 create mode 100644 include/hw/mem/pc-dimm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 43c96d1..3ece8bb 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -18,6 +18,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_X86_ICH=y
+CONFIG_DIMM=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak
index bb71b23..482b8a1 100644
--- a/default-configs/ppc64-softmmu.mak
+++ b/default-configs/ppc64-softmmu.mak
@@ -54,3 +54,4 @@ CONFIG_XICS_KVM=$(and $(CONFIG_PSERIES),$(CONFIG_KVM))
 CONFIG_MC146818RTC=y
 CONFIG_ISA_TESTDEV=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_DIMM=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index dfb8095..92ea7c1 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -18,6 +18,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_X86_ICH=y
+CONFIG_DIMM=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index 7563ef5..cebb4b1 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1 +1,2 @@
-common-obj-$(CONFIG_MEM_HOTPLUG) += dimm.o
+common-obj-$(CONFIG_DIMM) += dimm.o
+common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 9f55cee..4a63409 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -1,5 +1,5 @@
 /*
- * Dimm device for Memory Hotplug
+ * Dimm device abstraction
  *
  * Copyright ProfitBricks GmbH 2012
  * Copyright (C) 2014 Red Hat Inc
@@ -429,21 +429,13 @@ static void dimm_realize(DeviceState *dev, Error **errp)
     }
 }
 
-static MemoryRegion *dimm_get_memory_region(DIMMDevice *dimm)
-{
-    return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
-}
-
 static void dimm_class_init(ObjectClass *oc, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(oc);
-    DIMMDeviceClass *ddc = DIMM_CLASS(oc);
 
     dc->realize = dimm_realize;
     dc->props = dimm_properties;
     dc->desc = "DIMM memory module";
-
-    ddc->get_memory_region = dimm_get_memory_region;
 }
 
 static TypeInfo dimm_info = {
@@ -453,6 +445,7 @@ static TypeInfo dimm_info = {
     .instance_init = dimm_init,
     .class_init    = dimm_class_init,
     .class_size    = sizeof(DIMMDeviceClass),
+    .abstract      = true,
 };
 
 static void dimm_register_types(void)
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
new file mode 100644
index 0000000..38323e9
--- /dev/null
+++ b/hw/mem/pc-dimm.c
@@ -0,0 +1,46 @@
+/*
+ * Dimm device for Memory Hotplug
+ *
+ * Copyright ProfitBricks GmbH 2012
+ * Copyright (C) 2014 Red Hat Inc
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "hw/mem/pc-dimm.h"
+
+static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
+{
+    return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
+}
+
+static void pc_dimm_class_init(ObjectClass *oc, void *data)
+{
+    DIMMDeviceClass *ddc = DIMM_CLASS(oc);
+
+    ddc->get_memory_region = pc_dimm_get_memory_region;
+}
+
+static TypeInfo pc_dimm_info = {
+    .name          = TYPE_PC_DIMM,
+    .parent        = TYPE_DIMM,
+    .class_init    = pc_dimm_class_init,
+};
+
+static void pc_dimm_register_types(void)
+{
+    type_register_static(&pc_dimm_info);
+}
+
+type_init(pc_dimm_register_types)
diff --git a/include/hw/mem/dimm.h b/include/hw/mem/dimm.h
index ece8786..50f768a 100644
--- a/include/hw/mem/dimm.h
+++ b/include/hw/mem/dimm.h
@@ -1,5 +1,5 @@
 /*
- * PC DIMM device
+ * Dimm device abstraction
  *
  * Copyright ProfitBricks GmbH 2012
  * Copyright (C) 2013-2014 Red Hat Inc
@@ -20,7 +20,7 @@
 #include "sysemu/hostmem.h"
 #include "hw/qdev.h"
 
-#define TYPE_DIMM "pc-dimm"
+#define TYPE_DIMM "dimm"
 #define DIMM(obj) \
     OBJECT_CHECK(DIMMDevice, (obj), TYPE_DIMM)
 #define DIMM_CLASS(oc) \
diff --git a/include/hw/mem/pc-dimm.h b/include/hw/mem/pc-dimm.h
new file mode 100644
index 0000000..50818c2
--- /dev/null
+++ b/include/hw/mem/pc-dimm.h
@@ -0,0 +1,7 @@
+#ifndef QEMU_PC_DIMM_H
+#define QEMU_PC_DIMM_H
+
+#include "hw/mem/dimm.h"
+
+#define TYPE_PC_DIMM "pc-dimm"
+#endif
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Currently, the memory region of the backend memory is mapped directly
into the guest's address space. However, this is not true for the nvdimm
device.

This patch makes the dimm device aware of this fact and uses the
DIMMDeviceClass->get_memory_region method to get the mapped memory
region.

The current code does not check the return value of get_memory_region
because it assumes the backend memory of pc-dimm is always properly
initialized. Make get_memory_region catch internally the case where
something is wrong.
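
The idea can be sketched in isolation as follows. This is a simplified
model under stated assumptions, not the QEMU code: Backend, Region and
backend_get_memory() are hypothetical, and a plain string stands in for
QEMU's Error object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the backend and its memory region. */
typedef struct Region { uint64_t size; } Region;
typedef struct Backend { Region mr; int initialized; } Backend;

typedef struct Dimm Dimm;
typedef struct DimmClass {
    Region *(*get_memory_region)(Dimm *dimm);
} DimmClass;

struct Dimm {
    const DimmClass *klass;
    Backend *hostmem;
};

/* Models a fallible lookup: returns NULL and sets *err on failure. */
static Region *backend_get_memory(Backend *be, const char **err)
{
    if (!be || !be->initialized) {
        *err = "backend memory is not initialized";
        return NULL;
    }
    return &be->mr;
}

static Region *pc_dimm_get_memory_region(Dimm *dimm)
{
    const char *local_err = NULL;
    Region *mr = backend_get_memory(dimm->hostmem, &local_err);

    /* A failure here is treated as a programming error, so assert on it. */
    assert(!local_err && mr);
    return mr;
}

static const DimmClass pc_dimm_class = { pc_dimm_get_memory_region };

/* Generic code asks the class which region is mapped, not the backend. */
static uint64_t dimm_get_size(Dimm *dimm)
{
    return dimm->klass->get_memory_region(dimm)->size;
}
```

Because the caller goes through the class method, a subtype that maps only
part of its backend can report a different size without touching the
generic code.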

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/mem/dimm.c    |  3 ++-
 hw/mem/pc-dimm.c | 12 +++++++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 4a63409..498d380 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -377,8 +377,9 @@ static void dimm_get_size(Object *obj, Visitor *v, void *opaque,
     int64_t value;
     MemoryRegion *mr;
     DIMMDevice *dimm = DIMM(obj);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(obj);
 
-    mr = host_memory_backend_get_memory(dimm->hostmem, errp);
+    mr = ddc->get_memory_region(dimm);
     value = memory_region_size(mr);
 
     visit_type_int(v, &value, name, errp);
diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
index 38323e9..e6b6a9f 100644
--- a/hw/mem/pc-dimm.c
+++ b/hw/mem/pc-dimm.c
@@ -22,7 +22,17 @@
 
 static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
 {
-    return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
+    Error *local_err = NULL;
+    MemoryRegion *mr;
+
+    mr = host_memory_backend_get_memory(dimm->hostmem, &local_err);
+
+    /*
+     * plug a pc-dimm device whose backend memory was not properly
+     * initialized?
+     */
+    assert(!local_err && mr);
+    return mr;
 }
 
 static void pc_dimm_class_init(ObjectClass *oc, void *data)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 21/35] dimm: keep the state of the whole backend memory
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

QEMU keeps the state of the dimm device's memory during live migration.
However, this is not enough for the nvdimm device, as its mapped memory
does not contain its label data. Keep the state of the whole backend
memory instead.
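
The arithmetic behind this can be illustrated with a small sketch. The
layout is an assumption made for the example (guest-visible data first,
label area at the end of the backend), and Range, mapped_region(),
whole_backend() and untracked_bytes() are hypothetical helpers, not QEMU
functions.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Range { uint64_t offset; uint64_t size; } Range;

/* Assumed layout: guest-visible data first, label area at the end. */
static Range mapped_region(uint64_t backend_size, uint64_t label_size)
{
    assert(label_size <= backend_size);
    Range r = { 0, backend_size - label_size };
    return r;
}

static Range whole_backend(uint64_t backend_size)
{
    Range r = { 0, backend_size };
    return r;
}

/* Bytes of the backend left uncovered when only @tracked is registered. */
static uint64_t untracked_bytes(Range tracked, uint64_t backend_size)
{
    assert(tracked.offset + tracked.size <= backend_size);
    return backend_size - tracked.size;
}
```

With a 1 MiB backend and a 128 KiB label area (sizes chosen only for the
example), tracking the mapped region leaves exactly the label area
uncovered, while tracking the whole backend leaves nothing uncovered.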

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/mem/dimm.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 498d380..44447d1 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -134,9 +134,16 @@ void dimm_memory_plug(DeviceState *dev, MemoryHotplugState *hpms,
     }
 
     memory_region_add_subregion(&hpms->mr, addr - hpms->base, mr);
-    vmstate_register_ram(mr, dev);
     numa_set_mem_node_id(addr, memory_region_size(mr), dimm->node);
 
+    /*
+     * save the state only for @mr is not enough as it does not contain
+     * the label data of NVDIMM device, so that we keep the state of
+     * whole hostmem instead.
+     */
+    vmstate_register_ram(host_memory_backend_get_memory(dimm->hostmem, errp),
+                         dev);
+
 out:
     error_propagate(errp, local_err);
 }
@@ -145,10 +152,13 @@ void dimm_memory_unplug(DeviceState *dev, MemoryHotplugState *hpms,
                            MemoryRegion *mr)
 {
     DIMMDevice *dimm = DIMM(dev);
+    MemoryRegion *backend_mr;
+
+    backend_mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
 
     numa_unset_mem_node_id(dimm->addr, memory_region_size(mr), dimm->node);
     memory_region_del_subregion(&hpms->mr, mr);
-    vmstate_unregister_ram(mr, dev);
+    vmstate_unregister_ram(backend_mr, dev);
 }
 
 int qmp_dimm_device_list(Object *obj, void *opaque)
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 22/35] dimm: introduce realize callback
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

nvdimm needs to check whether the backend memory is large enough to
contain the label data, and to initialize its memory region when the
device is realized. Introduce a realize callback which is called after
the common dimm has been realized.
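
The control flow can be sketched on its own as below. It is a minimal
model, not the QEMU implementation: the types are hypothetical, a string
pointer stands in for Error **, and LABEL_SIZE and nvdimm_like_realize()
are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Dimm Dimm;

typedef struct DimmClass {
    /* Optional; NULL means the subtype has no extra realize step. */
    void (*realize)(Dimm *dimm, const char **errp);
} DimmClass;

struct Dimm {
    const DimmClass *klass;
    int has_backend;
    uint64_t backend_size;
};

#define LABEL_SIZE (128 * 1024ULL)   /* hypothetical label area size */

/* Subtype hook: reject backends that cannot hold the label data. */
static void nvdimm_like_realize(Dimm *dimm, const char **errp)
{
    if (dimm->backend_size < LABEL_SIZE) {
        *errp = "backend memory is too small to hold the label data";
    }
}

/* Common realize: run the shared checks first, then the optional hook. */
static void dimm_realize(Dimm *dimm, const char **errp)
{
    if (!dimm->has_backend) {
        *errp = "backend property is not set";
        return;
    }
    if (dimm->klass->realize) {
        dimm->klass->realize(dimm, errp);
    }
}

static const DimmClass plain_class = { NULL };
static const DimmClass nvdimm_like_class = { nvdimm_like_realize };
```

A subtype that leaves the hook unset behaves exactly as before, so existing
pc-dimm users would be unaffected in this model.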

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/mem/dimm.c         | 5 +++++
 include/hw/mem/dimm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 44447d1..0ae23ce 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -426,6 +426,7 @@ static void dimm_init(Object *obj)
 static void dimm_realize(DeviceState *dev, Error **errp)
 {
     DIMMDevice *dimm = DIMM(dev);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
 
     if (!dimm->hostmem) {
         error_setg(errp, "'" DIMM_MEMDEV_PROP "' property is not set");
@@ -438,6 +439,10 @@ static void dimm_realize(DeviceState *dev, Error **errp)
                    dimm->node, nb_numa_nodes ? nb_numa_nodes : 1);
         return;
     }
+
+    if (ddc->realize) {
+        ddc->realize(dimm, errp);
+    }
 }
 
 static void dimm_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/mem/dimm.h b/include/hw/mem/dimm.h
index 50f768a..72ec24c 100644
--- a/include/hw/mem/dimm.h
+++ b/include/hw/mem/dimm.h
@@ -65,6 +65,7 @@ typedef struct DIMMDeviceClass {
     DeviceClass parent_class;
 
     /* public */
+    void (*realize)(DIMMDevice *dimm, Error **errp);
     MemoryRegion *(*get_memory_region)(DIMMDevice *dimm);
 } DIMMDeviceClass;
 
-- 
1.8.3.1



* [Qemu-devel] [PATCH v7 22/35] dimm: introduce realize callback
@ 2015-11-02  9:13   ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

nvdimm needs to check whether the backend memory is large enough to contain
the label data and to initialize its memory region when the device is
realized, so introduce a realize callback that is called after the common
dimm has been realized.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/mem/dimm.c         | 5 +++++
 include/hw/mem/dimm.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
index 44447d1..0ae23ce 100644
--- a/hw/mem/dimm.c
+++ b/hw/mem/dimm.c
@@ -426,6 +426,7 @@ static void dimm_init(Object *obj)
 static void dimm_realize(DeviceState *dev, Error **errp)
 {
     DIMMDevice *dimm = DIMM(dev);
+    DIMMDeviceClass *ddc = DIMM_GET_CLASS(dimm);
 
     if (!dimm->hostmem) {
         error_setg(errp, "'" DIMM_MEMDEV_PROP "' property is not set");
@@ -438,6 +439,10 @@ static void dimm_realize(DeviceState *dev, Error **errp)
                    dimm->node, nb_numa_nodes ? nb_numa_nodes : 1);
         return;
     }
+
+    if (ddc->realize) {
+        ddc->realize(dimm, errp);
+    }
 }
 
 static void dimm_class_init(ObjectClass *oc, void *data)
diff --git a/include/hw/mem/dimm.h b/include/hw/mem/dimm.h
index 50f768a..72ec24c 100644
--- a/include/hw/mem/dimm.h
+++ b/include/hw/mem/dimm.h
@@ -65,6 +65,7 @@ typedef struct DIMMDeviceClass {
     DeviceClass parent_class;
 
     /* public */
+    void (*realize)(DIMMDevice *dimm, Error **errp);
     MemoryRegion *(*get_memory_region)(DIMMDevice *dimm);
 } DIMMDeviceClass;
 
-- 
1.8.3.1


* [PATCH v7 23/35] nvdimm: implement NVDIMM device abstract
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Introduce the "nvdimm" device, which is based on the dimm device type.

A 128K memory region, which is the minimum namespace label size
required by the NVDIMM Namespace Spec, is reserved for label data
at the end of the backend memory device.

We can use "-m 1G,maxmem=100G,slots=10 -object memory-backend-file,
id=mem1,size=1G,mem-path=/dev/pmem0 -device nvdimm,memdev=mem1" to
create an NVDIMM device for the guest.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
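
Not part of the commit: a minimal sketch of how nvdimm_realize() splits the
backend, with the label area carved off the end and the remainder exposed
as the PMEM region. SketchLayout and sketch_split_backend are hypothetical
names; only the 128K label size and the "size <= label_size is rejected"
rule come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same value as MIN_NAMESPACE_LABEL_SIZE in the patch: 128K. */
#define SKETCH_LABEL_SIZE (128ULL << 10)

typedef struct SketchLayout {
    uint64_t pmem_size;     /* bytes exposed to the guest as PMEM */
    uint64_t label_offset;  /* offset of the label area inside the backend */
} SketchLayout;

/*
 * Split a backend of backend_size bytes into PMEM plus label area.
 * Returns false when the backend cannot hold the label area and at
 * least one byte of PMEM, mirroring the size check in nvdimm_realize().
 */
static bool sketch_split_backend(uint64_t backend_size, SketchLayout *out)
{
    if (backend_size <= SKETCH_LABEL_SIZE) {
        return false;
    }

    out->pmem_size = backend_size - SKETCH_LABEL_SIZE;
    out->label_offset = out->pmem_size;  /* labels start right after PMEM */
    return true;
}
```

With the 1G backend from the example command line, the guest-visible
region is 1G minus 128K and the label area occupies the last 128K.
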
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/acpi/memory_hotplug.c           |   6 ++
 hw/mem/Makefile.objs               |   1 +
 hw/mem/nvdimm.c                    | 116 +++++++++++++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h            |  83 ++++++++++++++++++++++++++
 6 files changed, 208 insertions(+)
 create mode 100644 hw/mem/nvdimm.c
 create mode 100644 include/hw/mem/nvdimm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 3ece8bb..4e84a1c 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 92ea7c1..e877a86 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index 20d3093..bb5a29f 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -1,6 +1,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/pc-hotplug.h"
 #include "hw/mem/dimm.h"
+#include "hw/mem/nvdimm.h"
 #include "hw/boards.h"
 #include "hw/qdev-core.h"
 #include "trace.h"
@@ -231,6 +232,11 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
 {
     MemStatus *mdev;
 
+    /* NVDIMM hotplug is not supported yet. */
+    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
+        return;
+    }
+
     mdev = acpi_memory_slot_status(mem_st, dev, errp);
     if (!mdev) {
         return;
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index cebb4b1..12d9b72 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,2 +1,3 @@
 common-obj-$(CONFIG_DIMM) += dimm.o
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm.o
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
new file mode 100644
index 0000000..c310887
--- /dev/null
+++ b/hw/mem/nvdimm.c
@@ -0,0 +1,116 @@
+/*
+ * Non-Volatile Dual In-line Memory Module Virtualization Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qapi/visitor.h"
+#include "hw/mem/nvdimm.h"
+
+static MemoryRegion *nvdimm_get_memory_region(DIMMDevice *dimm)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(dimm);
+
+    /* plugging an NVDIMM device which is not properly realized? */
+    assert(memory_region_size(&nvdimm->nvdimm_mr));
+
+    return &nvdimm->nvdimm_mr;
+}
+
+static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
+{
+    MemoryRegion *mr;
+    NVDIMMDevice *nvdimm = NVDIMM(dimm);
+    uint64_t size;
+
+    nvdimm->label_size = MIN_NAMESPACE_LABEL_SIZE;
+
+    mr = host_memory_backend_get_memory(dimm->hostmem, errp);
+    size = memory_region_size(mr);
+
+    if (size <= nvdimm->label_size) {
+        char *path = object_get_canonical_path_component(OBJECT(dimm->hostmem));
+        error_setg(errp, "the size of memdev %s (0x%" PRIx64 ") is too small"
+                   " to contain nvdimm namespace label (0x%" PRIx64 ")", path,
+                   memory_region_size(mr), nvdimm->label_size);
+        return;
+    }
+
+    memory_region_init_alias(&nvdimm->nvdimm_mr, OBJECT(dimm), "nvdimm-memory",
+                             mr, 0, size - nvdimm->label_size);
+    nvdimm->label_data = memory_region_get_ram_ptr(mr) +
+                         memory_region_size(&nvdimm->nvdimm_mr);
+}
+
+static void nvdimm_read_label_data(NVDIMMDevice *nvdimm, void *buf,
+                                   uint64_t size, uint64_t offset)
+{
+    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
+
+    memcpy(buf, nvdimm->label_data + offset, size);
+}
+
+static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
+                                    uint64_t size, uint64_t offset)
+{
+    MemoryRegion *mr;
+    DIMMDevice *dimm = DIMM(nvdimm);
+    uint64_t backend_offset;
+
+    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
+
+    memcpy(nvdimm->label_data + offset, buf, size);
+
+    mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
+    backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
+    memory_region_set_dirty(mr, backend_offset, size);
+}
+
+static void nvdimm_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+    DIMMDeviceClass *ddc = DIMM_CLASS(oc);
+    NVDIMMClass *nvc = NVDIMM_CLASS(oc);
+
+    /* nvdimm hotplug is not supported yet. */
+    dc->hotpluggable = false;
+
+    ddc->realize = nvdimm_realize;
+    ddc->get_memory_region = nvdimm_get_memory_region;
+
+    nvc->read_label_data = nvdimm_read_label_data;
+    nvc->write_label_data = nvdimm_write_label_data;
+}
+
+static TypeInfo nvdimm_info = {
+    .name          = TYPE_NVDIMM,
+    .parent        = TYPE_DIMM,
+    .instance_size = sizeof(NVDIMMDevice),
+    .class_init    = nvdimm_class_init,
+    .class_size    = sizeof(NVDIMMClass),
+};
+
+static void nvdimm_register_types(void)
+{
+    type_register_static(&nvdimm_info);
+}
+
+type_init(nvdimm_register_types)
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
new file mode 100644
index 0000000..cd90957
--- /dev/null
+++ b/include/hw/mem/nvdimm.h
@@ -0,0 +1,83 @@
+/*
+ * Non-Volatile Dual In-line Memory Module Virtualization Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * NVDIMM specifications and some documents can be found at:
+ * NVDIMM ACPI device and NFIT are introduced in ACPI 6:
+ *      http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
+ * NVDIMM Namespace specification:
+ *      http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
+ * DSM Interface Example:
+ *      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+ * Driver Writer's Guide:
+ *      http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NVDIMM_H
+#define QEMU_NVDIMM_H
+
+#include "hw/mem/dimm.h"
+
+/*
+ * The minimum label data size is required by NVDIMM Namespace
+ * specification, please refer to chapter 2 Namespaces:
+ *   "NVDIMMs following the NVDIMM Block Mode Specification use an area
+ *    at least 128KB in size, which holds around 1000 labels."
+ */
+#define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
+
+#define TYPE_NVDIMM      "nvdimm"
+#define NVDIMM(obj)      OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
+#define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
+#define NVDIMM_GET_CLASS(obj) OBJECT_GET_CLASS(NVDIMMClass, (obj), \
+                                               TYPE_NVDIMM)
+
+struct NVDIMMDevice {
+    /* private */
+    DIMMDevice parent_obj;
+
+    /* public */
+
+    /*
+     * the size of label data in NVDIMM device which is presented to
+     * guest via _DSM "Get Namespace Label Size" command.
+     */
+    uint64_t label_size;
+
+    /*
+     * the address of label data which is read by _DSM "Get Namespace
+     * Label Data" command and written by _DSM "Set Namespace Label
+     * Data" command.
+     */
+    void *label_data;
+
+    /*
+     * it's the PMEM region in NVDIMM device, which is presented to
+     * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
+     */
+    MemoryRegion nvdimm_mr;
+};
+typedef struct NVDIMMDevice NVDIMMDevice;
+
+struct NVDIMMClass {
+    /* private */
+    DIMMDeviceClass parent_class;
+
+    /* public */
+    /* read @size bytes from NVDIMM label data at @offset into @buf. */
+    void (*read_label_data)(NVDIMMDevice *nvdimm, void *buf,
+                            uint64_t size, uint64_t offset);
+    /* write @size bytes from @buf to NVDIMM label data at @offset. */
+    void (*write_label_data)(NVDIMMDevice *nvdimm, const void *buf,
+                             uint64_t size, uint64_t offset);
+};
+typedef struct NVDIMMClass NVDIMMClass;
+
+#endif
-- 
1.8.3.1
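
Not part of the commit: the label accessors assert
"label_size >= size + offset" together with "offset + size > offset" so
that a wrapped sum cannot pass. Below is a sketch of an equivalent range
check written without the addition. sketch_label_range_ok is a
hypothetical helper, not something this patch defines. One behavioural
difference to keep in mind: it accepts a zero-length access, whereas the
assert in the patch fires when size is 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true when [offset, offset + size) lies inside a label area of
 * label_size bytes. The subtraction cannot wrap because offset is
 * checked against label_size first.
 */
static bool sketch_label_range_ok(uint64_t label_size, uint64_t offset,
                                  uint64_t size)
{
    return offset <= label_size && size <= label_size - offset;
}
```
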



* [Qemu-devel] [PATCH v7 23/35] nvdimm: implement NVDIMM device abstract
@ 2015-11-02  9:13   ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

Introduce the "nvdimm" device, which is based on the dimm device type.

A 128K memory region, which is the minimum namespace label size
required by the NVDIMM Namespace Spec, is reserved for label data
at the end of the backend memory device.

We can use "-m 1G,maxmem=100G,slots=10 -object memory-backend-file,
id=mem1,size=1G,mem-path=/dev/pmem0 -device nvdimm,memdev=mem1" to
create an NVDIMM device for the guest.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/acpi/memory_hotplug.c           |   6 ++
 hw/mem/Makefile.objs               |   1 +
 hw/mem/nvdimm.c                    | 116 +++++++++++++++++++++++++++++++++++++
 include/hw/mem/nvdimm.h            |  83 ++++++++++++++++++++++++++
 6 files changed, 208 insertions(+)
 create mode 100644 hw/mem/nvdimm.c
 create mode 100644 include/hw/mem/nvdimm.h

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 3ece8bb..4e84a1c 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index 92ea7c1..e877a86 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -47,6 +47,7 @@ CONFIG_APIC=y
 CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
+CONFIG_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
index 20d3093..bb5a29f 100644
--- a/hw/acpi/memory_hotplug.c
+++ b/hw/acpi/memory_hotplug.c
@@ -1,6 +1,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/pc-hotplug.h"
 #include "hw/mem/dimm.h"
+#include "hw/mem/nvdimm.h"
 #include "hw/boards.h"
 #include "hw/qdev-core.h"
 #include "trace.h"
@@ -231,6 +232,11 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
 {
     MemStatus *mdev;
 
+    /* NVDIMM hotplug is not supported yet. */
+    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
+        return;
+    }
+
     mdev = acpi_memory_slot_status(mem_st, dev, errp);
     if (!mdev) {
         return;
diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
index cebb4b1..12d9b72 100644
--- a/hw/mem/Makefile.objs
+++ b/hw/mem/Makefile.objs
@@ -1,2 +1,3 @@
 common-obj-$(CONFIG_DIMM) += dimm.o
 common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
+common-obj-$(CONFIG_NVDIMM) += nvdimm.o
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
new file mode 100644
index 0000000..c310887
--- /dev/null
+++ b/hw/mem/nvdimm.c
@@ -0,0 +1,116 @@
+/*
+ * Non-Volatile Dual In-line Memory Module Virtualization Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qapi/visitor.h"
+#include "hw/mem/nvdimm.h"
+
+static MemoryRegion *nvdimm_get_memory_region(DIMMDevice *dimm)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(dimm);
+
+    /* plugging an NVDIMM device which is not properly realized? */
+    assert(memory_region_size(&nvdimm->nvdimm_mr));
+
+    return &nvdimm->nvdimm_mr;
+}
+
+static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
+{
+    MemoryRegion *mr;
+    NVDIMMDevice *nvdimm = NVDIMM(dimm);
+    uint64_t size;
+
+    nvdimm->label_size = MIN_NAMESPACE_LABEL_SIZE;
+
+    mr = host_memory_backend_get_memory(dimm->hostmem, errp);
+    size = memory_region_size(mr);
+
+    if (size <= nvdimm->label_size) {
+        char *path = object_get_canonical_path_component(OBJECT(dimm->hostmem));
+        error_setg(errp, "the size of memdev %s (0x%" PRIx64 ") is too small"
+                   " to contain nvdimm namespace label (0x%" PRIx64 ")", path,
+                   memory_region_size(mr), nvdimm->label_size);
+        return;
+    }
+
+    memory_region_init_alias(&nvdimm->nvdimm_mr, OBJECT(dimm), "nvdimm-memory",
+                             mr, 0, size - nvdimm->label_size);
+    nvdimm->label_data = memory_region_get_ram_ptr(mr) +
+                         memory_region_size(&nvdimm->nvdimm_mr);
+}
+
+static void nvdimm_read_label_data(NVDIMMDevice *nvdimm, void *buf,
+                                   uint64_t size, uint64_t offset)
+{
+    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
+
+    memcpy(buf, nvdimm->label_data + offset, size);
+}
+
+static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
+                                    uint64_t size, uint64_t offset)
+{
+    MemoryRegion *mr;
+    DIMMDevice *dimm = DIMM(nvdimm);
+    uint64_t backend_offset;
+
+    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
+
+    memcpy(nvdimm->label_data + offset, buf, size);
+
+    mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
+    backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
+    memory_region_set_dirty(mr, backend_offset, size);
+}
+
+static void nvdimm_class_init(ObjectClass *oc, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(oc);
+    DIMMDeviceClass *ddc = DIMM_CLASS(oc);
+    NVDIMMClass *nvc = NVDIMM_CLASS(oc);
+
+    /* nvdimm hotplug is not supported yet. */
+    dc->hotpluggable = false;
+
+    ddc->realize = nvdimm_realize;
+    ddc->get_memory_region = nvdimm_get_memory_region;
+
+    nvc->read_label_data = nvdimm_read_label_data;
+    nvc->write_label_data = nvdimm_write_label_data;
+}
+
+static TypeInfo nvdimm_info = {
+    .name          = TYPE_NVDIMM,
+    .parent        = TYPE_DIMM,
+    .instance_size = sizeof(NVDIMMDevice),
+    .class_init    = nvdimm_class_init,
+    .class_size    = sizeof(NVDIMMClass),
+};
+
+static void nvdimm_register_types(void)
+{
+    type_register_static(&nvdimm_info);
+}
+
+type_init(nvdimm_register_types)
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
new file mode 100644
index 0000000..cd90957
--- /dev/null
+++ b/include/hw/mem/nvdimm.h
@@ -0,0 +1,83 @@
+/*
+ * Non-Volatile Dual In-line Memory Module Virtualization Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * NVDIMM specifications and some documents can be found at:
+ * NVDIMM ACPI device and NFIT are introduced in ACPI 6:
+ *      http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
+ * NVDIMM Namespace specification:
+ *      http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
+ * DSM Interface Example:
+ *      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+ * Driver Writer's Guide:
+ *      http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_NVDIMM_H
+#define QEMU_NVDIMM_H
+
+#include "hw/mem/dimm.h"
+
+/*
+ * The minimum label data size is required by NVDIMM Namespace
+ * specification, please refer to chapter 2 Namespaces:
+ *   "NVDIMMs following the NVDIMM Block Mode Specification use an area
+ *    at least 128KB in size, which holds around 1000 labels."
+ */
+#define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
+
+#define TYPE_NVDIMM      "nvdimm"
+#define NVDIMM(obj)      OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
+#define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
+#define NVDIMM_GET_CLASS(obj) OBJECT_GET_CLASS(NVDIMMClass, (obj), \
+                                               TYPE_NVDIMM)
+
+struct NVDIMMDevice {
+    /* private */
+    DIMMDevice parent_obj;
+
+    /* public */
+
+    /*
+     * the size of label data in NVDIMM device which is presented to
+     * guest via _DSM "Get Namespace Label Size" command.
+     */
+    uint64_t label_size;
+
+    /*
+     * the address of label data which is read by _DSM "Get Namespace
+     * Label Data" command and written by _DSM "Set Namespace Label
+     * Data" command.
+     */
+    void *label_data;
+
+    /*
+     * it's the PMEM region in NVDIMM device, which is presented to
+     * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
+     */
+    MemoryRegion nvdimm_mr;
+};
+typedef struct NVDIMMDevice NVDIMMDevice;
+
+struct NVDIMMClass {
+    /* private */
+    DIMMDeviceClass parent_class;
+
+    /* public */
+    /* read @size bytes from NVDIMM label data at @offset into @buf. */
+    void (*read_label_data)(NVDIMMDevice *nvdimm, void *buf,
+                            uint64_t size, uint64_t offset);
+    /* write @size bytes from @buf to NVDIMM label data at @offset. */
+    void (*write_label_data)(NVDIMMDevice *nvdimm, const void *buf,
+                             uint64_t size, uint64_t offset);
+};
+typedef struct NVDIMMClass NVDIMMClass;
+
+#endif
-- 
1.8.3.1


* [PATCH v7 24/35] docs: add NVDIMM ACPI documentation
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Describe the basic concepts of NVDIMM ACPI and the interface
between QEMU and the ACPI BIOS.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
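
Not part of the commit: a minimal sketch of the _DSM input layout that the
document below describes for the shared page at 0xFF00000, written as a C
struct so the offsets can be checked at compile time. SketchDsmIn and its
field names are hypothetical; the document does not state the byte order
of the 4-byte fields, so none is assumed here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DSM_PAGE_SIZE 4096

/* What ACPI writes into the page before handing control to QEMU. */
typedef struct SketchDsmIn {
    uint32_t handle;    /* 0xFF00000: NVDIMM device handle, 0 = root device */
    uint32_t revision;  /* 0xFF00004: Arg1 of _DSM */
    uint32_t function;  /* 0xFF00008: Arg2 of _DSM */
    uint8_t  arg3[SKETCH_DSM_PAGE_SIZE - 3 * sizeof(uint32_t)];
                        /* 0xFF0000C: Arg3 of _DSM, 4084 bytes */
} SketchDsmIn;

/* The input structure must cover exactly one page. */
_Static_assert(sizeof(SketchDsmIn) == SKETCH_DSM_PAGE_SIZE,
               "DSM input must fill the shared page");
_Static_assert(offsetof(SketchDsmIn, arg3) == 0xC,
               "Arg3 starts right after the three 4-byte fields");
```
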
 docs/specs/acpi_nvdimm.txt | 179 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)
 create mode 100644 docs/specs/acpi_nvdimm.txt

diff --git a/docs/specs/acpi_nvdimm.txt b/docs/specs/acpi_nvdimm.txt
new file mode 100644
index 0000000..cc5db2c
--- /dev/null
+++ b/docs/specs/acpi_nvdimm.txt
@@ -0,0 +1,179 @@
+QEMU<->ACPI BIOS NVDIMM interface
+---------------------------------
+
+QEMU supports NVDIMM via ACPI. This document describes the basic concepts of
+NVDIMM ACPI and the interface between QEMU and the ACPI BIOS.
+
+NVDIMM ACPI Background
+----------------------
+NVDIMM is introduced in ACPI 6.0, which defines an NVDIMM root device under
+the _SB scope with a _HID of "ACPI0012". For each NVDIMM present or intended
+to be supported by the platform, platform firmware also exposes an ACPI
+Namespace Device under the root device.
+
+The NVDIMM child devices under the NVDIMM root device are defined with _ADR
+corresponding to the NFIT device handle. The NVDIMM root device and the
+NVDIMM devices can have device specific methods (_DSM) to provide additional
+functions specific to a particular NVDIMM implementation.
+
+This is an example from ACPI 6.0 of a platform that contains one NVDIMM:
+
+Scope (\_SB){
+   Device (NVDR) // Root device
+   {
+      Name (_HID, "ACPI0012")
+      Method (_STA) {...}
+      Method (_FIT) {...}
+      Method (_DSM, ...) {...}
+      Device (NVD)
+      {
+         Name(_ADR, h) //where h is NFIT Device Handle for this NVDIMM
+         Method (_DSM, ...) {...}
+      }
+   }
+}
+
+Methods supported on both the NVDIMM root device and NVDIMM devices:
+1) _STA(Status)
+   It returns the current status of a device, which can be one of the
+   following: enabled, disabled, or removed.
+
+   Arguments: None
+
+   Return Value:
+   It returns an Integer which is defined as follows:
+   Bit [0] – Set if the device is present.
+   Bit [1] – Set if the device is enabled and decoding its resources.
+   Bit [2] – Set if the device should be shown in the UI.
+   Bit [3] – Set if the device is functioning properly (cleared if device
+             failed its diagnostics).
+   Bit [4] – Set if the battery is present.
+   Bits [31:5] – Reserved (must be cleared).
+
+2) _DSM (Device Specific Method)
+   It is a control method that enables devices to provide device specific
+   control functions that are consumed by the device driver.
+   The NVDIMM DSM specification can be found at:
+        http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+
+   Arguments:
+   Arg0 – A Buffer containing a UUID (16 Bytes)
+   Arg1 – An Integer containing the Revision ID (4 Bytes)
+   Arg2 – An Integer containing the Function Index (4 Bytes)
+   Arg3 – A package containing parameters for the function specified by the
+          UUID, Revision ID, and Function Index
+
+   Return Value:
+   If Function Index = 0, a Buffer containing a function index bitfield.
+   Otherwise, the return value and type depend on the UUID, revision ID
+   and function index which are described in the DSM specification.
+
+Methods on NVDIMM ROOT Device
+_FIT(Firmware Interface Table)
+   It evaluates to a buffer returning data in the format of a series of NFIT
+   Type Structure.
+
+   Arguments: None
+
+   Return Value:
+   A Buffer containing a list of NFIT Type structure entries.
+
+   The detailed definition of the structure can be found at ACPI 6.0: 5.2.25
+   NVDIMM Firmware Interface Table (NFIT).
+
+QEMU NVDIMM Implementation
+==========================
+QEMU reserves a page starting from 0xFF00000 and a 4-byte IO port range
+starting from 0x0a18 for NVDIMM ACPI.
+
+Memory 0xFF00000 - 0xFF00FFF:
+   This page is RAM-based and it is used to transfer data between _DSM
+   method and QEMU. If ACPI has control, this page is owned by ACPI, which
+   writes _DSM input data to it; otherwise, it is owned by QEMU, which
+   emulates _DSM access and writes the output data to it.
+
+   ACPI Writes _DSM Input Data:
+   [0xFF00000 - 0xFF00003]: 4 bytes, NVDIMM Device Handle, 0 is reserved
+                            for NVDIMM Root device.
+   [0xFF00004 - 0xFF00007]: 4 bytes, Revision ID, that is the Arg1 of _DSM
+                            method.
+   [0xFF00008 - 0xFF0000B]: 4 bytes, Function Index, that is the Arg2 of
+                            _DSM method.
+   [0xFF0000C - 0xFF00FFF]: 4084 bytes, the Arg3 of _DSM method
+
+   QEMU Writes Output Data:
+   [0xFF00000 - 0xFF00FFF]: the DSM return result filled by QEMU
+
+IO Port 0x0a18 - 0x0a1b:
+   ACPI uses it to transfer control from guest to QEMU and to read the size
+   of the return result filled by QEMU.
+
+   Read Access:
+       [0x0a18 - 0x0a1b]: 4 bytes, the buffer size of _DSM output data.
+
+_DSM process diagram:
+---------------------
+The page, 0xFF00000 - 0xFF00FFF, is used by _DSM Virtualization.
+
+ +----------------------+      +-----------------------+
+ |    1. OSPM           |      |    2. OSPM            |
+ | save _DSM input data |      |   read 0x0a18         | Exit to QEMU
+ | to the page          +----->|                       +------------+
+ |                      |      |                       |            |
+ +----------------------+      +-----------------------+            |
+                                                                    |
+                                                                    v
+ +--------------------+       +-----------+      +------------------+--------+
+ |      5. QEMU       |       | 4. QEMU   |      |        3. QEMU            |
+ | write _DSM result  |       |  emulate  |      | get _DSM input parameters |
+ | to the page        +<------+ _DSM      +<-----+ from the page             |
+ |                    |       |           |      |                           |
+ +--------+-----------+       +-----------+      +---------------------------+
+          |
+          | Enter Guest
+          |
+          v
+ +--------------------------+      +--------------+
+ |     6. OSPM              |      |   7. OSPM    |
+ | result size is returned  |      |  _DSM return |
+ | by read and get DSM      +----->+              |
+ | result from the page     |      |              |
+ +--------------------------+      +--------------+
+
+ QEMU internal use only _DSM function
+ ------------------------------------
+ The following function is introduced by QEMU and is used only by QEMU itself.
+
+ 1) Read FIT
+ As we only reserved one page for NVDIMM ACPI, it is impossible to map the
+ whole FIT data to guest's address space. This function is used by _FIT
+ method to read a piece of FIT data from QEMU.
+
+ Input parameters:
+ Arg0 – UUID {set to 2f10e7a4-9e91-11e4-89d3-123b93f75cba}
+ Arg1 – Revision ID (set to 1)
+ Arg2 - 0xFFFFFFFF
+ Arg3 - A package containing a buffer whose layout is as follows:
+
+ +----------+-------------+-------------+-----------------------------------+
+ |  Field   | Byte Length | Byte Offset | Description                       |
+ +----------+-------------+-------------+-----------------------------------+
+ | offset   | 4           | 0           | the offset of FIT buffer          |
+ +----------+-------------+-------------+-----------------------------------+
+
+ Output:
+ +----------+-------------+-------------+-----------------------------------+
+ |  Field   | Byte Length | Byte Offset | Description                       |
+ +----------+-------------+-------------+-----------------------------------+
+ | status   | 4           | 0           | return status codes following     |
+ |          |             |             | Chapter 3 in DSM Spec Rev1        |
+ +----------+-------------+-------------+-----------------------------------+
+ | length   | 4           | 4           | the length of FIT buffer read out |
+ +----------+-------------+-------------+-----------------------------------+
+ | fit data | Varies      | 8           | FIT data, its size is indicated   |
+ |          |             |             | by length field above             |
+ +----------+-------------+-------------+-----------------------------------+
+
+ The FIT offset is maintained by the caller itself: the current offset plus
+ the length returned by the function is the next offset that should be read.
+ When all the FIT data has been read out, zero length is returned.
-- 
1.8.3.1
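
Not part of the commit: a minimal sketch of the caller-side loop implied
by the "Read FIT" function documented above, where the caller keeps the
offset, adds each returned length to it, and stops on a zero length. All
sketch_* names are hypothetical, the 16-byte chunk size is an arbitrary
stand-in for the space left in the shared page after the 8-byte header,
and treating status 0 as success is an assumption about the DSM
specification's status codes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_CHUNK_MAX 16  /* arbitrary stand-in for the per-call limit */

/* Mirrors the documented output layout: status, length, then FIT data. */
typedef struct SketchFitOut {
    uint32_t status;
    uint32_t length;
    uint8_t  data[SKETCH_CHUNK_MAX];
} SketchFitOut;

/* Stand-in for the QEMU side: return one piece of the FIT at offset. */
static void sketch_read_fit(const uint8_t *fit, uint32_t fit_len,
                            uint32_t offset, SketchFitOut *out)
{
    uint32_t left = offset < fit_len ? fit_len - offset : 0;
    uint32_t n = left < SKETCH_CHUNK_MAX ? left : SKETCH_CHUNK_MAX;

    out->status = 0;         /* assumed to mean success */
    out->length = n;         /* zero once everything has been read */
    if (n) {
        memcpy(out->data, fit + offset, n);
    }
}

/*
 * Stand-in for the _FIT method: read the whole FIT into dst.
 * Returns the number of bytes read, or -1 on error or if dst is too small.
 */
static int64_t sketch_read_whole_fit(const uint8_t *fit, uint32_t fit_len,
                                     uint8_t *dst, uint32_t dst_cap)
{
    uint32_t offset = 0;     /* maintained by the caller, as documented */

    for (;;) {
        SketchFitOut out;

        sketch_read_fit(fit, fit_len, offset, &out);
        if (out.status != 0) {
            return -1;
        }
        if (out.length == 0) {
            return offset;   /* all FIT data has been read out */
        }
        if (out.length > dst_cap - offset) {
            return -1;       /* destination buffer too small */
        }
        memcpy(dst + offset, out.data, out.length);
        offset += out.length;  /* next offset = current offset + length */
    }
}
```
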



* [Qemu-devel] [PATCH v7 24/35] docs: add NVDIMM ACPI documentation
@ 2015-11-02  9:13   ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

Describe the basic concepts of NVDIMM ACPI and the interface
between QEMU and the ACPI BIOS.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 docs/specs/acpi_nvdimm.txt | 179 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 179 insertions(+)
 create mode 100644 docs/specs/acpi_nvdimm.txt

diff --git a/docs/specs/acpi_nvdimm.txt b/docs/specs/acpi_nvdimm.txt
new file mode 100644
index 0000000..cc5db2c
--- /dev/null
+++ b/docs/specs/acpi_nvdimm.txt
@@ -0,0 +1,179 @@
+QEMU<->ACPI BIOS NVDIMM interface
+---------------------------------
+
+QEMU supports NVDIMM via ACPI. This document describes the basic concepts of
+NVDIMM ACPI and the interface between QEMU and the ACPI BIOS.
+
+NVDIMM ACPI Background
+----------------------
+NVDIMM is introduced in ACPI 6.0, which defines an NVDIMM root device under
+_SB scope with a _HID of "ACPI0012". For each NVDIMM present or intended
+to be supported by the platform, platform firmware also exposes an ACPI
+Namespace Device under the root device.
+
+The NVDIMM child devices under the NVDIMM root device are defined with _ADR
+corresponding to the NFIT device handle. The NVDIMM root device and the
+NVDIMM devices can have device specific methods (_DSM) to provide additional
+functions specific to a particular NVDIMM implementation.
+
+This is an example from ACPI 6.0 for a platform containing one NVDIMM:
+
+Scope (\_SB){
+   Device (NVDR) // Root device
+   {
+      Name (_HID, "ACPI0012")
+      Method (_STA) {...}
+      Method (_FIT) {...}
+      Method (_DSM, ...) {...}
+      Device (NVD)
+      {
+         Name(_ADR, h) //where h is NFIT Device Handle for this NVDIMM
+         Method (_DSM, ...) {...}
+      }
+   }
+}
+
+Methods supported on both the NVDIMM root device and NVDIMM devices:
+1) _STA(Status)
+   It returns the current status of a device, which can be one of the
+   following: enabled, disabled, or removed.
+
+   Arguments: None
+
+   Return Value:
+   An Integer which is defined as follows:
+   Bit [0] – Set if the device is present.
+   Bit [1] – Set if the device is enabled and decoding its resources.
+   Bit [2] – Set if the device should be shown in the UI.
+   Bit [3] – Set if the device is functioning properly (cleared if device
+             failed its diagnostics).
+   Bit [4] – Set if the battery is present.
+   Bits [31:5] – Reserved (must be cleared).
+
+2) _DSM (Device Specific Method)
+   It is a control method that enables devices to provide device specific
+   control functions that are consumed by the device driver.
+   The NVDIMM DSM specification can be found at:
+        http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+
+   Arguments:
+   Arg0 – A Buffer containing a UUID (16 Bytes)
+   Arg1 – An Integer containing the Revision ID (4 Bytes)
+   Arg2 – An Integer containing the Function Index (4 Bytes)
+   Arg3 – A package containing parameters for the function specified by the
+          UUID, Revision ID, and Function Index
+
+   Return Value:
+   If Function Index = 0, a Buffer containing a function index bitfield.
+   Otherwise, the return value and type depend on the UUID, revision ID
+   and function index, which are described in the DSM specification.
+
+Methods on NVDIMM ROOT Device
+_FIT(Firmware Interface Table)
+   It evaluates to a buffer returning data in the format of a series of NFIT
+   Type Structure.
+
+   Arguments: None
+
+   Return Value:
+   A Buffer containing a list of NFIT Type structure entries.
+
+   The detailed definition of the structure can be found at ACPI 6.0: 5.2.25
+   NVDIMM Firmware Interface Table (NFIT).
+
+QEMU NVDIMM Implementation
+==========================
+QEMU reserves a page starting from 0xFF000000 and a 4-byte IO port range
+starting from 0x0a18 for NVDIMM ACPI.
+
+Memory 0xFF000000 - 0xFF000FFF:
+   This page is RAM-based and is used to transfer data between the _DSM
+   method and QEMU. If ACPI has control, this page is owned by ACPI, which
+   writes _DSM input data to it; otherwise, it is owned by QEMU, which
+   emulates the _DSM access and writes the output data to it.
+
+   ACPI Writes _DSM Input Data:
+   [0xFF000000 - 0xFF000003]: 4 bytes, NVDIMM Device Handle, 0 is reserved
+                              for the NVDIMM Root device.
+   [0xFF000004 - 0xFF000007]: 4 bytes, Revision ID, that is the Arg1 of _DSM
+                              method.
+   [0xFF000008 - 0xFF00000B]: 4 bytes, Function Index, that is the Arg2 of
+                              _DSM method.
+   [0xFF00000C - 0xFF000FFF]: 4084 bytes, the Arg3 of _DSM method
+
+   QEMU Writes Output Data:
+   [0xFF000000 - 0xFF000FFF]: the DSM return result filled by QEMU
+
+IO Port 0x0a18 - 0x0a1b:
+   ACPI uses it to transfer control from the guest to QEMU and to read the
+   size of the return result filled by QEMU.
+
+   Read Access:
+       [0x0a18 - 0x0a1b]: 4 bytes, the buffer size of _DSM output data.
+
+_DSM process diagram:
+---------------------
+The page, 0xFF000000 - 0xFF000FFF, is used by _DSM Virtualization.
+
+ +----------------------+      +-----------------------+
+ |    1. OSPM           |      |    2. OSPM            |
+ | save _DSM input data |      |   read 0x0a18         | Exit to QEMU
+ | to the page          +----->|                       +------------+
+ |                      |      |                       |            |
+ +----------------------+      +-----------------------+            |
+                                                                    |
+                                                                    v
+ +--------------------+       +-----------+      +------------------+--------+
+ |      5. QEMU       |       | 4. QEMU   |      |        3. QEMU            |
+ | write _DSM result  |       |  emulate  |      | get _DSM input parameters |
+ | to the page        +<------+ _DSM      +<-----+ from the page             |
+ |                    |       |           |      |                           |
+ +--------+-----------+       +-----------+      +---------------------------+
+          |
+          | Enter Guest
+          |
+          v
+ +--------------------------+      +--------------+
+ |     6. OSPM              |      |   7. OSPM    |
+ | result size is returned  |      |  _DSM return |
+ | by read and get DSM      +----->+              |
+ | result from the page     |      |              |
+ +--------------------------+      +--------------+
+
+ QEMU internal use only _DSM function
+ ------------------------------------
+ This function is introduced by QEMU and is only used by QEMU internally.
+
+ 1) Read FIT
+ As only one page is reserved for NVDIMM ACPI, it is impossible to map the
+ whole FIT into the guest's address space. This function is used by the _FIT
+ method to read a piece of FIT data from QEMU.
+
+ Input parameters:
+ Arg0 – UUID {set to 2f10e7a4-9e91-11e4-89d3-123b93f75cba}
+ Arg1 – Revision ID (set to 1)
+ Arg2 - 0xFFFFFFFF
+ Arg3 - A package containing a buffer whose layout is as follows:
+
+ +----------+-------------+-------------+-----------------------------------+
+ |  Field   | Byte Length | Byte Offset | Description                       |
+ +----------+-------------+-------------+-----------------------------------+
+ | offset   | 4           | 0           | the offset of FIT buffer          |
+ +----------+-------------+-------------+-----------------------------------+
+
+ Output:
+ +----------+-------------+-------------+-----------------------------------+
+ |  Field   | Byte Length | Byte Offset | Description                       |
+ +----------+-------------+-------------+-----------------------------------+
+ | status   | 4           | 0           | return status codes following     |
+ |          |             |             | Chapter 3 in DSM Spec Rev1        |
+ +----------+-------------+-------------+-----------------------------------+
+ | length   | 4           | 4           | the length of FIT buffer read out |
+ +----------+-------------+-------------+-----------------------------------+
+ | fit data | Varies      | 8           | FIT data, its size is indicated   |
+ |          |             |             | by length field above             |
+ +----------+-------------+-------------+-----------------------------------+
+
+ The FIT offset is maintained by the caller itself: the current offset plus
+ the length returned by the function is the next offset to read.
+ When all the FIT data has been read out, zero length is returned.
-- 
1.8.3.1
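
The Read FIT convention described at the end of the document above (the
caller keeps the offset, advances it by the returned length, and stops
on a zero length) can be sketched in C as follows. This is only an
illustration under stated assumptions: read_fit_chunk() is a hypothetical
stand-in for the QEMU side, the 4088-byte chunk limit assumes a 4096-byte
page minus the 8-byte status/length header, and the status field is
ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed: 4096-byte page minus 4-byte status and 4-byte length. */
#define FIT_CHUNK_MAX 4088u

/*
 * Hypothetical stand-in for the emulated function: copy at most
 * FIT_CHUNK_MAX bytes of the FIT starting at 'offset' into 'out' and
 * return the number of bytes copied. Returns 0 once the whole FIT has
 * been read out.
 */
static uint32_t read_fit_chunk(const uint8_t *fit, uint32_t fit_len,
                               uint32_t offset, uint8_t *out)
{
    uint32_t len;

    if (offset >= fit_len) {
        return 0;
    }
    len = fit_len - offset;
    if (len > FIT_CHUNK_MAX) {
        len = FIT_CHUNK_MAX;
    }
    memcpy(out, fit + offset, len);
    return len;
}

/*
 * Caller side (the role played by _FIT): next offset is the current
 * offset plus the returned length. Returns the number of bytes stored
 * in 'dst'; stops early if 'dst' cannot hold the next chunk.
 */
static uint32_t read_whole_fit(const uint8_t *fit, uint32_t fit_len,
                               uint8_t *dst, uint32_t dst_cap)
{
    uint8_t chunk[FIT_CHUNK_MAX];
    uint32_t offset = 0;
    uint32_t len;

    while ((len = read_fit_chunk(fit, fit_len, offset, chunk)) != 0) {
        if (len > dst_cap - offset) {
            break;              /* destination full, result truncated */
        }
        memcpy(dst + offset, chunk, len);
        offset += len;
    }
    return offset;
}
```

With a 10000-byte FIT this loop issues three reads that return data
(4088, 4088 and 1824 bytes) and a fourth that returns zero length.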

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

A page starting from 0xFF000000 and IO ports 0x0a18 - 0x0a1b in the guest
are reserved for NVDIMM ACPI emulation; refer to docs/specs/acpi_nvdimm.txt
for the detailed design.

A property, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC to
control whether NVDIMM support is enabled. It is true by default and false
for 2.4 and earlier machine types to keep compatibility.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 default-configs/i386-softmmu.mak     |  1 +
 default-configs/mips-softmmu.mak     |  1 +
 default-configs/mips64-softmmu.mak   |  1 +
 default-configs/mips64el-softmmu.mak |  1 +
 default-configs/mipsel-softmmu.mak   |  1 +
 default-configs/x86_64-softmmu.mak   |  1 +
 hw/acpi/Makefile.objs                |  1 +
 hw/acpi/ich9.c                       | 24 ++++++++++++++
 hw/acpi/nvdimm.c                     | 63 ++++++++++++++++++++++++++++++++++++
 hw/acpi/piix4.c                      | 27 ++++++++++++----
 include/hw/acpi/ich9.h               |  3 ++
 include/hw/i386/pc.h                 | 10 ++++++
 include/hw/mem/nvdimm.h              | 34 +++++++++++++++++++
 13 files changed, 161 insertions(+), 7 deletions(-)
 create mode 100644 hw/acpi/nvdimm.c
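
For orientation, the page reserved by this patch carries the _DSM input
laid out in docs/specs/acpi_nvdimm.txt (device handle, revision ID,
function index, then Arg3). A minimal C sketch of that layout follows.
The struct and helper names are hypothetical, a 4096-byte page is
assumed (the patch itself uses TARGET_PAGE_SIZE), and fields are stored
in host byte order for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DSM_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical view of the page while ACPI owns it. */
struct dsm_in_sketch {
    uint32_t handle;     /* NVDIMM device handle, 0 = root device */
    uint32_t revision;   /* Arg1 of _DSM */
    uint32_t function;   /* Arg2 of _DSM */
    uint8_t  arg3[DSM_PAGE_SIZE - 3 * sizeof(uint32_t)]; /* Arg3 payload */
};

/* Bytes available for Arg3 after the 12-byte header. */
static size_t dsm_arg3_capacity(void)
{
    return sizeof(((struct dsm_in_sketch *)0)->arg3);
}

/* Fill the input area; fails if the Arg3 payload does not fit. */
static bool dsm_in_fill(struct dsm_in_sketch *in, uint32_t handle,
                        uint32_t revision, uint32_t function,
                        const void *arg3, size_t arg3_len)
{
    if (arg3_len > sizeof(in->arg3)) {
        return false;
    }
    memset(in, 0, sizeof(*in));
    in->handle = handle;
    in->revision = revision;
    in->function = function;
    if (arg3_len != 0) {
        memcpy(in->arg3, arg3, arg3_len);
    }
    return true;
}
```

The header occupies 12 bytes, which leaves 4084 bytes for Arg3 and
matches the figure given in the specification text.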

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 4e84a1c..51e71d4 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -48,6 +48,7 @@ CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak
index 44467c3..6b8b70e 100644
--- a/default-configs/mips-softmmu.mak
+++ b/default-configs/mips-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mips64-softmmu.mak b/default-configs/mips64-softmmu.mak
index 66ed5f9..ea820f6 100644
--- a/default-configs/mips64-softmmu.mak
+++ b/default-configs/mips64-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mips64el-softmmu.mak b/default-configs/mips64el-softmmu.mak
index bfca2b2..8993851 100644
--- a/default-configs/mips64el-softmmu.mak
+++ b/default-configs/mips64el-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mipsel-softmmu.mak b/default-configs/mipsel-softmmu.mak
index 0162ef0..87ab964 100644
--- a/default-configs/mipsel-softmmu.mak
+++ b/default-configs/mipsel-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index e877a86..0a7dc10 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -48,6 +48,7 @@ CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 7d3230c..84c082d 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -2,6 +2,7 @@ common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o pcihp.o
 common-obj-$(CONFIG_ACPI_X86_ICH) += ich9.o tco.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
+obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI) += acpi_interface.o
 common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
 common-obj-$(CONFIG_ACPI) += aml-build.o
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1e9ae20..603c1bd 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -280,6 +280,12 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
         acpi_memory_hotplug_init(pci_address_space_io(lpc_pci), OBJECT(lpc_pci),
                                  &pm->acpi_memory_hotplug);
     }
+
+    if (pm->acpi_nvdimm_state.is_enabled) {
+        nvdimm_init_acpi_state(pci_address_space(lpc_pci),
+                               pci_address_space_io(lpc_pci), OBJECT(lpc_pci),
+                               &pm->acpi_nvdimm_state);
+    }
 }
 
 static void ich9_pm_get_gpe0_blk(Object *obj, Visitor *v,
@@ -307,6 +313,20 @@ static void ich9_pm_set_memory_hotplug_support(Object *obj, bool value,
     s->pm.acpi_memory_hotplug.is_enabled = value;
 }
 
+static bool ich9_pm_get_nvdimm_support(Object *obj, Error **errp)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+    return s->pm.acpi_nvdimm_state.is_enabled;
+}
+
+static void ich9_pm_set_nvdimm_support(Object *obj, bool value, Error **errp)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+    s->pm.acpi_nvdimm_state.is_enabled = value;
+}
+
 static void ich9_pm_get_disable_s3(Object *obj, Visitor *v,
                                    void *opaque, const char *name,
                                    Error **errp)
@@ -419,6 +439,10 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, Error **errp)
                              ich9_pm_get_memory_hotplug_support,
                              ich9_pm_set_memory_hotplug_support,
                              NULL);
+    object_property_add_bool(obj, "nvdimm-support",
+                             ich9_pm_get_nvdimm_support,
+                             ich9_pm_set_nvdimm_support,
+                             NULL);
     object_property_add(obj, ACPI_PM_PROP_S3_DISABLED, "uint8",
                         ich9_pm_get_disable_s3,
                         ich9_pm_set_disable_s3,
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
new file mode 100644
index 0000000..1223da2
--- /dev/null
+++ b/hw/acpi/nvdimm.c
@@ -0,0 +1,63 @@
+/*
+ * NVDIMM ACPI Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
+ * and the DSM specification can be found at:
+ *       http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "hw/mem/nvdimm.h"
+
+static uint64_t
+nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
+{
+    return 0;
+}
+
+static void
+nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
+{
+}
+
+static const MemoryRegionOps nvdimm_dsm_ops = {
+    .read = nvdimm_dsm_read,
+    .write = nvdimm_dsm_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
+                            Object *owner, AcpiNVDIMMState *state)
+{
+    memory_region_init_ram(&state->ram_mr, owner, "nvdimm-acpi-ram",
+                           TARGET_PAGE_SIZE, &error_abort);
+    vmstate_register_ram_global(&state->ram_mr);
+    memory_region_add_subregion(memory, NVDIMM_ACPI_MEM_BASE, &state->ram_mr);
+
+    memory_region_init_io(&state->io_mr, owner, &nvdimm_dsm_ops, state,
+                          "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
+    memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
+}
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index b2f5b2c..39b8415 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -34,6 +34,7 @@
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/hotplug.h"
 #include "hw/mem/dimm.h"
+#include "hw/mem/nvdimm.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/acpi_dev_interface.h"
 #include "hw/xen/xen.h"
@@ -86,6 +87,8 @@ typedef struct PIIX4PMState {
     AcpiCpuHotplug gpe_cpu;
 
     MemHotplugState acpi_memory_hotplug;
+
+    AcpiNVDIMMState acpi_nvdimm_state;
 } PIIX4PMState;
 
 #define TYPE_PIIX4_PM "PIIX4_PM"
@@ -93,7 +96,8 @@ typedef struct PIIX4PMState {
 #define PIIX4_PM(obj) \
     OBJECT_CHECK(PIIX4PMState, (obj), TYPE_PIIX4_PM)
 
-static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
+static void piix4_acpi_system_hot_add_init(MemoryRegion *mem_parent,
+                                           MemoryRegion *io_parent,
                                            PCIBus *bus, PIIX4PMState *s);
 
 #define ACPI_ENABLE 0xf1
@@ -486,7 +490,8 @@ static void piix4_pm_realize(PCIDevice *dev, Error **errp)
     qemu_add_machine_init_done_notifier(&s->machine_ready);
     qemu_register_reset(piix4_reset, s);
 
-    piix4_acpi_system_hot_add_init(pci_address_space_io(dev), dev->bus, s);
+    piix4_acpi_system_hot_add_init(pci_address_space(dev),
+                                   pci_address_space_io(dev), dev->bus, s);
 
     piix4_pm_add_propeties(s);
 }
@@ -558,21 +563,27 @@ static const MemoryRegionOps piix4_gpe_ops = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
+static void piix4_acpi_system_hot_add_init(MemoryRegion *mem_parent,
+                                           MemoryRegion *io_parent,
                                            PCIBus *bus, PIIX4PMState *s)
 {
     memory_region_init_io(&s->io_gpe, OBJECT(s), &piix4_gpe_ops, s,
                           "acpi-gpe0", GPE_LEN);
-    memory_region_add_subregion(parent, GPE_BASE, &s->io_gpe);
+    memory_region_add_subregion(io_parent, GPE_BASE, &s->io_gpe);
 
-    acpi_pcihp_init(OBJECT(s), &s->acpi_pci_hotplug, bus, parent,
+    acpi_pcihp_init(OBJECT(s), &s->acpi_pci_hotplug, bus, io_parent,
                     s->use_acpi_pci_hotplug);
 
-    acpi_cpu_hotplug_init(parent, OBJECT(s), &s->gpe_cpu,
+    acpi_cpu_hotplug_init(io_parent, OBJECT(s), &s->gpe_cpu,
                           PIIX4_CPU_HOTPLUG_IO_BASE);
 
     if (s->acpi_memory_hotplug.is_enabled) {
-        acpi_memory_hotplug_init(parent, OBJECT(s), &s->acpi_memory_hotplug);
+        acpi_memory_hotplug_init(io_parent, OBJECT(s), &s->acpi_memory_hotplug);
+    }
+
+    if (s->acpi_nvdimm_state.is_enabled) {
+        nvdimm_init_acpi_state(mem_parent, io_parent, OBJECT(s),
+                               &s->acpi_nvdimm_state);
     }
 }
 
@@ -592,6 +603,8 @@ static Property piix4_pm_properties[] = {
                      use_acpi_pci_hotplug, true),
     DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
                      acpi_memory_hotplug.is_enabled, true),
+    DEFINE_PROP_BOOL("nvdimm-support", PIIX4PMState,
+                     acpi_nvdimm_state.is_enabled, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index 345fd8d..66a99d4 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -26,6 +26,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/acpi_dev_interface.h"
 #include "hw/acpi/tco.h"
+#include "hw/mem/nvdimm.h"
 
 typedef struct ICH9LPCPMRegs {
     /*
@@ -52,6 +53,8 @@ typedef struct ICH9LPCPMRegs {
 
     MemHotplugState acpi_memory_hotplug;
 
+    AcpiNVDIMMState acpi_nvdimm_state;
+
     uint8_t disable_s3;
     uint8_t disable_s4;
     uint8_t s4_val;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 62e8fb5..47162cf 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -322,6 +322,16 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
             .driver   = "host" "-" TYPE_X86_CPU,\
             .property = "host-cache-info",\
             .value    = "on",\
+        },\
+        {\
+            .driver   = "PIIX4_PM",\
+            .property = "nvdimm-support",\
+            .value    = "off",\
+        },\
+        {\
+            .driver   = "ICH9-LPC",\
+            .property = "nvdimm-support",\
+            .value    = "off",\
         },
 
 #define PC_COMPAT_2_3 \
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index cd90957..f0b1dda 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -33,6 +33,15 @@
  */
 #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
 
+/*
+ * A page starting from 0xFF000000 and IO ports 0x0a18 - 0x0a1b in the guest
+ * are reserved for NVDIMM ACPI emulation; refer to
+ * docs/specs/acpi_nvdimm.txt for the detailed design.
+ */
+#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
+#define NVDIMM_ACPI_IO_BASE           0x0a18
+#define NVDIMM_ACPI_IO_LEN            4
+
 #define TYPE_NVDIMM      "nvdimm"
 #define NVDIMM(obj)      OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
 #define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
@@ -80,4 +89,29 @@ struct NVDIMMClass {
 };
 typedef struct NVDIMMClass NVDIMMClass;
 
+/*
+ * AcpiNVDIMMState:
+ * @is_enabled: whether NVDIMM support is enabled.
+ *
+ * @fit: FIT buffer which will be accessed via the ACPI _FIT method. It is
+ *       dynamically built based on the current NVDIMM devices, so it does
+ *       not need to be kept consistent across live migration.
+ *
+ * @ram_mr: RAM-based memory region which is mapped into guest address
+ *          space and used to transfer data between OSPM and QEMU.
+ * @io_mr: the IO region used by OSPM to transfer control to QEMU.
+ */
+struct AcpiNVDIMMState {
+    bool is_enabled;
+
+    GArray *fit;
+
+    MemoryRegion ram_mr;
+    MemoryRegion io_mr;
+};
+typedef struct AcpiNVDIMMState AcpiNVDIMMState;
+
+/* Initialize the memory and IO regions needed by NVDIMM ACPI emulation. */
+void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
+                            Object *owner, AcpiNVDIMMState *state);
 #endif
-- 
1.8.3.1
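
The nvdimm_dsm_ops table above restricts valid accesses to exactly 4
bytes on a 4-byte IO window. The effect of that restriction can be
modelled with the small C sketch below. It is a simplified model with
hypothetical names, not QEMU's actual dispatch code, which performs
further checks (for example around alignment).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IO_BASE 0x0a18u  /* mirrors NVDIMM_ACPI_IO_BASE */
#define SKETCH_IO_LEN  4u       /* mirrors NVDIMM_ACPI_IO_LEN */

/* Last port covered by the window: base + length - 1. */
static uint32_t sketch_io_last_port(void)
{
    return SKETCH_IO_BASE + SKETCH_IO_LEN - 1;
}

/*
 * Simplified validity check: the access must be exactly 4 bytes wide
 * (min_access_size == max_access_size == 4) and must lie entirely
 * inside the window.
 */
static bool sketch_io_access_ok(uint32_t port, unsigned size)
{
    if (size != 4) {
        return false;
    }
    if (port < SKETCH_IO_BASE) {
        return false;
    }
    return (port - SKETCH_IO_BASE) + size <= SKETCH_IO_LEN;
}
```

Under this model only a 4-byte access at port 0x0a18 is accepted, which
is consistent with the single 4-byte read described in the interface
document.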


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [Qemu-devel] [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
@ 2015-11-02  9:13   ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

A page starting from 0xFF000000 and IO ports 0x0a18 - 0x0a1b in the guest
are reserved for NVDIMM ACPI emulation; refer to docs/specs/acpi_nvdimm.txt
for the detailed design.

A property, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC to
control whether NVDIMM support is enabled. It is true by default and false
for 2.4 and earlier machine types to keep compatibility.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 default-configs/i386-softmmu.mak     |  1 +
 default-configs/mips-softmmu.mak     |  1 +
 default-configs/mips64-softmmu.mak   |  1 +
 default-configs/mips64el-softmmu.mak |  1 +
 default-configs/mipsel-softmmu.mak   |  1 +
 default-configs/x86_64-softmmu.mak   |  1 +
 hw/acpi/Makefile.objs                |  1 +
 hw/acpi/ich9.c                       | 24 ++++++++++++++
 hw/acpi/nvdimm.c                     | 63 ++++++++++++++++++++++++++++++++++++
 hw/acpi/piix4.c                      | 27 ++++++++++++----
 include/hw/acpi/ich9.h               |  3 ++
 include/hw/i386/pc.h                 | 10 ++++++
 include/hw/mem/nvdimm.h              | 34 +++++++++++++++++++
 13 files changed, 161 insertions(+), 7 deletions(-)
 create mode 100644 hw/acpi/nvdimm.c

diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 4e84a1c..51e71d4 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -48,6 +48,7 @@ CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/default-configs/mips-softmmu.mak b/default-configs/mips-softmmu.mak
index 44467c3..6b8b70e 100644
--- a/default-configs/mips-softmmu.mak
+++ b/default-configs/mips-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mips64-softmmu.mak b/default-configs/mips64-softmmu.mak
index 66ed5f9..ea820f6 100644
--- a/default-configs/mips64-softmmu.mak
+++ b/default-configs/mips64-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mips64el-softmmu.mak b/default-configs/mips64el-softmmu.mak
index bfca2b2..8993851 100644
--- a/default-configs/mips64el-softmmu.mak
+++ b/default-configs/mips64el-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/mipsel-softmmu.mak b/default-configs/mipsel-softmmu.mak
index 0162ef0..87ab964 100644
--- a/default-configs/mipsel-softmmu.mak
+++ b/default-configs/mipsel-softmmu.mak
@@ -17,6 +17,7 @@ CONFIG_FDC=y
 CONFIG_ACPI=y
 CONFIG_ACPI_X86=y
 CONFIG_ACPI_MEMORY_HOTPLUG=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_ACPI_CPU_HOTPLUG=y
 CONFIG_APM=y
 CONFIG_I8257=y
diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
index e877a86..0a7dc10 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -48,6 +48,7 @@ CONFIG_IOAPIC=y
 CONFIG_PVPANIC=y
 CONFIG_MEM_HOTPLUG=y
 CONFIG_NVDIMM=y
+CONFIG_ACPI_NVDIMM=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
 CONFIG_I82801B11=y
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 7d3230c..84c082d 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -2,6 +2,7 @@ common-obj-$(CONFIG_ACPI_X86) += core.o piix4.o pcihp.o
 common-obj-$(CONFIG_ACPI_X86_ICH) += ich9.o tco.o
 common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu_hotplug.o
 common-obj-$(CONFIG_ACPI_MEMORY_HOTPLUG) += memory_hotplug.o
+obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI) += acpi_interface.o
 common-obj-$(CONFIG_ACPI) += bios-linker-loader.o
 common-obj-$(CONFIG_ACPI) += aml-build.o
diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
index 1e9ae20..603c1bd 100644
--- a/hw/acpi/ich9.c
+++ b/hw/acpi/ich9.c
@@ -280,6 +280,12 @@ void ich9_pm_init(PCIDevice *lpc_pci, ICH9LPCPMRegs *pm,
         acpi_memory_hotplug_init(pci_address_space_io(lpc_pci), OBJECT(lpc_pci),
                                  &pm->acpi_memory_hotplug);
     }
+
+    if (pm->acpi_nvdimm_state.is_enabled) {
+        nvdimm_init_acpi_state(pci_address_space(lpc_pci),
+                               pci_address_space_io(lpc_pci), OBJECT(lpc_pci),
+                               &pm->acpi_nvdimm_state);
+    }
 }
 
 static void ich9_pm_get_gpe0_blk(Object *obj, Visitor *v,
@@ -307,6 +313,20 @@ static void ich9_pm_set_memory_hotplug_support(Object *obj, bool value,
     s->pm.acpi_memory_hotplug.is_enabled = value;
 }
 
+static bool ich9_pm_get_nvdimm_support(Object *obj, Error **errp)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+    return s->pm.acpi_nvdimm_state.is_enabled;
+}
+
+static void ich9_pm_set_nvdimm_support(Object *obj, bool value, Error **errp)
+{
+    ICH9LPCState *s = ICH9_LPC_DEVICE(obj);
+
+    s->pm.acpi_nvdimm_state.is_enabled = value;
+}
+
 static void ich9_pm_get_disable_s3(Object *obj, Visitor *v,
                                    void *opaque, const char *name,
                                    Error **errp)
@@ -419,6 +439,10 @@ void ich9_pm_add_properties(Object *obj, ICH9LPCPMRegs *pm, Error **errp)
                              ich9_pm_get_memory_hotplug_support,
                              ich9_pm_set_memory_hotplug_support,
                              NULL);
+    object_property_add_bool(obj, "nvdimm-support",
+                             ich9_pm_get_nvdimm_support,
+                             ich9_pm_set_nvdimm_support,
+                             NULL);
     object_property_add(obj, ACPI_PM_PROP_S3_DISABLED, "uint8",
                         ich9_pm_get_disable_s3,
                         ich9_pm_set_disable_s3,
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
new file mode 100644
index 0000000..1223da2
--- /dev/null
+++ b/hw/acpi/nvdimm.c
@@ -0,0 +1,63 @@
+/*
+ * NVDIMM ACPI Implementation
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
+ * and the DSM specification can be found at:
+ *       http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
+ *
+ * Currently, it only supports PMEM Virtualization.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "hw/mem/nvdimm.h"
+
+static uint64_t
+nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
+{
+    return 0;
+}
+
+static void
+nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
+{
+}
+
+static const MemoryRegionOps nvdimm_dsm_ops = {
+    .read = nvdimm_dsm_read,
+    .write = nvdimm_dsm_write,
+    .endianness = DEVICE_LITTLE_ENDIAN,
+    .valid = {
+        .min_access_size = 4,
+        .max_access_size = 4,
+    },
+};
+
+void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
+                            Object *owner, AcpiNVDIMMState *state)
+{
+    memory_region_init_ram(&state->ram_mr, owner, "nvdimm-acpi-ram",
+                           TARGET_PAGE_SIZE, &error_abort);
+    vmstate_register_ram_global(&state->ram_mr);
+    memory_region_add_subregion(memory, NVDIMM_ACPI_MEM_BASE, &state->ram_mr);
+
+    memory_region_init_io(&state->io_mr, owner, &nvdimm_dsm_ops, state,
+                          "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
+    memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
+}
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index b2f5b2c..39b8415 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -34,6 +34,7 @@
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/hotplug.h"
 #include "hw/mem/dimm.h"
+#include "hw/mem/nvdimm.h"
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/acpi_dev_interface.h"
 #include "hw/xen/xen.h"
@@ -86,6 +87,8 @@ typedef struct PIIX4PMState {
     AcpiCpuHotplug gpe_cpu;
 
     MemHotplugState acpi_memory_hotplug;
+
+    AcpiNVDIMMState acpi_nvdimm_state;
 } PIIX4PMState;
 
 #define TYPE_PIIX4_PM "PIIX4_PM"
@@ -93,7 +96,8 @@ typedef struct PIIX4PMState {
 #define PIIX4_PM(obj) \
     OBJECT_CHECK(PIIX4PMState, (obj), TYPE_PIIX4_PM)
 
-static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
+static void piix4_acpi_system_hot_add_init(MemoryRegion *mem_parent,
+                                           MemoryRegion *io_parent,
                                            PCIBus *bus, PIIX4PMState *s);
 
 #define ACPI_ENABLE 0xf1
@@ -486,7 +490,8 @@ static void piix4_pm_realize(PCIDevice *dev, Error **errp)
     qemu_add_machine_init_done_notifier(&s->machine_ready);
     qemu_register_reset(piix4_reset, s);
 
-    piix4_acpi_system_hot_add_init(pci_address_space_io(dev), dev->bus, s);
+    piix4_acpi_system_hot_add_init(pci_address_space(dev),
+                                   pci_address_space_io(dev), dev->bus, s);
 
     piix4_pm_add_propeties(s);
 }
@@ -558,21 +563,27 @@ static const MemoryRegionOps piix4_gpe_ops = {
     .endianness = DEVICE_LITTLE_ENDIAN,
 };
 
-static void piix4_acpi_system_hot_add_init(MemoryRegion *parent,
+static void piix4_acpi_system_hot_add_init(MemoryRegion *mem_parent,
+                                           MemoryRegion *io_parent,
                                            PCIBus *bus, PIIX4PMState *s)
 {
     memory_region_init_io(&s->io_gpe, OBJECT(s), &piix4_gpe_ops, s,
                           "acpi-gpe0", GPE_LEN);
-    memory_region_add_subregion(parent, GPE_BASE, &s->io_gpe);
+    memory_region_add_subregion(io_parent, GPE_BASE, &s->io_gpe);
 
-    acpi_pcihp_init(OBJECT(s), &s->acpi_pci_hotplug, bus, parent,
+    acpi_pcihp_init(OBJECT(s), &s->acpi_pci_hotplug, bus, io_parent,
                     s->use_acpi_pci_hotplug);
 
-    acpi_cpu_hotplug_init(parent, OBJECT(s), &s->gpe_cpu,
+    acpi_cpu_hotplug_init(io_parent, OBJECT(s), &s->gpe_cpu,
                           PIIX4_CPU_HOTPLUG_IO_BASE);
 
     if (s->acpi_memory_hotplug.is_enabled) {
-        acpi_memory_hotplug_init(parent, OBJECT(s), &s->acpi_memory_hotplug);
+        acpi_memory_hotplug_init(io_parent, OBJECT(s), &s->acpi_memory_hotplug);
+    }
+
+    if (s->acpi_nvdimm_state.is_enabled) {
+        nvdimm_init_acpi_state(mem_parent, io_parent, OBJECT(s),
+                               &s->acpi_nvdimm_state);
     }
 }
 
@@ -592,6 +603,8 @@ static Property piix4_pm_properties[] = {
                      use_acpi_pci_hotplug, true),
     DEFINE_PROP_BOOL("memory-hotplug-support", PIIX4PMState,
                      acpi_memory_hotplug.is_enabled, true),
+    DEFINE_PROP_BOOL("nvdimm-support", PIIX4PMState,
+                     acpi_nvdimm_state.is_enabled, true),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/acpi/ich9.h b/include/hw/acpi/ich9.h
index 345fd8d..66a99d4 100644
--- a/include/hw/acpi/ich9.h
+++ b/include/hw/acpi/ich9.h
@@ -26,6 +26,7 @@
 #include "hw/acpi/memory_hotplug.h"
 #include "hw/acpi/acpi_dev_interface.h"
 #include "hw/acpi/tco.h"
+#include "hw/mem/nvdimm.h"
 
 typedef struct ICH9LPCPMRegs {
     /*
@@ -52,6 +53,8 @@ typedef struct ICH9LPCPMRegs {
 
     MemHotplugState acpi_memory_hotplug;
 
+    AcpiNVDIMMState acpi_nvdimm_state;
+
     uint8_t disable_s3;
     uint8_t disable_s4;
     uint8_t s4_val;
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 62e8fb5..47162cf 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -322,6 +322,16 @@ bool e820_get_entry(int, uint32_t, uint64_t *, uint64_t *);
             .driver   = "host" "-" TYPE_X86_CPU,\
             .property = "host-cache-info",\
             .value    = "on",\
+        },\
+        {\
+            .driver   = "PIIX4_PM",\
+            .property = "nvdimm-support",\
+            .value    = "off",\
+        },\
+        {\
+            .driver   = "ICH9-LPC",\
+            .property = "nvdimm-support",\
+            .value    = "off",\
         },
 
 #define PC_COMPAT_2_3 \
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index cd90957..f0b1dda 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -33,6 +33,15 @@
  */
 #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
 
+/*
+ * A page starting at 0xFF000000 and IO ports 0x0a18 - 0x0a1b in the guest
+ * are reserved for NVDIMM ACPI emulation; refer to
+ * docs/specs/acpi_nvdimm.txt for the detailed design.
+ */
+#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
+#define NVDIMM_ACPI_IO_BASE           0x0a18
+#define NVDIMM_ACPI_IO_LEN            4
+
 #define TYPE_NVDIMM      "nvdimm"
 #define NVDIMM(obj)      OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
 #define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
@@ -80,4 +89,29 @@ struct NVDIMMClass {
 };
 typedef struct NVDIMMClass NVDIMMClass;
 
+/*
+ * AcpiNVDIMMState:
+ * @is_enabled: whether NVDIMM support is enabled.
+ *
+ * @fit: FIT buffer which will be accessed via the ACPI _FIT method. It is
+ *       built dynamically from the current NVDIMM devices, so it does not
+ *       need to be kept consistent across live migration.
+ *
+ * @ram_mr: RAM-based memory region which is mapped into guest address
+ *          space and used to transfer data between OSPM and QEMU.
+ * @io_mr: the IO region used by OSPM to transfer control to QEMU.
+ */
+struct AcpiNVDIMMState {
+    bool is_enabled;
+
+    GArray *fit;
+
+    MemoryRegion ram_mr;
+    MemoryRegion io_mr;
+};
+typedef struct AcpiNVDIMMState AcpiNVDIMMState;
+
+/* Initialize the memory and IO region needed by NVDIMM ACPI emulation.*/
+void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
+                            Object *owner, AcpiNVDIMMState *state);
 #endif
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 26/35] nvdimm acpi: build ACPI NFIT table
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)

Currently, only PMEM mode is supported. Each device is described by three
structures:
- SPA structure: defines the PMEM region info.

- MEM DEV structure: carries the @handle that associates it with the
  corresponding ACPI NVDIMM device, which a later patch introduces.
  The memory device's interleave can be ignored, since access to the
  real NVDIMM hardware is hidden behind the host.

- DCR structure: defines the vendor ID used to bind the vendor-specific
  NVDIMM driver. Since only PMEM mode is implemented for now, the
  Command window and Data window are not needed.
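
As an illustration of how the per-device identifiers mentioned above (the
handle in the MEM DEV structure and the SPA/DCR structure indexes) can be
derived from the DIMM slot id, here is a minimal standalone sketch. The
example_* names are hypothetical and only mirror the arithmetic of the
nvdimm_slot_to_*() helpers in this patch; this is not the QEMU code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: handle 0 is reserved for the NVDIMM root device. */
uint32_t example_slot_to_handle(int slot)
{
    return (uint32_t)slot + 1;
}

/* Hypothetical helper: index 0 means "not valid / not present". */
uint16_t example_slot_to_spa_index(int slot)
{
    return (uint16_t)((slot + 1) << 1);      /* even values: 2, 4, 6, ... */
}

/* Hypothetical helper: the DCR index is the SPA index plus one. */
uint16_t example_slot_to_dcr_index(int slot)
{
    return (uint16_t)(example_slot_to_spa_index(slot) + 1); /* odd: 3, 5, ... */
}

/*
 * Returns 1 if, over the first nslots slots, no index is 0 and no two
 * structures share an index; returns 0 otherwise.
 */
int example_ids_are_unique(int nslots)
{
    for (int i = 0; i < nslots; i++) {
        uint16_t spa_i = example_slot_to_spa_index(i);
        uint16_t dcr_i = example_slot_to_dcr_index(i);

        if (spa_i == 0 || dcr_i == 0 || spa_i == dcr_i) {
            return 0;
        }
        for (int j = i + 1; j < nslots; j++) {
            uint16_t spa_j = example_slot_to_spa_index(j);
            uint16_t dcr_j = example_slot_to_dcr_index(j);

            if (spa_i == spa_j || dcr_i == dcr_j ||
                spa_i == dcr_j || dcr_i == spa_j) {
                return 0;
            }
        }
    }
    return 1;
}
```

With this scheme slot 0 maps to handle 1, SPA index 2 and DCR index 3, so
the reserved value 0 is never produced for any structure.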

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/nvdimm.c        | 355 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/acpi-build.c    |   6 +
 include/hw/mem/nvdimm.h |  10 ++
 3 files changed, 371 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 1223da2..dd84e5f 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -26,8 +26,348 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>
  */
 
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
 #include "hw/mem/nvdimm.h"
 
+static int nvdimm_plugged_device_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
+        NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+        if (memory_region_is_mapped(&nvdimm->nvdimm_mr)) {
+            *list = g_slist_append(*list, DEVICE(obj));
+        }
+    }
+
+    object_child_foreach(obj, nvdimm_plugged_device_list, opaque);
+    return 0;
+}
+
+/*
+ * inquire plugged NVDIMM devices and link them into the list which is
+ * returned to the caller.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+static GSList *nvdimm_get_plugged_device_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(qdev_get_machine(), nvdimm_plugged_device_list,
+                         &list);
+    return list;
+}
+
+#define NVDIMM_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)             \
+   { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
+     (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,          \
+     (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }
+/*
+ * define Byte Addressable Persistent Memory (PM) Region according to
+ * ACPI 6.0: 5.2.25.1 System Physical Address Range Structure.
+ */
+static const uint8_t nvdimm_nfit_spa_uuid[] =
+      NVDIMM_UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d, 0x33,
+                     0x18, 0xb7, 0x8c, 0xdb);
+
+/*
+ * NVDIMM Firmware Interface Table
+ * @signature: "NFIT"
+ *
+ * It provides information that allows OSPM to enumerate NVDIMM present in
+ * the platform and associate system physical address ranges created by the
+ * NVDIMMs.
+ *
+ * It is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
+ */
+struct NvdimmNfitHeader {
+    ACPI_TABLE_HEADER_DEF
+    uint32_t reserved;
+} QEMU_PACKED;
+typedef struct NvdimmNfitHeader NvdimmNfitHeader;
+
+/*
+ * define NFIT structures according to ACPI 6.0: 5.2.25 NVDIMM Firmware
+ * Interface Table (NFIT).
+ */
+
+/*
+ * System Physical Address Range Structure
+ *
+ * It describes the system physical address ranges occupied by NVDIMMs and
+ * the types of the regions.
+ */
+struct NvdimmNfitSpa {
+    uint16_t type;
+    uint16_t length;
+    uint16_t spa_index;
+    uint16_t flags;
+    uint32_t reserved;
+    uint32_t proximity_domain;
+    uint8_t type_guid[16];
+    uint64_t spa_base;
+    uint64_t spa_length;
+    uint64_t mem_attr;
+} QEMU_PACKED;
+typedef struct NvdimmNfitSpa NvdimmNfitSpa;
+
+/*
+ * Memory Device to System Physical Address Range Mapping Structure
+ *
+ * It enables identifying each NVDIMM region and the corresponding SPA
+ * describing the memory interleave
+ */
+struct NvdimmNfitMemDev {
+    uint16_t type;
+    uint16_t length;
+    uint32_t nfit_handle;
+    uint16_t phys_id;
+    uint16_t region_id;
+    uint16_t spa_index;
+    uint16_t dcr_index;
+    uint64_t region_len;
+    uint64_t region_offset;
+    uint64_t region_dpa;
+    uint16_t interleave_index;
+    uint16_t interleave_ways;
+    uint16_t flags;
+    uint16_t reserved;
+} QEMU_PACKED;
+typedef struct NvdimmNfitMemDev NvdimmNfitMemDev;
+
+/*
+ * NVDIMM Control Region Structure
+ *
+ * It describes the NVDIMM and if applicable, Block Control Window.
+ */
+struct NvdimmNfitControlRegion {
+    uint16_t type;
+    uint16_t length;
+    uint16_t dcr_index;
+    uint16_t vendor_id;
+    uint16_t device_id;
+    uint16_t revision_id;
+    uint16_t sub_vendor_id;
+    uint16_t sub_device_id;
+    uint16_t sub_revision_id;
+    uint8_t reserved[6];
+    uint32_t serial_number;
+    uint16_t fic;
+    uint16_t num_bcw;
+    uint64_t bcw_size;
+    uint64_t cmd_offset;
+    uint64_t cmd_size;
+    uint64_t status_offset;
+    uint64_t status_size;
+    uint16_t flags;
+    uint8_t reserved2[6];
+} QEMU_PACKED;
+typedef struct NvdimmNfitControlRegion NvdimmNfitControlRegion;
+
+/*
+ * Module serial number is a unique number for each device. We use the
+ * slot id of NVDIMM device to generate this number so that each device
+ * associates with a different number.
+ *
+ * 0x123456 is a magic number we arbitrarily chose.
+ */
+static uint32_t nvdimm_slot_to_sn(int slot)
+{
+    return 0x123456 + slot;
+}
+
+/*
+ * handle is used to uniquely associate nfit_memdev structure with NVDIMM
+ * ACPI device - nfit_memdev.nfit_handle matches with the value returned
+ * by ACPI device _ADR method.
+ *
+ * We generate the handle with the slot id of NVDIMM device and reserve
+ * 0 for NVDIMM root device.
+ */
+static uint32_t nvdimm_slot_to_handle(int slot)
+{
+    return slot + 1;
+}
+
+/*
+ * index uniquely identifies the structure, 0 is reserved which indicates
+ * that the structure is not valid or the associated structure is not
+ * present.
+ *
+ * Each NVDIMM device needs two indexes, one for nfit_spa and another for
+ * nfit_dcr; both are generated from the slot id of the NVDIMM device.
+ */
+static uint16_t nvdimm_slot_to_spa_index(int slot)
+{
+    return (slot + 1) << 1;
+}
+
+/* See the comments of nvdimm_slot_to_spa_index(). */
+static uint16_t nvdimm_slot_to_dcr_index(int slot)
+{
+    return nvdimm_slot_to_spa_index(slot) + 1;
+}
+
+/* ACPI 6.0: 5.2.25.1 System Physical Address Range Structure */
+static void
+nvdimm_build_structure_spa(GArray *structures, NVDIMMDevice *nvdimm)
+{
+    NvdimmNfitSpa *nfit_spa;
+    uint64_t addr = object_property_get_int(OBJECT(nvdimm), DIMM_ADDR_PROP,
+                                            NULL);
+    uint64_t size = object_property_get_int(OBJECT(nvdimm), DIMM_SIZE_PROP,
+                                            NULL);
+    uint32_t node = object_property_get_int(OBJECT(nvdimm), DIMM_NODE_PROP,
+                                            NULL);
+    int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+                                            NULL);
+
+    nfit_spa = acpi_data_push(structures, sizeof(*nfit_spa));
+
+    nfit_spa->type = cpu_to_le16(0 /* System Physical Address Range
+                                      Structure */);
+    nfit_spa->length = cpu_to_le16(sizeof(*nfit_spa));
+    nfit_spa->spa_index = cpu_to_le16(nvdimm_slot_to_spa_index(slot));
+
+    /*
+     * Control region is strict as all the device info, such as SN, index,
+     * is associated with slot id.
+     */
+    nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
+                                       management during hot add/online
+                                       operation */ |
+                                  2 /* Data in Proximity Domain field is
+                                       valid*/);
+
+    /* NUMA node. */
+    nfit_spa->proximity_domain = cpu_to_le32(node);
+    /* the region reported as PMEM. */
+    memcpy(nfit_spa->type_guid, nvdimm_nfit_spa_uuid,
+           sizeof(nvdimm_nfit_spa_uuid));
+
+    nfit_spa->spa_base = cpu_to_le64(addr);
+    nfit_spa->spa_length = cpu_to_le64(size);
+
+    /* It is the PMEM and can be cached as writeback. */
+    nfit_spa->mem_attr = cpu_to_le64(0x8ULL /* EFI_MEMORY_WB */ |
+                                     0x8000ULL /* EFI_MEMORY_NV */);
+}
+
+/*
+ * ACPI 6.0: 5.2.25.2 Memory Device to System Physical Address Range Mapping
+ * Structure
+ */
+static void
+nvdimm_build_structure_memdev(GArray *structures, NVDIMMDevice *nvdimm)
+{
+    NvdimmNfitMemDev *nfit_memdev;
+    uint64_t addr = object_property_get_int(OBJECT(nvdimm), DIMM_ADDR_PROP,
+                                            NULL);
+    uint64_t size = object_property_get_int(OBJECT(nvdimm), DIMM_SIZE_PROP,
+                                            NULL);
+    int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+                                            NULL);
+    uint32_t handle = nvdimm_slot_to_handle(slot);
+
+    nfit_memdev = acpi_data_push(structures, sizeof(*nfit_memdev));
+
+    nfit_memdev->type = cpu_to_le16(1 /* Memory Device to System Address
+                                         Range Map Structure*/);
+    nfit_memdev->length = cpu_to_le16(sizeof(*nfit_memdev));
+    nfit_memdev->nfit_handle = cpu_to_le32(handle);
+
+    /*
+     * associate memory device with System Physical Address Range
+     * Structure.
+     */
+    nfit_memdev->spa_index = cpu_to_le16(nvdimm_slot_to_spa_index(slot));
+    /* associate memory device with Control Region Structure. */
+    nfit_memdev->dcr_index = cpu_to_le16(nvdimm_slot_to_dcr_index(slot));
+
+    /* The memory region on the device. */
+    nfit_memdev->region_len = cpu_to_le64(size);
+    nfit_memdev->region_dpa = cpu_to_le64(addr);
+
+    /* Only one interleave for PMEM. */
+    nfit_memdev->interleave_ways = cpu_to_le16(1);
+}
+
+/*
+ * ACPI 6.0: 5.2.25.5 NVDIMM Control Region Structure.
+ */
+static void nvdimm_build_structure_dcr(GArray *structures,
+                                       NVDIMMDevice *nvdimm)
+{
+    NvdimmNfitControlRegion *nfit_dcr;
+    int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+                                       NULL);
+    uint32_t sn = nvdimm_slot_to_sn(slot);
+
+    nfit_dcr = acpi_data_push(structures, sizeof(*nfit_dcr));
+
+    nfit_dcr->type = cpu_to_le16(4 /* NVDIMM Control Region Structure */);
+    nfit_dcr->length = cpu_to_le16(sizeof(*nfit_dcr));
+    nfit_dcr->dcr_index = cpu_to_le16(nvdimm_slot_to_dcr_index(slot));
+
+    /* vendor: Intel. */
+    nfit_dcr->vendor_id = cpu_to_le16(0x8086);
+    nfit_dcr->device_id = cpu_to_le16(1);
+
+    /* The _DSM method is following Intel's DSM specification. */
+    nfit_dcr->revision_id = cpu_to_le16(1 /* Current Revision supported
+                                             in ACPI 6.0 is 1. */);
+    nfit_dcr->serial_number = cpu_to_le32(sn);
+    nfit_dcr->fic = cpu_to_le16(0x201 /* Format Interface Code. See Chapter
+                                         2: NVDIMM Device Specific Method
+                                         (DSM) in DSM Spec Rev1.*/);
+}
+
+static GArray *nvdimm_build_device_structure(GSList *device_list)
+{
+    GArray *structures = g_array_new(false, true /* clear */, 1);
+
+    for (; device_list; device_list = device_list->next) {
+        NVDIMMDevice *nvdimm = device_list->data;
+
+        /* build System Physical Address Range Structure. */
+        nvdimm_build_structure_spa(structures, nvdimm);
+
+        /*
+         * build Memory Device to System Physical Address Range Mapping
+         * Structure.
+         */
+        nvdimm_build_structure_memdev(structures, nvdimm);
+
+        /* build NVDIMM Control Region Structure. */
+        nvdimm_build_structure_dcr(structures, nvdimm);
+    }
+
+    return structures;
+}
+
+static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
+                              GArray *table_data, GArray *linker)
+{
+    GArray *structures = nvdimm_build_device_structure(device_list);
+    void *header;
+
+    acpi_add_table(table_offsets, table_data);
+
+    /* NFIT header. */
+    header = acpi_data_push(table_data, sizeof(NvdimmNfitHeader));
+
+    /* NVDIMM device structures. */
+    g_array_append_vals(table_data, structures->data, structures->len);
+
+    build_header(linker, table_data, header, "NFIT",
+                 sizeof(NvdimmNfitHeader) + structures->len, 1);
+    g_array_free(structures, true);
+}
+
 static uint64_t
 nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 {
@@ -61,3 +401,18 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
                           "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
     memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
 }
+
+void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
+                       GArray *linker)
+{
+    GSList *device_list;
+
+    /* no NVDIMM device is plugged. */
+    device_list = nvdimm_get_plugged_device_list();
+    if (!device_list) {
+        return;
+    }
+
+    nvdimm_build_nfit(device_list, table_offsets, table_data, linker);
+    g_slist_free(device_list);
+}
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 95e0c65..27b2854 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -90,6 +90,7 @@ typedef struct AcpiPmInfo {
     bool s3_disabled;
     bool s4_disabled;
     bool pcihp_bridge_en;
+    bool nvdimm_support;
     uint8_t s4_val;
     uint16_t sci_int;
     uint8_t acpi_enable_cmd;
@@ -231,6 +232,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
     pm->pcihp_bridge_en =
         object_property_get_bool(obj, "acpi-pci-hotplug-with-bridge-support",
                                  NULL);
+    pm->nvdimm_support = object_property_get_bool(obj, "nvdimm-support", NULL);
 }
 
 static void acpi_get_misc_info(AcpiMiscInfo *info)
@@ -1742,6 +1744,10 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
         build_dmar_q35(tables_blob, tables->linker);
     }
 
+    if (pm.nvdimm_support) {
+        nvdimm_build_acpi(table_offsets, tables_blob, tables->linker);
+    }
+
     /* Add tables supplied by user (if any) */
     for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
         unsigned len = acpi_table_len(u);
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index f0b1dda..7fdf591 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -25,6 +25,14 @@
 
 #include "hw/mem/dimm.h"
 
+#define NVDIMM_DEBUG 0
+#define nvdimm_debug(fmt, ...)                                \
+    do {                                                      \
+        if (NVDIMM_DEBUG) {                                   \
+            fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__);  \
+        }                                                     \
+    } while (0)
+
 /*
  * The minimum label data size is required by NVDIMM Namespace
  * specification, please refer to chapter 2 Namespaces:
@@ -114,4 +122,6 @@ typedef struct AcpiNVDIMMState AcpiNVDIMMState;
 /* Initialize the memory and IO region needed by NVDIMM ACPI emulation.*/
 void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
                             Object *owner, AcpiNVDIMMState *state);
+void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
+                       GArray *linker);
 #endif
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [Qemu-devel] [PATCH v7 26/35] nvdimm acpi: build ACPI NFIT table
@ 2015-11-02  9:13   ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

NFIT is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)

Currently, only PMEM mode is supported. Each device is described by three
structures:
- SPA structure: defines the PMEM region info.

- MEM DEV structure: carries the @handle that associates it with the
  corresponding ACPI NVDIMM device, which a later patch introduces.
  The memory device's interleave can be ignored, since access to the
  real NVDIMM hardware is hidden behind the host.

- DCR structure: defines the vendor ID used to bind the vendor-specific
  NVDIMM driver. Since only PMEM mode is implemented for now, the
  Command window and Data window are not needed.
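
To make the resulting table layout concrete, the sketch below reproduces
the field layout of the three per-device structures and computes the NFIT
length for a given number of devices. It is a standalone approximation:
the Example* names and EXAMPLE_PACKED are hypothetical stand-ins for the
Nvdimm* types and QEMU_PACKED in this patch, and the standard ACPI table
header is modelled as an opaque 36-byte array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang attribute; assumed equivalent to QEMU_PACKED for this sketch. */
#define EXAMPLE_PACKED __attribute__((packed))

struct ExampleNfitHeader {
    uint8_t acpi_header[36];        /* standard ACPI table header */
    uint32_t reserved;
} EXAMPLE_PACKED;

struct ExampleSpa {                 /* System Physical Address Range */
    uint16_t type, length, spa_index, flags;
    uint32_t reserved, proximity_domain;
    uint8_t type_guid[16];
    uint64_t spa_base, spa_length, mem_attr;
} EXAMPLE_PACKED;

struct ExampleMemDev {              /* Memory Device to SPA Range Mapping */
    uint16_t type, length;
    uint32_t nfit_handle;
    uint16_t phys_id, region_id, spa_index, dcr_index;
    uint64_t region_len, region_offset, region_dpa;
    uint16_t interleave_index, interleave_ways, flags, reserved;
} EXAMPLE_PACKED;

struct ExampleDcr {                 /* NVDIMM Control Region */
    uint16_t type, length, dcr_index;
    uint16_t vendor_id, device_id, revision_id;
    uint16_t sub_vendor_id, sub_device_id, sub_revision_id;
    uint8_t reserved[6];
    uint32_t serial_number;
    uint16_t fic, num_bcw;
    uint64_t bcw_size, cmd_offset, cmd_size, status_offset, status_size;
    uint16_t flags;
    uint8_t reserved2[6];
} EXAMPLE_PACKED;

/* Header plus one SPA, one MEM DEV and one DCR structure per device. */
size_t example_nfit_len(size_t ndevices)
{
    return sizeof(struct ExampleNfitHeader) +
           ndevices * (sizeof(struct ExampleSpa) +
                       sizeof(struct ExampleMemDev) +
                       sizeof(struct ExampleDcr));
}
```

Under these assumptions every device contributes 56 + 48 + 80 = 184 bytes
on top of the 40-byte NFIT header. Note that the patch builds no NFIT at
all when no NVDIMM is plugged, so the zero-device case is only of
arithmetic interest.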

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/nvdimm.c        | 355 ++++++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/acpi-build.c    |   6 +
 include/hw/mem/nvdimm.h |  10 ++
 3 files changed, 371 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 1223da2..dd84e5f 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -26,8 +26,348 @@
  * License along with this library; if not, see <http://www.gnu.org/licenses/>
  */
 
+#include "hw/acpi/acpi.h"
+#include "hw/acpi/aml-build.h"
 #include "hw/mem/nvdimm.h"
 
+static int nvdimm_plugged_device_list(Object *obj, void *opaque)
+{
+    GSList **list = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_NVDIMM)) {
+        NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+        if (memory_region_is_mapped(&nvdimm->nvdimm_mr)) {
+            *list = g_slist_append(*list, DEVICE(obj));
+        }
+    }
+
+    object_child_foreach(obj, nvdimm_plugged_device_list, opaque);
+    return 0;
+}
+
+/*
+ * inquire plugged NVDIMM devices and link them into the list which is
+ * returned to the caller.
+ *
+ * Note: it is the caller's responsibility to free the list to avoid
+ * memory leak.
+ */
+static GSList *nvdimm_get_plugged_device_list(void)
+{
+    GSList *list = NULL;
+
+    object_child_foreach(qdev_get_machine(), nvdimm_plugged_device_list,
+                         &list);
+    return list;
+}
+
+#define NVDIMM_UUID_LE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)             \
+   { (a) & 0xff, ((a) >> 8) & 0xff, ((a) >> 16) & 0xff, ((a) >> 24) & 0xff, \
+     (b) & 0xff, ((b) >> 8) & 0xff, (c) & 0xff, ((c) >> 8) & 0xff,          \
+     (d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) }
+/*
+ * define Byte Addressable Persistent Memory (PM) Region according to
+ * ACPI 6.0: 5.2.25.1 System Physical Address Range Structure.
+ */
+static const uint8_t nvdimm_nfit_spa_uuid[] =
+      NVDIMM_UUID_LE(0x66f0d379, 0xb4f3, 0x4074, 0xac, 0x43, 0x0d, 0x33,
+                     0x18, 0xb7, 0x8c, 0xdb);
+
+/*
+ * NVDIMM Firmware Interface Table
+ * @signature: "NFIT"
+ *
+ * It provides information that allows OSPM to enumerate NVDIMM present in
+ * the platform and associate system physical address ranges created by the
+ * NVDIMMs.
+ *
+ * It is defined in ACPI 6.0: 5.2.25 NVDIMM Firmware Interface Table (NFIT)
+ */
+struct NvdimmNfitHeader {
+    ACPI_TABLE_HEADER_DEF
+    uint32_t reserved;
+} QEMU_PACKED;
+typedef struct NvdimmNfitHeader NvdimmNfitHeader;
+
+/*
+ * define NFIT structures according to ACPI 6.0: 5.2.25 NVDIMM Firmware
+ * Interface Table (NFIT).
+ */
+
+/*
+ * System Physical Address Range Structure
+ *
+ * It describes the system physical address ranges occupied by NVDIMMs and
+ * the types of the regions.
+ */
+struct NvdimmNfitSpa {
+    uint16_t type;
+    uint16_t length;
+    uint16_t spa_index;
+    uint16_t flags;
+    uint32_t reserved;
+    uint32_t proximity_domain;
+    uint8_t type_guid[16];
+    uint64_t spa_base;
+    uint64_t spa_length;
+    uint64_t mem_attr;
+} QEMU_PACKED;
+typedef struct NvdimmNfitSpa NvdimmNfitSpa;
+
+/*
+ * Memory Device to System Physical Address Range Mapping Structure
+ *
+ * It enables identifying each NVDIMM region and the corresponding SPA
+ * describing the memory interleave
+ */
+struct NvdimmNfitMemDev {
+    uint16_t type;
+    uint16_t length;
+    uint32_t nfit_handle;
+    uint16_t phys_id;
+    uint16_t region_id;
+    uint16_t spa_index;
+    uint16_t dcr_index;
+    uint64_t region_len;
+    uint64_t region_offset;
+    uint64_t region_dpa;
+    uint16_t interleave_index;
+    uint16_t interleave_ways;
+    uint16_t flags;
+    uint16_t reserved;
+} QEMU_PACKED;
+typedef struct NvdimmNfitMemDev NvdimmNfitMemDev;
+
+/*
+ * NVDIMM Control Region Structure
+ *
+ * It describes the NVDIMM and if applicable, Block Control Window.
+ */
+struct NvdimmNfitControlRegion {
+    uint16_t type;
+    uint16_t length;
+    uint16_t dcr_index;
+    uint16_t vendor_id;
+    uint16_t device_id;
+    uint16_t revision_id;
+    uint16_t sub_vendor_id;
+    uint16_t sub_device_id;
+    uint16_t sub_revision_id;
+    uint8_t reserved[6];
+    uint32_t serial_number;
+    uint16_t fic;
+    uint16_t num_bcw;
+    uint64_t bcw_size;
+    uint64_t cmd_offset;
+    uint64_t cmd_size;
+    uint64_t status_offset;
+    uint64_t status_size;
+    uint16_t flags;
+    uint8_t reserved2[6];
+} QEMU_PACKED;
+typedef struct NvdimmNfitControlRegion NvdimmNfitControlRegion;
+
+/*
+ * Module serial number is a unique number for each device. We use the
+ * slot id of NVDIMM device to generate this number so that each device
+ * associates with a different number.
+ *
+ * 0x123456 is a magic number we arbitrarily chose.
+ */
+static uint32_t nvdimm_slot_to_sn(int slot)
+{
+    return 0x123456 + slot;
+}
+
+/*
+ * handle is used to uniquely associate nfit_memdev structure with NVDIMM
+ * ACPI device - nfit_memdev.nfit_handle matches with the value returned
+ * by ACPI device _ADR method.
+ *
+ * We generate the handle with the slot id of NVDIMM device and reserve
+ * 0 for NVDIMM root device.
+ */
+static uint32_t nvdimm_slot_to_handle(int slot)
+{
+    return slot + 1;
+}
+
+/*
+ * index uniquely identifies the structure, 0 is reserved which indicates
+ * that the structure is not valid or the associated structure is not
+ * present.
+ *
+ * Each NVDIMM device needs two indexes, one for nfit_spa and another for
+ * nfit_dcr; both are generated from the slot id of the NVDIMM device.
+ */
+static uint16_t nvdimm_slot_to_spa_index(int slot)
+{
+    return (slot + 1) << 1;
+}
+
+/* See the comments of nvdimm_slot_to_spa_index(). */
+static uint16_t nvdimm_slot_to_dcr_index(int slot)
+{
+    return nvdimm_slot_to_spa_index(slot) + 1;
+}
+
+/* ACPI 6.0: 5.2.25.1 System Physical Address Range Structure */
+static void
+nvdimm_build_structure_spa(GArray *structures, NVDIMMDevice *nvdimm)
+{
+    NvdimmNfitSpa *nfit_spa;
+    uint64_t addr = object_property_get_int(OBJECT(nvdimm), DIMM_ADDR_PROP,
+                                            NULL);
+    uint64_t size = object_property_get_int(OBJECT(nvdimm), DIMM_SIZE_PROP,
+                                            NULL);
+    uint32_t node = object_property_get_int(OBJECT(nvdimm), DIMM_NODE_PROP,
+                                            NULL);
+    int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+                                            NULL);
+
+    nfit_spa = acpi_data_push(structures, sizeof(*nfit_spa));
+
+    nfit_spa->type = cpu_to_le16(0 /* System Physical Address Range
+                                      Structure */);
+    nfit_spa->length = cpu_to_le16(sizeof(*nfit_spa));
+    nfit_spa->spa_index = cpu_to_le16(nvdimm_slot_to_spa_index(slot));
+
+    /*
+     * Control region is strict as all the device info, such as SN, index,
+     * is associated with slot id.
+     */
+    nfit_spa->flags = cpu_to_le16(1 /* Control region is strictly for
+                                       management during hot add/online
+                                       operation */ |
+                                  2 /* Data in Proximity Domain field is
+                                       valid*/);
+
+    /* NUMA node. */
+    nfit_spa->proximity_domain = cpu_to_le32(node);
+    /* the region reported as PMEM. */
+    memcpy(nfit_spa->type_guid, nvdimm_nfit_spa_uuid,
+           sizeof(nvdimm_nfit_spa_uuid));
+
+    nfit_spa->spa_base = cpu_to_le64(addr);
+    nfit_spa->spa_length = cpu_to_le64(size);
+
+    /* It is the PMEM and can be cached as writeback. */
+    nfit_spa->mem_attr = cpu_to_le64(0x8ULL /* EFI_MEMORY_WB */ |
+                                     0x8000ULL /* EFI_MEMORY_NV */);
+}
+
+/*
+ * ACPI 6.0: 5.2.25.2 Memory Device to System Physical Address Range Mapping
+ * Structure
+ */
+static void
+nvdimm_build_structure_memdev(GArray *structures, NVDIMMDevice *nvdimm)
+{
+    NvdimmNfitMemDev *nfit_memdev;
+    uint64_t addr = object_property_get_int(OBJECT(nvdimm), DIMM_ADDR_PROP,
+                                            NULL);
+    uint64_t size = object_property_get_int(OBJECT(nvdimm), DIMM_SIZE_PROP,
+                                            NULL);
+    int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+                                            NULL);
+    uint32_t handle = nvdimm_slot_to_handle(slot);
+
+    nfit_memdev = acpi_data_push(structures, sizeof(*nfit_memdev));
+
+    nfit_memdev->type = cpu_to_le16(1 /* Memory Device to System Address
+                                         Range Map Structure*/);
+    nfit_memdev->length = cpu_to_le16(sizeof(*nfit_memdev));
+    nfit_memdev->nfit_handle = cpu_to_le32(handle);
+
+    /*
+     * associate memory device with System Physical Address Range
+     * Structure.
+     */
+    nfit_memdev->spa_index = cpu_to_le16(nvdimm_slot_to_spa_index(slot));
+    /* associate memory device with Control Region Structure. */
+    nfit_memdev->dcr_index = cpu_to_le16(nvdimm_slot_to_dcr_index(slot));
+
+    /* The memory region on the device. */
+    nfit_memdev->region_len = cpu_to_le64(size);
+    nfit_memdev->region_dpa = cpu_to_le64(addr);
+
+    /* Only one interleave for PMEM. */
+    nfit_memdev->interleave_ways = cpu_to_le16(1);
+}
+
+/*
+ * ACPI 6.0: 5.2.25.5 NVDIMM Control Region Structure.
+ */
+static void nvdimm_build_structure_dcr(GArray *structures,
+                                       NVDIMMDevice *nvdimm)
+{
+    NvdimmNfitControlRegion *nfit_dcr;
+    int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+                                       NULL);
+    uint32_t sn = nvdimm_slot_to_sn(slot);
+
+    nfit_dcr = acpi_data_push(structures, sizeof(*nfit_dcr));
+
+    nfit_dcr->type = cpu_to_le16(4 /* NVDIMM Control Region Structure */);
+    nfit_dcr->length = cpu_to_le16(sizeof(*nfit_dcr));
+    nfit_dcr->dcr_index = cpu_to_le16(nvdimm_slot_to_dcr_index(slot));
+
+    /* vendor: Intel. */
+    nfit_dcr->vendor_id = cpu_to_le16(0x8086);
+    nfit_dcr->device_id = cpu_to_le16(1);
+
+    /* The _DSM method is following Intel's DSM specification. */
+    nfit_dcr->revision_id = cpu_to_le16(1 /* Current Revision supported
+                                             in ACPI 6.0 is 1. */);
+    nfit_dcr->serial_number = cpu_to_le32(sn);
+    nfit_dcr->fic = cpu_to_le16(0x201 /* Format Interface Code. See Chapter
+                                         2: NVDIMM Device Specific Method
+                                         (DSM) in DSM Spec Rev1.*/);
+}
+
+static GArray *nvdimm_build_device_structure(GSList *device_list)
+{
+    GArray *structures = g_array_new(false, true /* clear */, 1);
+
+    for (; device_list; device_list = device_list->next) {
+        NVDIMMDevice *nvdimm = device_list->data;
+
+        /* build System Physical Address Range Structure. */
+        nvdimm_build_structure_spa(structures, nvdimm);
+
+        /*
+         * build Memory Device to System Physical Address Range Mapping
+         * Structure.
+         */
+        nvdimm_build_structure_memdev(structures, nvdimm);
+
+        /* build NVDIMM Control Region Structure. */
+        nvdimm_build_structure_dcr(structures, nvdimm);
+    }
+
+    return structures;
+}
+
+static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
+                              GArray *table_data, GArray *linker)
+{
+    GArray *structures = nvdimm_build_device_structure(device_list);
+    void *header;
+
+    acpi_add_table(table_offsets, table_data);
+
+    /* NFIT header. */
+    header = acpi_data_push(table_data, sizeof(NvdimmNfitHeader));
+
+    /* NVDIMM device structures. */
+    g_array_append_vals(table_data, structures->data, structures->len);
+
+    build_header(linker, table_data, header, "NFIT",
+                 sizeof(NvdimmNfitHeader) + structures->len, 1);
+    g_array_free(structures, true);
+}
+
 static uint64_t
 nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 {
@@ -61,3 +401,18 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
                           "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
     memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
 }
+
+void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
+                       GArray *linker)
+{
+    GSList *device_list;
+
+    device_list = nvdimm_get_plugged_device_list();
+    /* Nothing to build if no NVDIMM device is plugged. */
+    if (!device_list) {
+        return;
+    }
+
+    nvdimm_build_nfit(device_list, table_offsets, table_data, linker);
+    g_slist_free(device_list);
+}
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 95e0c65..27b2854 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -90,6 +90,7 @@ typedef struct AcpiPmInfo {
     bool s3_disabled;
     bool s4_disabled;
     bool pcihp_bridge_en;
+    bool nvdimm_support;
     uint8_t s4_val;
     uint16_t sci_int;
     uint8_t acpi_enable_cmd;
@@ -231,6 +232,7 @@ static void acpi_get_pm_info(AcpiPmInfo *pm)
     pm->pcihp_bridge_en =
         object_property_get_bool(obj, "acpi-pci-hotplug-with-bridge-support",
                                  NULL);
+    pm->nvdimm_support = object_property_get_bool(obj, "nvdimm-support", NULL);
 }
 
 static void acpi_get_misc_info(AcpiMiscInfo *info)
@@ -1742,6 +1744,10 @@ void acpi_build(PcGuestInfo *guest_info, AcpiBuildTables *tables)
         build_dmar_q35(tables_blob, tables->linker);
     }
 
+    if (pm.nvdimm_support) {
+        nvdimm_build_acpi(table_offsets, tables_blob, tables->linker);
+    }
+
     /* Add tables supplied by user (if any) */
     for (u = acpi_table_first(); u; u = acpi_table_next(u)) {
         unsigned len = acpi_table_len(u);
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index f0b1dda..7fdf591 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -25,6 +25,14 @@
 
 #include "hw/mem/dimm.h"
 
+#define NVDIMM_DEBUG 0
+#define nvdimm_debug(fmt, ...)                                \
+    do {                                                      \
+        if (NVDIMM_DEBUG) {                                   \
+            fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__);  \
+        }                                                     \
+    } while (0)
+
 /*
  * The minimum label data size is required by NVDIMM Namespace
  * specification, please refer to chapter 2 Namespaces:
@@ -114,4 +122,6 @@ typedef struct AcpiNVDIMMState AcpiNVDIMMState;
 /* Initialize the memory and IO region needed by NVDIMM ACPI emulation.*/
 void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
                             Object *owner, AcpiNVDIMMState *state);
+void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
+                       GArray *linker);
 #endif
-- 
1.8.3.1
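
Note for readers (illustration only, not part of the patch): each structure
appended to the NFIT above starts with a 16-bit type and a 16-bit length, both
little-endian, so a consumer can walk the table payload without knowing every
structure layout. The sketch below counts structures of one type in such a
payload. read_le16() and nfit_count_type() are hypothetical helper names, not
QEMU or guest-kernel functions, and the bounds checks are my own assumption
about what a careful parser would do.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit little-endian value from a byte buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Count the sub-structures of a given type in an NFIT payload, i.e. the
 * bytes that follow the NFIT header. Returns -1 if the payload is malformed.
 */
static int nfit_count_type(const uint8_t *buf, size_t len, uint16_t type)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        uint16_t t, l;

        if (len - off < 4) {
            return -1;      /* truncated type/length header */
        }
        t = read_le16(buf + off);
        l = read_le16(buf + off + 2);
        if (l < 4 || l > len - off) {
            return -1;      /* structure too short or overrunning the table */
        }
        if (t == type) {
            count++;
        }
        off += l;           /* the length field covers the whole structure */
    }

    return count;
}
```

With one plugged NVDIMM, the payload built by nvdimm_build_device_structure()
would be expected to contain one structure each of the SPA, MEMDEV (type 1)
and Control Region (type 4) kinds.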

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

NVDIMM devices are defined in ACPI 6.0: 9.20 NVDIMM Devices.

There is a root device under \_SB, and the individual NVDIMM devices sit under
the root device. Each NVDIMM device has an _ADR object that returns its handle,
which associates the device with its MEMDEV structure in the NFIT.

We reserve handle 0 for the root device. In this patch, we save the handle,
Arg1 and Arg2 to DSM memory. Arg3 is conditionally saved in a later patch.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/nvdimm.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 184 insertions(+)
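
Note for readers (illustration only, not part of the patch): the NvdimmDsmIn
structure added below fixes the layout of the shared DSM page, and the AML
field units HDLE, REVS, FUNC and ARG3 are sized from it. The sketch below
restates that layout and computes the ARG3 size. It assumes a 4096-byte page
(TARGET_PAGE_SIZE on x86) and GCC/Clang packed-attribute syntax. struct DsmIn
and dsm_arg3_bits() are hypothetical names used only here.

```c
#include <stddef.h>
#include <stdint.h>

#define DSM_PAGE_SIZE 4096u     /* assumption: TARGET_PAGE_SIZE on x86 */

/* Mirrors NvdimmDsmIn from the patch; packed so no padding is inserted. */
struct DsmIn {
    uint32_t handle;            /* HDLE: 0 for the root device */
    uint32_t revision;          /* REVS: Arg1 of _DSM */
    uint32_t function;          /* FUNC: Arg2 of _DSM */
    uint8_t  arg3[];            /* ARG3: the rest of the page */
} __attribute__((packed));

/* Size in bits of the ARG3 field unit, as AML field lengths are in bits. */
static size_t dsm_arg3_bits(void)
{
    return (DSM_PAGE_SIZE - offsetof(struct DsmIn, arg3)) * 8;
}
```

Under these assumptions the three header fields occupy the first 12 bytes and
ARG3 covers the remaining 4084 bytes of the page.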

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index dd84e5f..53ed675 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
     g_array_free(structures, true);
 }
 
+struct NvdimmDsmIn {
+    uint32_t handle;
+    uint32_t revision;
+    uint32_t function;
+    /* the remaining size in the page is used by arg3. */
+    uint8_t arg3[0];
+} QEMU_PACKED;
+typedef struct NvdimmDsmIn NvdimmDsmIn;
+
 static uint64_t
 nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 {
@@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 static void
 nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 {
+    fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
 }
 
 static const MemoryRegionOps nvdimm_dsm_ops = {
@@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
     memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
 }
 
+#define BUILD_STA_METHOD(_dev_, _method_)                                  \
+    do {                                                                   \
+        _method_ = aml_method("_STA", 0);                                  \
+        aml_append(_method_, aml_return(aml_int(0x0f)));                   \
+        aml_append(_dev_, _method_);                                       \
+    } while (0)
+
+#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)                \
+    do {                                                                   \
+        Aml *ifctx, *uuid;                                                 \
+        _method_ = aml_method("_DSM", 4);                                  \
+        /* Return an error code if the UUID is not the one we expect. */   \
+        uuid = aml_touuid(_uuid_);                                         \
+        ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));             \
+        aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));     \
+        aml_append(_method_, ifctx);                                       \
+        aml_append(_method_, aml_return(aml_call4("NCAL",                  \
+            aml_int(_handle_), aml_arg(1), aml_arg(2), aml_arg(3))));      \
+        aml_append(_dev_, _method_);                                       \
+    } while (0)
+
+#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)                     \
+    aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
+
+#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)                 \
+    BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
+
+static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
+{
+    for (; device_list; device_list = device_list->next) {
+        NVDIMMDevice *nvdimm = device_list->data;
+        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+                                           NULL);
+        uint32_t handle = nvdimm_slot_to_handle(slot);
+        Aml *dev, *method;
+
+        dev = aml_device("NV%02X", slot);
+        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
+
+        BUILD_STA_METHOD(dev, method);
+
+        /*
+         * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
+         * in DSM Spec Rev1.
+         */
+        BUILD_DSM_METHOD(dev, method,
+                         handle /* NVDIMM Device Handle */,
+                         "4309AC30-0D11-11E4-9191-0800200C9A66"
+                         /* UUID for NVDIMM Devices. */);
+
+        aml_append(root_dev, dev);
+    }
+}
+
+static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
+{
+    Aml *dev, *method, *field;
+    uint64_t page_size = TARGET_PAGE_SIZE;
+
+    dev = aml_device("NVDR");
+    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
+
+    /* map DSM memory and IO into ACPI namespace. */
+    aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
+               NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
+    aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
+               NVDIMM_ACPI_MEM_BASE, page_size));
+
+    /*
+     * DSM notifier:
+     * @NOTI: Reading it notifies QEMU that the _DSM method is being
+     *        called; the parameters can be found in NvdimmDsmIn.
+     *        The value read from it is the size of the DSM output
+     *        buffer filled by QEMU.
+     */
+    field = aml_field("NPIO", AML_DWORD_ACC, AML_PRESERVE);
+    BUILD_FIELD_UNIT_SIZE(field, sizeof(uint32_t), "NOTI");
+    aml_append(dev, field);
+
+    /*
+     * DSM input:
+     * @HDLE: store device's handle, it's zero if the _DSM call happens
+     *        on NVDIMM Root Device.
+     * @REVS: store the Arg1 of _DSM call.
+     * @FUNC: store the Arg2 of _DSM call.
+     * @ARG3: store the Arg3 of _DSM call.
+     *
+     * These fields are backed by RAM on the host, so accessing them
+     * never causes a VM exit.
+     */
+    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
+    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, handle, "HDLE");
+    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, revision, "REVS");
+    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, function, "FUNC");
+    BUILD_FIELD_UNIT_SIZE(field, page_size - offsetof(NvdimmDsmIn, arg3),
+                          "ARG3");
+    aml_append(dev, field);
+
+    /*
+     * DSM output:
+     * @ODAT: the buffer QEMU uses to store the result, the actual size
+     *        filled by QEMU is the value read from NOTI.
+     *
+     * Since the page is reused by both input and output, the input data
+     * will be lost after a new result is stored into @ODAT.
+     */
+    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
+    BUILD_FIELD_UNIT_SIZE(field, page_size, "ODAT");
+    aml_append(dev, field);
+
+    method = aml_method_serialized("NCAL", 4);
+    {
+        Aml *buffer_size = aml_local(0);
+
+        aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
+        aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
+        aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
+
+        /*
+         * transfer control to QEMU and the buffer size filled by
+         * QEMU is returned.
+         */
+        aml_append(method, aml_store(aml_name("NOTI"), buffer_size));
+
+        aml_append(method, aml_store(aml_shiftleft(buffer_size,
+                                       aml_int(3)), buffer_size));
+
+        aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
+                                            buffer_size, "OBUF"));
+        aml_append(method, aml_concatenate(aml_buffer(0, NULL),
+                                           aml_name("OBUF"), aml_local(1)));
+        aml_append(method, aml_return(aml_local(1)));
+    }
+    aml_append(dev, method);
+
+    BUILD_STA_METHOD(dev, method);
+
+    /*
+     * Chapter 3: _DSM Interface for NVDIMM Root Device - Example in DSM
+     * Spec Rev1.
+     */
+    BUILD_DSM_METHOD(dev, method,
+                     0 /* 0 is reserved for NVDIMM Root Device*/,
+                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
+                     /* UUID for NVDIMM Root Devices. */);
+
+    build_nvdimm_devices(device_list, dev);
+
+    aml_append(sb_scope, dev);
+}
+
+static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
+                              GArray *table_data, GArray *linker)
+{
+    Aml *ssdt, *sb_scope;
+
+    acpi_add_table(table_offsets, table_data);
+
+    ssdt = init_aml_allocator();
+    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
+
+    sb_scope = aml_scope("\\_SB");
+    nvdimm_build_acpi_devices(device_list, sb_scope);
+
+    aml_append(ssdt, sb_scope);
+    /* copy AML table into ACPI tables blob and patch header there */
+    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
+    build_header(linker, table_data,
+        (void *)(table_data->data + table_data->len - ssdt->buf->len),
+        "SSDT", ssdt->buf->len, 1);
+    free_aml_allocator();
+}
+
 void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
                        GArray *linker)
 {
@@ -414,5 +597,6 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
     }
 
     nvdimm_build_nfit(device_list, table_offsets, table_data, linker);
+    nvdimm_build_ssdt(device_list, table_offsets, table_data, linker);
     g_slist_free(device_list);
 }
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 28/35] nvdimm acpi: save arg3 for NVDIMM device _DSM method
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Check whether the input Arg3 is valid and, if so, store it into dsm_in.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/nvdimm.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)
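
Note for readers (illustration only, not part of the patch): the AML added
below accepts Arg3 only when it is a Package with exactly one element, then
dereferences element 0 and stores it into ARG3. The C model below mirrors that
check. struct Obj and save_arg3() are hypothetical, and the buffer-type and
length checks go beyond what the AML in this patch does. They are my own
assumption about a defensive variant.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Subset of the ACPI ObjectType() result codes. */
enum ObjType { OBJ_INTEGER = 1, OBJ_BUFFER = 3, OBJ_PACKAGE = 4 };

/* Toy stand-in for an ACPI object (hypothetical, for illustration). */
struct Obj {
    enum ObjType type;
    size_t count;               /* Package: elements; Buffer: bytes */
    const struct Obj *elems;    /* Package elements, else NULL */
    const uint8_t *bytes;       /* Buffer contents, else NULL */
};

/*
 * Copy the payload of a one-element package into the ARG3 area.
 * Returns the number of bytes copied, or 0 if Arg3 was rejected.
 */
static size_t save_arg3(const struct Obj *arg3, uint8_t *dst, size_t dst_len)
{
    const struct Obj *buf;

    /* Same test as the AML: ObjectType == Package and SizeOf == 1. */
    if (arg3->type != OBJ_PACKAGE || arg3->count != 1) {
        return 0;
    }

    buf = &arg3->elems[0];      /* DerefOf(Index(Arg3, 0)) */

    /* Extra checks that the AML in the patch does not perform. */
    if (buf->type != OBJ_BUFFER || buf->count > dst_len) {
        return 0;
    }

    memcpy(dst, buf->bytes, buf->count);
    return buf->count;
}
```

When the check fails, the AML simply skips the store, so ARG3 keeps whatever
the page held before; the model signals that case by returning 0.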

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 53ed675..e179a72 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -524,13 +524,38 @@ static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
 
     method = aml_method_serialized("NCAL", 4);
     {
-        Aml *buffer_size = aml_local(0);
+        Aml *ifctx, *pckg, *buffer_size = aml_local(0);
 
         aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
         aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
         aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
 
         /*
+         * The fourth parameter (Arg3) of _DSM is a package which contains
+         * a buffer, the layout of the buffer is specified by UUID (Arg0),
+         * Revision ID (Arg1) and Function Index (Arg2) which are documented
+         * in the DSM Spec.
+         */
+        pckg = aml_arg(3);
+        ifctx = aml_if(aml_and(aml_equal(aml_object_type(pckg),
+                                         aml_int(4 /* Package */)),
+                               aml_equal(aml_sizeof(pckg),
+                                         aml_int(1))));
+        {
+            Aml *pckg_index, *pckg_buf;
+
+            pckg_index = aml_local(2);
+            pckg_buf = aml_local(3);
+
+            aml_append(ifctx, aml_store(aml_index(pckg, aml_int(0)),
+                                        pckg_index));
+            aml_append(ifctx, aml_store(aml_derefof(pckg_index),
+                                        pckg_buf));
+            aml_append(ifctx, aml_store(pckg_buf, aml_name("ARG3")));
+        }
+        aml_append(method, ifctx);
+
+        /*
          * transfer control to QEMU and the buffer size filled by
          * QEMU is returned.
          */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 29/35] nvdimm acpi: support function 0
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

_DSM is defined in ACPI 6.0: 9.14.1 _DSM (Device Specific Method).

Function 0 is a query function. We do not support any function on the root
device, and only 3 functions are supported for an NVDIMM device: Get Namespace
Label Size, Get Namespace Label Data and Set Namespace Label Data. That means
we currently only allow access to the device's Label Namespace.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/aml-build.c         |   2 +-
 hw/acpi/nvdimm.c            | 158 +++++++++++++++++++++++++++++++++++++++++++-
 include/hw/acpi/aml-build.h |   1 +
 3 files changed, 159 insertions(+), 2 deletions(-)
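
Note for readers (illustration only, not part of the patch): function 0
returns a bitmask in which bit N set means function index N is supported, and
bit 0 reports whether any function besides the query itself is supported. The
sketch below builds the two masks described in the commit message. The
function indices 4, 5 and 6 are my assumption from the DSM Spec Rev1, and all
names here are hypothetical rather than taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    DSM_FUNC_QUERY          = 0,
    DSM_FUNC_GET_LABEL_SIZE = 4,    /* assumed index, DSM Spec Rev1 */
    DSM_FUNC_GET_LABEL_DATA = 5,    /* assumed index, DSM Spec Rev1 */
    DSM_FUNC_SET_LABEL_DATA = 6,    /* assumed index, DSM Spec Rev1 */
};

/* Root device: no function is supported, so every bit is clear. */
static uint64_t dsm_root_cmd_list(void)
{
    return 0;
}

/* NVDIMM device: bit 0 plus the three namespace label functions. */
static uint64_t dsm_device_cmd_list(void)
{
    return (1ULL << DSM_FUNC_QUERY) |
           (1ULL << DSM_FUNC_GET_LABEL_SIZE) |
           (1ULL << DSM_FUNC_GET_LABEL_DATA) |
           (1ULL << DSM_FUNC_SET_LABEL_DATA);
}

/* Test one bit of a mask returned by function 0. */
static bool dsm_func_supported(uint64_t cmd_list, unsigned func)
{
    return func < 64 && ((cmd_list >> func) & 1) != 0;
}
```

Under these assumptions the device mask is 0x71, and a guest that calls any
other function index should expect a Not Supported status.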

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 8bee8b2..90229c5 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -231,7 +231,7 @@ static void build_extop_package(GArray *package, uint8_t op)
     build_prepend_byte(package, 0x5B); /* ExtOpPrefix */
 }
 
-static void build_append_int_noprefix(GArray *table, uint64_t value, int size)
+void build_append_int_noprefix(GArray *table, uint64_t value, int size)
 {
     int i;
 
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index e179a72..3895ad8 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -212,6 +212,22 @@ static uint32_t nvdimm_slot_to_dcr_index(int slot)
     return nvdimm_slot_to_spa_index(slot) + 1;
 }
 
+static NVDIMMDevice
+*nvdimm_get_device_by_handle(GSList *list, uint32_t handle)
+{
+    for (; list; list = list->next) {
+        NVDIMMDevice *nvdimm = list->data;
+        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
+                                           NULL);
+
+        if (nvdimm_slot_to_handle(slot) == handle) {
+            return nvdimm;
+        }
+    }
+
+    return NULL;
+}
+
 /* ACPI 6.0: 5.2.25.1 System Physical Address Range Structure */
 static void
 nvdimm_build_structure_spa(GArray *structures, NVDIMMDevice *nvdimm)
@@ -368,6 +384,29 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
     g_array_free(structures, true);
 }
 
+/* define NVDIMM DSM return status codes according to DSM Spec Rev1. */
+enum {
+    /* Common return status codes. */
+    /* Success */
+    NVDIMM_DSM_STATUS_SUCCESS = 0,
+    /* Not Supported */
+    NVDIMM_DSM_STATUS_NOT_SUPPORTED = 1,
+
+    /* NVDIMM Root Device _DSM function return status codes. */
+    /* Invalid Input Parameters */
+    NVDIMM_DSM_ROOT_DEV_STATUS_INVALID_PARAS = 2,
+    /* Function-Specific Error */
+    NVDIMM_DSM_ROOT_DEV_STATUS_FUNCTION_SPECIFIC_ERROR = 3,
+
+    /* NVDIMM Device (non-root) _DSM function return status codes. */
+    /* Non-Existing Memory Device */
+    NVDIMM_DSM_DEV_STATUS_NON_EXISTING_MEM_DEV = 2,
+    /* Invalid Input Parameters */
+    NVDIMM_DSM_DEV_STATUS_INVALID_PARAS = 3,
+    /* Vendor Specific Error */
+    NVDIMM_DSM_DEV_STATUS_VENDOR_SPECIFIC_ERROR = 4,
+};
+
 struct NvdimmDsmIn {
     uint32_t handle;
     uint32_t revision;
@@ -377,10 +416,127 @@ struct NvdimmDsmIn {
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
 
+static void nvdimm_dsm_write_status(GArray *out, uint32_t status)
+{
+    status = cpu_to_le32(status);
+    build_append_int_noprefix(out, status, sizeof(status));
+}
+
+static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
+{
+    uint32_t status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
+
+    /*
+     * The query command is implemented per the ACPI Specification; it is
+     * defined in ACPI 6.0: 9.14.1 _DSM (Device Specific Method).
+     */
+    if (in->function == 0x0) {
+        /*
+         * Set it to zero to indicate no function is supported for NVDIMM
+         * root.
+         */
+        uint64_t cmd_list = cpu_to_le64(0);
+
+        build_append_int_noprefix(out, cmd_list, sizeof(cmd_list));
+        return;
+    }
+
+    nvdimm_debug("Return status %#x.\n", status);
+    nvdimm_dsm_write_status(out, status);
+}
+
+static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
+{
+    GSList *list = nvdimm_get_plugged_device_list();
+    NVDIMMDevice *nvdimm = nvdimm_get_device_by_handle(list, in->handle);
+    uint32_t status = NVDIMM_DSM_DEV_STATUS_NON_EXISTING_MEM_DEV;
+    uint64_t cmd_list;
+
+    if (!nvdimm) {
+        goto set_status_free;
+    }
+
+    /* Encode DSM function according to DSM Spec Rev1. */
+    switch (in->function) {
+    /* see comments in nvdimm_dsm_root(). */
+    case 0x0:
+        cmd_list = cpu_to_le64(0x1 /* Bit 0 indicates whether there is
+                                      support for any functions other
+                                      than function 0.
+                                    */                               |
+                               1 << 4 /* Get Namespace Label Size */ |
+                               1 << 5 /* Get Namespace Label Data */ |
+                               1 << 6 /* Set Namespace Label Data */);
+        build_append_int_noprefix(out, cmd_list, sizeof(cmd_list));
+        goto free;
+    default:
+        status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
+    };
+
+set_status_free:
+    nvdimm_debug("Return status %#x.\n", status);
+    nvdimm_dsm_write_status(out, status);
+free:
+    g_slist_free(list);
+}
+
 static uint64_t
 nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 {
-    return 0;
+    AcpiNVDIMMState *state = opaque;
+    MemoryRegion *dsm_ram_mr = &state->ram_mr;
+    NvdimmDsmIn *in;
+    GArray *out;
+    void *dsm_ram_addr;
+    uint32_t buf_size;
+
+    assert(memory_region_size(dsm_ram_mr) >= sizeof(NvdimmDsmIn));
+    dsm_ram_addr = memory_region_get_ram_ptr(dsm_ram_mr);
+
+    /*
+     * The DSM memory is mapped to guest address space so an evil guest
+     * can change its content while we are doing DSM emulation. Avoid
+     * this by copying DSM memory to QEMU local memory.
+     */
+    in = g_malloc(memory_region_size(dsm_ram_mr));
+    memcpy(in, dsm_ram_addr, memory_region_size(dsm_ram_mr));
+
+    le32_to_cpus(&in->revision);
+    le32_to_cpus(&in->function);
+    le32_to_cpus(&in->handle);
+
+    nvdimm_debug("Revision %#x Handle %#x Function %#x.\n", in->revision,
+                 in->handle, in->function);
+
+    out = g_array_new(false, true /* clear */, 1);
+
+    if (in->revision != 0x1 /* Currently we support DSM Spec Rev1. */) {
+        nvdimm_debug("Revision %#x is not supported, expect %#x.\n",
+                      in->revision, 0x1);
+        nvdimm_dsm_write_status(out, NVDIMM_DSM_STATUS_NOT_SUPPORTED);
+        goto exit;
+    }
+
+    /* Handle 0 is reserved for NVDIMM Root Device. */
+    if (!in->handle) {
+        nvdimm_dsm_root(in, out);
+        goto exit;
+    }
+
+    nvdimm_dsm_device(in, out);
+
+exit:
+    assert(out->len <= memory_region_size(dsm_ram_mr));
+
+    /* Write output result to dsm memory. */
+    memcpy(dsm_ram_addr, out->data, out->len);
+    memory_region_set_dirty(dsm_ram_mr, 0, out->len);
+
+    buf_size = cpu_to_le32(out->len);
+
+    g_free(in);
+    g_array_free(out, true);
+    return buf_size;
 }
 
 static void
diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 00cf40e..2c80043 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -281,6 +281,7 @@ Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
 Aml *aml_concatenate(Aml *source1, Aml *source2, Aml *target);
 Aml *aml_object_type(Aml *object);
 
+void build_append_int_noprefix(GArray *table, uint64_t value, int size);
 void
 build_header(GArray *linker, GArray *table_data,
              AcpiTableHeader *h, const char *sig, int len, uint8_t rev);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 30/35] nvdimm acpi: support Get Namespace Label Size function
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

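Function 4 is used to get the Namespace Label Size: the size of the
namespace label data area and the maximum amount of data that a single
Get/Set Namespace Label Data call can transfer.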

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
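The max transfer size reported by Function 4 follows from the layout of the
shared DSM page. The sketch below redoes that arithmetic in standalone form,
assuming the header sizes implied by the packed structs in this patch:
12 bytes for handle/revision/function, 8 bytes for the Set Label Data
offset/length, and 4 bytes for the Get Label Data status. The macro and
function names are hypothetical, and the page size is passed in rather than
taken from TARGET_PAGE_SIZE.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes implied by the packed structs in the patch (assumed here). */
#define DSM_IN_HEADER_SIZE        12u /* handle + revision + function */
#define SET_LABEL_IN_HEADER_SIZE   8u /* offset + length */
#define GET_LABEL_OUT_HEADER_SIZE  4u /* status */

/* Largest payload that fits in both a Get response and a Set request. */
static uint32_t max_xfer_label_size(uint32_t dsm_memory_size)
{
    uint32_t max_get, max_set;

    /* The page must at least hold the largest fixed header. */
    assert(dsm_memory_size >= DSM_IN_HEADER_SIZE + SET_LABEL_IN_HEADER_SIZE);

    /* Get: the output page holds the status followed by the data. */
    max_get = dsm_memory_size - GET_LABEL_OUT_HEADER_SIZE;

    /* Set: the input page holds the DSM header, offset/length, then data. */
    max_set = dsm_memory_size - DSM_IN_HEADER_SIZE - SET_LABEL_IN_HEADER_SIZE;

    return max_get < max_set ? max_get : max_set;
}
```

For a 4096-byte page this gives min(4092, 4076) = 4076 bytes, so the Set
direction is the limiting one.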
 hw/acpi/nvdimm.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 86 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 3895ad8..cdc361c 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -407,15 +407,48 @@ enum {
     NVDIMM_DSM_DEV_STATUS_VENDOR_SPECIFIC_ERROR = 4,
 };
 
+struct NvdimmFuncInGetLabelData {
+    uint32_t offset; /* the offset in the namespace label data area. */
+    uint32_t length; /* the size of the data to be read via the function. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncInGetLabelData NvdimmFuncInGetLabelData;
+
+struct NvdimmFuncInSetLabelData {
+    uint32_t offset; /* the offset in the namespace label data area. */
+    uint32_t length; /* the size of the data to be written via the function. */
+    uint8_t in_buf[0]; /* the data written to label data area. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncInSetLabelData NvdimmFuncInSetLabelData;
+
 struct NvdimmDsmIn {
     uint32_t handle;
     uint32_t revision;
     uint32_t function;
    /* the remaining size in the page is used by arg3. */
-    uint8_t arg3[0];
+    union {
+        uint8_t arg3[0];
+        NvdimmFuncInSetLabelData func_set_label_data;
+    };
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
 
+struct NvdimmFuncOutLabelSize {
+    uint32_t status;     /* return status code. */
+    uint32_t label_size; /* the size of label data area. */
+    /*
+     * Maximum size of the namespace label data length supported by
+     * the platform in Get/Set Namespace Label Data functions.
+     */
+    uint32_t max_xfer;
+} QEMU_PACKED;
+typedef struct NvdimmFuncOutLabelSize NvdimmFuncOutLabelSize;
+
+struct NvdimmFuncOutGetLabelData {
+    uint32_t status;    /* return status code. */
+    uint8_t out_buf[0]; /* the data returned by Get Namespace Label Data. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncOutGetLabelData NvdimmFuncOutGetLabelData;
+
 static void nvdimm_dsm_write_status(GArray *out, uint32_t status)
 {
     status = cpu_to_le32(status);
@@ -445,6 +478,55 @@ static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
     nvdimm_dsm_write_status(out, status);
 }
 
+/*
+ * The max transfer size is the largest payload that a single
+ * 'Get Namespace Label Data' call and a single 'Set Namespace Label Data'
+ * call can both carry, i.e. the smaller of the two per-function limits.
+ */
+static uint32_t nvdimm_get_max_xfer_label_size(void)
+{
+    NvdimmDsmIn *in;
+    uint32_t max_get_size, max_set_size, dsm_memory_size = TARGET_PAGE_SIZE;
+
+    /*
+     * The most data ACPI can read at one time, which is carried in
+     * the response of the 'Get Namespace Label Data' function.
+     */
+    max_get_size = dsm_memory_size - sizeof(NvdimmFuncOutGetLabelData);
+
+    /*
+     * The most data ACPI can write at one time, which is carried in
+     * the input of the 'Set Namespace Label Data' function.
+     */
+    max_set_size = dsm_memory_size - offsetof(NvdimmDsmIn, arg3) -
+                   sizeof(in->func_set_label_data);
+
+    return MIN(max_get_size, max_set_size);
+}
+
+/*
+ * DSM Spec Rev1 4.4 Get Namespace Label Size (Function Index 4).
+ *
+ * It gets the size of Namespace Label data area and the max data size
+ * that Get/Set Namespace Label Data functions can transfer.
+ */
+static void nvdimm_dsm_func_label_size(NVDIMMDevice *nvdimm, GArray *out)
+{
+    NvdimmFuncOutLabelSize func_label_size;
+    uint32_t label_size, mxfer;
+
+    label_size = nvdimm->label_size;
+    mxfer = nvdimm_get_max_xfer_label_size();
+
+    nvdimm_debug("label_size %#x, max_xfer %#x.\n", label_size, mxfer);
+
+    func_label_size.status = cpu_to_le32(NVDIMM_DSM_STATUS_SUCCESS);
+    func_label_size.label_size = cpu_to_le32(label_size);
+    func_label_size.max_xfer = cpu_to_le32(mxfer);
+
+    g_array_append_vals(out, &func_label_size, sizeof(func_label_size));
+}
+
 static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
 {
     GSList *list = nvdimm_get_plugged_device_list();
@@ -469,6 +551,9 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
                                1 << 6 /* Set Namespace Label Data */);
         build_append_int_noprefix(out, cmd_list, sizeof(cmd_list));
         goto free;
+    case 0x4 /* Get Namespace Label Size */:
+        nvdimm_dsm_func_label_size(nvdimm, out);
+        goto free;
     default:
         status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
     };
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 31/35] nvdimm acpi: support Get Namespace Label Data function
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Function 5 is used to get Namespace Label Data

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
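The parameter validation that guards Function 5 (and is reused by the Set
function) can be sketched in isolation as below. It mirrors the three checks
in nvdimm_rw_label_data_check(): 32-bit overflow of offset + length, range
against the label area, and length against the max transfer size. The
function names and the sizes used in the examples are assumptions made for
illustration, and the sketch returns a boolean rather than a DSM status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool label_range_ok(uint64_t label_size, uint32_t max_xfer,
                           uint32_t offset, uint32_t length)
{
    /* 32-bit wrap-around: the sum would be smaller than the offset. */
    if ((uint32_t)(offset + length) < offset) {
        return false;
    }
    /* The whole range must lie inside the label data area. */
    if (label_size < (uint64_t)offset + length) {
        return false;
    }
    /* A single call cannot move more than the advertised max transfer. */
    return length <= max_xfer;
}

static void label_range_examples(void)
{
    const uint64_t label_size = 128 * 1024; /* assumed 128 KiB label area */
    const uint32_t max_xfer = 4076;          /* assumed limit, 4 KiB page */

    assert(label_range_ok(label_size, max_xfer, 0, 4076));
    assert(!label_range_ok(label_size, max_xfer, 0xfffffff0u, 0x20));
    assert(!label_range_ok(label_size, max_xfer, 131072 - 16, 32));
    assert(!label_range_ok(label_size, max_xfer, 0, 4077));
}
```

Doing the overflow test first matters: without it, a wrapped sum could pass
the range comparison even though the request reaches far past the label
area.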
 hw/acpi/nvdimm.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index cdc361c..f30e2ff 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -428,6 +428,7 @@ struct NvdimmDsmIn {
     union {
         uint8_t arg3[0];
         NvdimmFuncInSetLabelData func_set_label_data;
+        NvdimmFuncInGetLabelData func_get_label_data;
     };
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
@@ -527,6 +528,65 @@ static void nvdimm_dsm_func_label_size(NVDIMMDevice *nvdimm, GArray *out)
     g_array_append_vals(out, &func_label_size, sizeof(func_label_size));
 }
 
+static uint32_t nvdimm_rw_label_data_check(NVDIMMDevice *nvdimm,
+                                           uint32_t offset, uint32_t length)
+{
+    if (offset + length < offset) {
+        nvdimm_debug("offset %#x + length %#x overflows.\n", offset,
+                     length);
+        return NVDIMM_DSM_DEV_STATUS_INVALID_PARAS;
+    }
+
+    if (nvdimm->label_size < offset + length) {
+        nvdimm_debug("position %#x is beyond label data (len = %#lx).\n",
+                     offset + length, nvdimm->label_size);
+        return NVDIMM_DSM_DEV_STATUS_INVALID_PARAS;
+    }
+
+    if (length > nvdimm_get_max_xfer_label_size()) {
+        nvdimm_debug("length (%#x) is larger than max_xfer (%#x).\n",
+                     length, nvdimm_get_max_xfer_label_size());
+        return NVDIMM_DSM_DEV_STATUS_INVALID_PARAS;
+    }
+
+    return NVDIMM_DSM_STATUS_SUCCESS;
+}
+
+/*
+ * DSM Spec Rev1 4.5 Get Namespace Label Data (Function Index 5).
+ */
+static void nvdimm_dsm_func_get_label_data(NVDIMMDevice *nvdimm,
+                                           NvdimmDsmIn *in, GArray *out)
+{
+    NVDIMMClass *nvc = NVDIMM_GET_CLASS(nvdimm);
+    NvdimmFuncInGetLabelData *get_label_data = &in->func_get_label_data;
+    void *buf;
+    uint32_t status;
+
+    le32_to_cpus(&get_label_data->offset);
+    le32_to_cpus(&get_label_data->length);
+
+    nvdimm_debug("Read Label Data: offset %#x length %#x.\n",
+                 get_label_data->offset, get_label_data->length);
+
+    status = nvdimm_rw_label_data_check(nvdimm, get_label_data->offset,
+                                        get_label_data->length);
+    if (status != NVDIMM_DSM_STATUS_SUCCESS) {
+        goto exit;
+    }
+
+    /* write nvdimm_func_out_get_label_data.status. */
+    nvdimm_dsm_write_status(out, status);
+    /* write nvdimm_func_out_get_label_data.out_buf. */
+    buf = acpi_data_push(out, get_label_data->length);
+    nvc->read_label_data(nvdimm, buf, get_label_data->length,
+                         get_label_data->offset);
+    return;
+
+exit:
+    nvdimm_dsm_write_status(out, status);
+}
+
 static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
 {
     GSList *list = nvdimm_get_plugged_device_list();
@@ -554,6 +614,9 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
     case 0x4 /* Get Namespace Label Size */:
         nvdimm_dsm_func_label_size(nvdimm, out);
         goto free;
+    case 0x5 /* Get Namespace Label Data */:
+        nvdimm_dsm_func_get_label_data(nvdimm, in, out);
+        goto free;
     default:
         status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
     };
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 32/35] nvdimm acpi: support Set Namespace Label Data function
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

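Implement _DSM function 6, Set Namespace Label Data. The offset and
length supplied by the guest are validated before the input buffer is
written to the label area.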
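
For readers skimming the series, the validate-then-write flow can be
sketched in isolation. This is a minimal illustration, not the code in
this patch: LabelArea, label_range_ok() and label_write() are hypothetical
names, and the maximum-transfer-size check done by the real
nvdimm_rw_label_data_check() is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the device's label area (not NVDIMMDevice). */
typedef struct {
    uint8_t *label_data;   /* start of the label area */
    uint32_t label_size;   /* size of the label area in bytes */
} LabelArea;

/* True when [offset, offset + length) lies inside the label area. */
static bool label_range_ok(const LabelArea *a, uint32_t offset,
                           uint32_t length)
{
    uint32_t end = offset + length;   /* wraps modulo 2^32 on overflow */

    if (end < offset) {
        return false;                 /* offset + length overflowed */
    }
    return end <= a->label_size;
}

/*
 * Copy the caller's buffer into the label area.
 * Returns 0 on success and 1 when the parameters are invalid.
 */
static int label_write(LabelArea *a, const void *buf, uint32_t offset,
                       uint32_t length)
{
    assert(a && a->label_data);

    if (!label_range_ok(a, offset, length)) {
        return 1;
    }
    memcpy(a->label_data + offset, buf, length);
    return 0;
}
```

The point of the sketch is that the overflow test has to come before the
size comparison, otherwise a wrapped offset + length could pass the bounds
check.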

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/nvdimm.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index f30e2ff..2553be9 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -587,6 +587,34 @@ exit:
     nvdimm_dsm_write_status(out, status);
 }
 
+/*
+ * DSM Spec Rev1 4.6 Set Namespace Label Data (Function Index 6).
+ */
+static void nvdimm_dsm_func_set_label_data(NVDIMMDevice *nvdimm,
+                                           NvdimmDsmIn *in, GArray *out)
+{
+    NVDIMMClass *nvc = NVDIMM_GET_CLASS(nvdimm);
+    NvdimmFuncInSetLabelData *set_label_data = &in->func_set_label_data;
+    uint32_t status;
+
+    le32_to_cpus(&set_label_data->offset);
+    le32_to_cpus(&set_label_data->length);
+
+    nvdimm_debug("Write Label Data: offset %#x length %#x.\n",
+                 set_label_data->offset, set_label_data->length);
+
+    status = nvdimm_rw_label_data_check(nvdimm, set_label_data->offset,
+                                        set_label_data->length);
+    if (status != NVDIMM_DSM_STATUS_SUCCESS) {
+        goto exit;
+    }
+
+    nvc->write_label_data(nvdimm, set_label_data->in_buf,
+                          set_label_data->length, set_label_data->offset);
+exit:
+    nvdimm_dsm_write_status(out, status);
+}
+
 static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
 {
     GSList *list = nvdimm_get_plugged_device_list();
@@ -617,6 +645,9 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
     case 0x5 /* Get Namespace Label Data */:
         nvdimm_dsm_func_get_label_data(nvdimm, in, out);
         goto free;
+    case 0x6 /* Set Namespace Label Data */:
+        nvdimm_dsm_func_set_label_data(nvdimm, in, out);
+        goto free;
     default:
         status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
     };
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 33/35] nvdimm: allow using whole backend memory as pmem
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Introduce a boolean parameter named "reserve-label-data". When it is
false, QEMU does not reserve any region of the backend memory for
label data. This is a 'label-less' NVDIMM device mode in which Linux
uses the whole memory on the device as a single namespace.

This is useful for users who want to pass through a whole nvdimm
device and make all of its data visible to the guest.

The parameter is false by default.
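
The effect of the parameter on the backend layout can be sketched on its
own. This is an illustrative model only: NvdimmLayout and nvdimm_layout()
are hypothetical names, and the sizes in any usage are made up. It follows
the same rule as nvdimm_realize() in the diff below: the label area, when
reserved, sits at the end of the backend memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of how the backend memory is split. */
typedef struct {
    uint64_t pmem_size;    /* bytes exposed to the guest as pmem */
    uint64_t label_offset; /* start of the label area in the backend */
    uint64_t label_size;   /* 0 in label-less mode */
} NvdimmLayout;

/*
 * Split a backend of backend_size bytes. Returns false when the backend
 * is too small to hold anything beyond the reserved label area.
 */
static bool nvdimm_layout(uint64_t backend_size, uint64_t label_size,
                          bool reserve_label_data, NvdimmLayout *out)
{
    uint64_t reserved = reserve_label_data ? label_size : 0;

    assert(out);

    if (backend_size <= reserved) {
        return false;
    }

    out->pmem_size = backend_size - reserved;
    out->label_offset = out->pmem_size;   /* label area follows the pmem */
    out->label_size = reserved;
    return true;
}
```

With reserve_label_data false the whole backend becomes pmem and
label_size is 0, which is why the label read/write functions have nothing
to operate on in that mode.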

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/nvdimm.c        | 12 ++++++++++++
 hw/mem/nvdimm.c         | 43 ++++++++++++++++++++++++++++++++++++-------
 include/hw/mem/nvdimm.h |  6 ++++++
 3 files changed, 54 insertions(+), 7 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 2553be9..eab5d9c 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -531,6 +531,12 @@ static void nvdimm_dsm_func_label_size(NVDIMMDevice *nvdimm, GArray *out)
 static uint32_t nvdimm_rw_label_data_check(NVDIMMDevice *nvdimm,
                                            uint32_t offset, uint32_t length)
 {
+    if (!nvdimm->reserve_label_data) {
+        nvdimm_debug("read/write label request on the device without "
+                     "label data reserved.\n");
+        return NVDIMM_DSM_STATUS_NOT_SUPPORTED;
+    }
+
     if (offset + length < offset) {
         nvdimm_debug("offset %#x + length %#x is overflow.\n", offset,
                      length);
@@ -637,6 +643,12 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, GArray *out)
                                1 << 4 /* Get Namespace Label Size */ |
                                1 << 5 /* Get Namespace Label Data */ |
                                1 << 6 /* Set Namespace Label Data */);
+
+        /* no function support if the device does not have label data. */
+        if (!nvdimm->reserve_label_data) {
+            cmd_list = cpu_to_le64(0);
+        }
+
         build_append_int_noprefix(out, cmd_list, sizeof(cmd_list));
         goto free;
     case 0x4 /* Get Namespace Label Size */:
diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
index c310887..fde1c7f 100644
--- a/hw/mem/nvdimm.c
+++ b/hw/mem/nvdimm.c
@@ -39,14 +39,15 @@ static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
 {
     MemoryRegion *mr;
     NVDIMMDevice *nvdimm = NVDIMM(dimm);
-    uint64_t size;
+    uint64_t reserved_label_size, size;
 
     nvdimm->label_size = MIN_NAMESPACE_LABEL_SIZE;
+    reserved_label_size = nvdimm->reserve_label_data ? nvdimm->label_size : 0;
 
     mr = host_memory_backend_get_memory(dimm->hostmem, errp);
     size = memory_region_size(mr);
 
-    if (size <= nvdimm->label_size) {
+    if (size <= reserved_label_size) {
         char *path = object_get_canonical_path_component(OBJECT(dimm->hostmem));
         error_setg(errp, "the size of memdev %s (0x%" PRIx64 ") is too small"
                    " to contain nvdimm namespace label (0x%" PRIx64 ")", path,
@@ -55,15 +56,19 @@ static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
     }
 
     memory_region_init_alias(&nvdimm->nvdimm_mr, OBJECT(dimm), "nvdimm-memory",
-                             mr, 0, size - nvdimm->label_size);
-    nvdimm->label_data = memory_region_get_ram_ptr(mr) +
-                         memory_region_size(&nvdimm->nvdimm_mr);
+                             mr, 0, size - reserved_label_size);
+
+    if (reserved_label_size) {
+        nvdimm->label_data = memory_region_get_ram_ptr(mr) +
+                             memory_region_size(&nvdimm->nvdimm_mr);
+    }
 }
 
 static void nvdimm_read_label_data(NVDIMMDevice *nvdimm, void *buf,
                                    uint64_t size, uint64_t offset)
 {
-    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
+    assert(nvdimm->reserve_label_data &&
+           (nvdimm->label_size >= size + offset) && (offset + size > offset));
 
     memcpy(buf, nvdimm->label_data + offset, size);
 }
@@ -75,7 +80,8 @@ static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
     DIMMDevice *dimm = DIMM(nvdimm);
     uint64_t backend_offset;
 
-    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
+    assert(nvdimm->reserve_label_data &&
+           (nvdimm->label_size >= size + offset) && (offset + size > offset));
 
     memcpy(nvdimm->label_data + offset, buf, size);
 
@@ -100,10 +106,33 @@ static void nvdimm_class_init(ObjectClass *oc, void *data)
     nvc->write_label_data = nvdimm_write_label_data;
 }
 
+static bool nvdimm_get_reserve_label_data(Object *obj, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+    return nvdimm->reserve_label_data;
+}
+
+static void
+nvdimm_set_reserve_label_data(Object *obj, bool value, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+    nvdimm->reserve_label_data = value;
+}
+
+static void nvdimm_init(Object *obj)
+{
+    object_property_add_bool(obj, "reserve-label-data",
+                             nvdimm_get_reserve_label_data,
+                             nvdimm_set_reserve_label_data, NULL);
+}
+
 static TypeInfo nvdimm_info = {
     .name          = TYPE_NVDIMM,
     .parent        = TYPE_DIMM,
     .instance_size = sizeof(NVDIMMDevice),
+    .instance_init = nvdimm_init,
     .class_init    = nvdimm_class_init,
     .class_size    = sizeof(NVDIMMClass),
 };
diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index 7fdf591..b6ac266 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -63,6 +63,12 @@ struct NVDIMMDevice {
     /* public */
 
     /*
+     * whether to reserve a memory region for NVDIMM label data at
+     * the end of the backend memory.
+     */
+    bool reserve_label_data;
+
+    /*
      * the size of label data in NVDIMM device which is presented to
      * guest via __DSM "Get Namespace Label Size" command.
      */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 34/35] nvdimm acpi: support _FIT method
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

The FIT buffer is not completely mapped into the guest address space, so
a new function, Read FIT (function index 0xFFFFFFFF), is reserved by QEMU
to read the FIT buffer piece by piece. The pieces are concatenated before
_FIT returns.

Refer to docs/specs/acpi_nvdimm.txt for the detailed design.
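
The read-and-concatenate protocol can be sketched in plain C. This is only
a model of the idea, not the AML or the handler in this patch:
fit_read_chunk() and fit_read_all() are hypothetical names, and CHUNK_MAX
is a small stand-in for the page-size-derived limit used by the real
handler.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { CHUNK_MAX = 8 };   /* assumed small limit, for illustration only */

/*
 * Device side: copy at most CHUNK_MAX bytes of the FIT starting at offset.
 * Returns the number of bytes copied, 0 at the end of the FIT, or -1 when
 * offset lies beyond the FIT.
 */
static int fit_read_chunk(const uint8_t *fit, uint32_t fit_len,
                          uint32_t offset, uint8_t *dst)
{
    uint32_t n;

    if (offset > fit_len) {
        return -1;
    }
    n = fit_len - offset;
    if (n > CHUNK_MAX) {
        n = CHUNK_MAX;
    }
    memcpy(dst, fit + offset, n);
    return (int)n;
}

/*
 * Guest side: keep reading at increasing offsets and append each piece
 * until an empty piece (or an error) is returned. The result is
 * heap-allocated and must be freed by the caller; *out_len receives its
 * length. Returns NULL if the FIT is empty or allocation fails.
 */
static uint8_t *fit_read_all(const uint8_t *fit, uint32_t fit_len,
                             uint32_t *out_len)
{
    uint8_t chunk[CHUNK_MAX];
    uint8_t *buf = NULL;
    uint32_t offset = 0;
    int n;

    assert(out_len);

    while ((n = fit_read_chunk(fit, fit_len, offset, chunk)) > 0) {
        uint8_t *grown = realloc(buf, offset + (uint32_t)n);

        if (!grown) {
            free(buf);
            *out_len = 0;
            return NULL;
        }
        buf = grown;
        memcpy(buf + offset, chunk, (size_t)n);   /* append the piece */
        offset += (uint32_t)n;                    /* advance the offset */
    }

    *out_len = offset;
    return buf;
}
```

An offset equal to the FIT length is treated as a valid request that
returns zero bytes, which is what lets the reader detect the end of the
buffer without knowing its size in advance.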

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/nvdimm.c | 168 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 164 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index eab5d9c..f5b5c12 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -384,6 +384,18 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
     g_array_free(structures, true);
 }
 
+/*
+ * define UUID for NVDIMM Root Device according to Chapter 3 DSM Interface
+ * for NVDIMM Root Device - Example in DSM Spec Rev1.
+ */
+#define NVDIMM_DSM_ROOT_UUID "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
+
+/*
+ * Read FIT function, which is for QEMU internal use only. For more detail,
+ * refer to docs/specs/acpi_nvdimm.txt.
+ */
+#define NVDIMM_DSM_FUNC_READ_FIT 0xFFFFFFFF
+
 /* define NVDIMM DSM return status codes according to DSM Spec Rev1. */
 enum {
     /* Common return status codes. */
@@ -420,6 +432,11 @@ struct NvdimmFuncInSetLabelData {
 } QEMU_PACKED;
 typedef struct NvdimmFuncInSetLabelData NvdimmFuncInSetLabelData;
 
+struct NvdimmFuncInReadFit {
+    uint32_t offset; /* fit offset */
+} QEMU_PACKED;
+typedef struct NvdimmFuncInReadFit NvdimmFuncInReadFit;
+
 struct NvdimmDsmIn {
     uint32_t handle;
     uint32_t revision;
@@ -429,6 +446,7 @@ struct NvdimmDsmIn {
         uint8_t arg3[0];
         NvdimmFuncInSetLabelData func_set_label_data;
         NvdimmFuncInGetLabelData func_get_label_data;
+        NvdimmFuncInReadFit func_read_fit;
     };
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
@@ -450,13 +468,71 @@ struct NvdimmFuncOutGetLabelData {
 } QEMU_PACKED;
 typedef struct NvdimmFuncOutGetLabelData NvdimmFuncOutGetLabelData;
 
+struct NvdimmFuncOutReadFit {
+    uint32_t status;    /* return status code. */
+    uint32_t length;    /* the length of fit data we read. */
+    uint8_t fit_data[0]; /* fit data. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncOutReadFit NvdimmFuncOutReadFit;
+
 static void nvdimm_dsm_write_status(GArray *out, uint32_t status)
 {
     status = cpu_to_le32(status);
     build_append_int_noprefix(out, status, sizeof(status));
 }
 
-static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
+/* Build fit memory which is presented to guest via _FIT method. */
+static void nvdimm_build_fit(AcpiNVDIMMState *state)
+{
+    if (!state->fit) {
+        GSList *device_list = nvdimm_get_plugged_device_list();
+
+        nvdimm_debug("Rebuild FIT...\n");
+        state->fit = nvdimm_build_device_structure(device_list);
+        g_slist_free(device_list);
+    }
+}
+
+/* Read FIT data, defined in docs/specs/acpi_nvdimm.txt. */
+static void nvdimm_dsm_func_read_fit(AcpiNVDIMMState *state,
+                                     NvdimmDsmIn *in, GArray *out)
+{
+    NvdimmFuncInReadFit *read_fit = &in->func_read_fit;
+    NvdimmFuncOutReadFit fit_out;
+    uint32_t read_length = TARGET_PAGE_SIZE - sizeof(NvdimmFuncOutReadFit);
+    uint32_t status = NVDIMM_DSM_ROOT_DEV_STATUS_INVALID_PARAS;
+
+    nvdimm_build_fit(state);
+
+    le32_to_cpus(&read_fit->offset);
+
+    nvdimm_debug("Read FIT offset %#x.\n", read_fit->offset);
+
+    if (read_fit->offset > state->fit->len) {
+        nvdimm_debug("offset %#x is beyond fit size (%#x).\n",
+                     read_fit->offset, state->fit->len);
+        goto exit;
+    }
+
+    read_length = MIN(read_length, state->fit->len - read_fit->offset);
+    nvdimm_debug("read length %#x.\n", read_length);
+
+    fit_out.status = cpu_to_le32(NVDIMM_DSM_STATUS_SUCCESS);
+    fit_out.length = cpu_to_le32(read_length);
+    g_array_append_vals(out, &fit_out, sizeof(fit_out));
+
+    if (read_length) {
+        g_array_append_vals(out, state->fit->data + read_fit->offset,
+                            read_length);
+    }
+    return;
+
+exit:
+    nvdimm_dsm_write_status(out, status);
+}
+
+static void nvdimm_dsm_root(AcpiNVDIMMState *state, NvdimmDsmIn *in,
+                            GArray *out)
 {
     uint32_t status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
 
@@ -475,6 +551,10 @@ static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
         return;
     }
 
+    if (in->function == NVDIMM_DSM_FUNC_READ_FIT /* FIT Read */) {
+        return nvdimm_dsm_func_read_fit(state, in, out);
+    }
+
     nvdimm_debug("Return status %#x.\n", status);
     nvdimm_dsm_write_status(out, status);
 }
@@ -710,7 +790,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 
     /* Handle 0 is reserved for NVDIMM Root Device. */
     if (!in->handle) {
-        nvdimm_dsm_root(in, out);
+        nvdimm_dsm_root(state, in, out);
         goto exit;
     }
 
@@ -927,8 +1007,88 @@ static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
      */
     BUILD_DSM_METHOD(dev, method,
                      0 /* 0 is reserved for NVDIMM Root Device*/,
-                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
-                     /* UUID for NVDIMM Root Devices. */);
+                     NVDIMM_DSM_ROOT_UUID /* UUID for NVDIMM Root Devices. */);
+
+    method = aml_method("RFIT", 1);
+    {
+        Aml *ret, *pckg, *ifcond, *ifctx, *dsm_return = aml_local(0);
+
+        aml_append(method, aml_create_dword_field(aml_buffer(4, NULL),
+                                                  aml_int(0), "OFST"));
+
+        /* prepare NvdimmFuncInReadFit.offset */
+        aml_append(method, aml_store(aml_arg(0), aml_name("OFST")));
+        pckg = aml_package(1);
+        aml_append(pckg, aml_name("OFST"));
+
+        ret = aml_call4("_DSM",
+                        aml_touuid(NVDIMM_DSM_ROOT_UUID) /* Root Device UUID */,
+                        aml_int(1) /* Revision 1 */,
+                        aml_int(NVDIMM_DSM_FUNC_READ_FIT) /* Read FIT
+                                                             Function Index */,
+                        pckg);
+        aml_append(method, aml_store(ret, dsm_return));
+
+        aml_append(method, aml_create_dword_field(dsm_return,
+                                          aml_int(0) /* offset at byte 0 */,
+                                          "STAU"));
+        /* if something is wrong during _DSM. */
+        ifcond = aml_equal(aml_int(NVDIMM_DSM_STATUS_SUCCESS),
+                           aml_name("STAU"));
+        ifctx = aml_if(aml_lnot(ifcond));
+        {
+            aml_append(ifctx, aml_return(aml_buffer(0, NULL)));
+        }
+        aml_append(method, ifctx);
+
+        aml_append(method, aml_create_dword_field(dsm_return,
+                                          aml_int(4) /* offset at byte 4. */,
+                                          "BFSZ"));
+        /* if we read the end of fit. */
+        ifctx = aml_if(aml_equal(aml_name("BFSZ"), aml_int(0)));
+        {
+            aml_append(ifctx, aml_return(aml_buffer(0, NULL)));
+        }
+        aml_append(method, ifctx);
+
+        aml_append(method, aml_store(aml_shiftleft(aml_name("BFSZ"),
+                                                   aml_int(3)), aml_local(6)));
+        aml_append(method, aml_create_field(dsm_return,
+                            aml_int(8 * BITS_PER_BYTE), /* offset at byte 8.*/
+                            aml_local(6), "BUFF"));
+        aml_append(method, aml_return(aml_name("BUFF")));
+    }
+    aml_append(dev, method);
+
+    method = aml_method("_FIT", 0);
+    {
+        Aml *whilectx, *fit = aml_local(0), *offset = aml_local(1);
+
+        aml_append(method, aml_store(aml_buffer(0, NULL), fit));
+        aml_append(method, aml_store(aml_int(0), offset));
+
+        whilectx = aml_while(aml_int(1));
+        {
+            Aml *ifctx, *buf = aml_local(2), *bufsize = aml_local(3);
+
+            aml_append(whilectx, aml_store(aml_call1("RFIT", offset), buf));
+            aml_append(whilectx, aml_store(aml_sizeof(buf), bufsize));
+
+            /* finish fit read if no data is read out. */
+            ifctx = aml_if(aml_equal(bufsize, aml_int(0)));
+            {
+                aml_append(ifctx, aml_return(fit));
+            }
+            aml_append(whilectx, ifctx);
+
+            /* update the offset. */
+            aml_append(whilectx, aml_store(aml_add(offset, bufsize), offset));
+            /* append the data we read out to the fit buffer. */
+            aml_append(whilectx, aml_concatenate(fit, buf, fit));
+        }
+        aml_append(method, whilectx);
+    }
+    aml_append(dev, method);
 
     build_nvdimm_devices(device_list, dev);
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [Qemu-devel] [PATCH v7 34/35] nvdimm acpi: support _FIT method
@ 2015-11-02  9:13   ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: vsementsov, Xiao Guangrong, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

The FIT buffer is not completely mapped into the guest address space, so
a new function, Read FIT (function index 0xFFFFFFFF), is reserved by QEMU
to read the FIT buffer piece by piece. The pieces are concatenated before
_FIT returns.

Refer to docs/specs/acpi_nvdimm.txt for the detailed design.

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 hw/acpi/nvdimm.c | 168 +++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 164 insertions(+), 4 deletions(-)

diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index eab5d9c..f5b5c12 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -384,6 +384,18 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
     g_array_free(structures, true);
 }
 
+/*
+ * define UUID for NVDIMM Root Device according to Chapter 3 DSM Interface
+ * for NVDIMM Root Device - Example in DSM Spec Rev1.
+ */
+#define NVDIMM_DSM_ROOT_UUID "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
+
+/*
+ * Read FIT function, which is for QEMU internal use only. For more detail,
+ * refer to docs/specs/acpi_nvdimm.txt.
+ */
+#define NVDIMM_DSM_FUNC_READ_FIT 0xFFFFFFFF
+
 /* define NVDIMM DSM return status codes according to DSM Spec Rev1. */
 enum {
     /* Common return status codes. */
@@ -420,6 +432,11 @@ struct NvdimmFuncInSetLabelData {
 } QEMU_PACKED;
 typedef struct NvdimmFuncInSetLabelData NvdimmFuncInSetLabelData;
 
+struct NvdimmFuncInReadFit {
+    uint32_t offset; /* fit offset */
+} QEMU_PACKED;
+typedef struct NvdimmFuncInReadFit NvdimmFuncInReadFit;
+
 struct NvdimmDsmIn {
     uint32_t handle;
     uint32_t revision;
@@ -429,6 +446,7 @@ struct NvdimmDsmIn {
         uint8_t arg3[0];
         NvdimmFuncInSetLabelData func_set_label_data;
         NvdimmFuncInGetLabelData func_get_label_data;
+        NvdimmFuncInReadFit func_read_fit;
     };
 } QEMU_PACKED;
 typedef struct NvdimmDsmIn NvdimmDsmIn;
@@ -450,13 +468,71 @@ struct NvdimmFuncOutGetLabelData {
 } QEMU_PACKED;
 typedef struct NvdimmFuncOutGetLabelData NvdimmFuncOutGetLabelData;
 
+struct NvdimmFuncOutReadFit {
+    uint32_t status;    /* return status code. */
+    uint32_t length;    /* the length of fit data we read. */
+    uint8_t fit_data[0]; /* fit data. */
+} QEMU_PACKED;
+typedef struct NvdimmFuncOutReadFit NvdimmFuncOutReadFit;
+
 static void nvdimm_dsm_write_status(GArray *out, uint32_t status)
 {
     status = cpu_to_le32(status);
     build_append_int_noprefix(out, status, sizeof(status));
 }
 
-static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
+/* Build fit memory which is presented to guest via _FIT method. */
+static void nvdimm_build_fit(AcpiNVDIMMState *state)
+{
+    if (!state->fit) {
+        GSList *device_list = nvdimm_get_plugged_device_list();
+
+        nvdimm_debug("Rebuild FIT...\n");
+        state->fit = nvdimm_build_device_structure(device_list);
+        g_slist_free(device_list);
+    }
+}
+
+/* Read FIT data, defined in docs/specs/acpi_nvdimm.txt. */
+static void nvdimm_dsm_func_read_fit(AcpiNVDIMMState *state,
+                                     NvdimmDsmIn *in, GArray *out)
+{
+    NvdimmFuncInReadFit *read_fit = &in->func_read_fit;
+    NvdimmFuncOutReadFit fit_out;
+    uint32_t read_length = TARGET_PAGE_SIZE - sizeof(NvdimmFuncOutReadFit);
+    uint32_t status = NVDIMM_DSM_ROOT_DEV_STATUS_INVALID_PARAS;
+
+    nvdimm_build_fit(state);
+
+    le32_to_cpus(&read_fit->offset);
+
+    nvdimm_debug("Read FIT offset %#x.\n", read_fit->offset);
+
+    if (read_fit->offset > state->fit->len) {
+        nvdimm_debug("offset %#x is beyond fit size (%#x).\n",
+                     read_fit->offset, state->fit->len);
+        goto exit;
+    }
+
+    read_length = MIN(read_length, state->fit->len - read_fit->offset);
+    nvdimm_debug("read length %#x.\n", read_length);
+
+    fit_out.status = cpu_to_le32(NVDIMM_DSM_STATUS_SUCCESS);
+    fit_out.length = cpu_to_le32(read_length);
+    g_array_append_vals(out, &fit_out, sizeof(fit_out));
+
+    if (read_length) {
+        g_array_append_vals(out, state->fit->data + read_fit->offset,
+                            read_length);
+    }
+    return;
+
+exit:
+    nvdimm_dsm_write_status(out, status);
+}
+
+static void nvdimm_dsm_root(AcpiNVDIMMState *state, NvdimmDsmIn *in,
+                            GArray *out)
 {
     uint32_t status = NVDIMM_DSM_STATUS_NOT_SUPPORTED;
 
@@ -475,6 +551,10 @@ static void nvdimm_dsm_root(NvdimmDsmIn *in, GArray *out)
         return;
     }
 
+    if (in->function == NVDIMM_DSM_FUNC_READ_FIT /* FIT Read */) {
+        return nvdimm_dsm_func_read_fit(state, in, out);
+    }
+
     nvdimm_debug("Return status %#x.\n", status);
     nvdimm_dsm_write_status(out, status);
 }
@@ -710,7 +790,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 
     /* Handle 0 is reserved for NVDIMM Root Device. */
     if (!in->handle) {
-        nvdimm_dsm_root(in, out);
+        nvdimm_dsm_root(state, in, out);
         goto exit;
     }
 
@@ -927,8 +1007,88 @@ static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
      */
     BUILD_DSM_METHOD(dev, method,
                      0 /* 0 is reserved for NVDIMM Root Device*/,
-                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
-                     /* UUID for NVDIMM Root Devices. */);
+                     NVDIMM_DSM_ROOT_UUID /* UUID for NVDIMM Root Devices. */);
+
+    method = aml_method("RFIT", 1);
+    {
+        Aml *ret, *pckg, *ifcond, *ifctx, *dsm_return = aml_local(0);
+
+        aml_append(method, aml_create_dword_field(aml_buffer(4, NULL),
+                                                  aml_int(0), "OFST"));
+
+        /* prepare NvdimmFuncInReadFit.offset */
+        aml_append(method, aml_store(aml_arg(0), aml_name("OFST")));
+        pckg = aml_package(1);
+        aml_append(pckg, aml_name("OFST"));
+
+        ret = aml_call4("_DSM",
+                        aml_touuid(NVDIMM_DSM_ROOT_UUID) /* Root Device UUID */,
+                        aml_int(1) /* Revision 1 */,
+                        aml_int(NVDIMM_DSM_FUNC_READ_FIT) /* Read FIT
+                                                             Function Index */,
+                        pckg);
+        aml_append(method, aml_store(ret, dsm_return));
+
+        aml_append(method, aml_create_dword_field(dsm_return,
+                                          aml_int(0) /* offset at byte 0 */,
+                                          "STAU"));
+        /* if something is wrong during _DSM. */
+        ifcond = aml_equal(aml_int(NVDIMM_DSM_STATUS_SUCCESS),
+                           aml_name("STAU"));
+        ifctx = aml_if(aml_lnot(ifcond));
+        {
+            aml_append(ifctx, aml_return(aml_buffer(0, NULL)));
+        }
+        aml_append(method, ifctx);
+
+        aml_append(method, aml_create_dword_field(dsm_return,
+                                          aml_int(4) /* offset at byte 4. */,
+                                          "BFSZ"));
+        /* if we read the end of fit. */
+        ifctx = aml_if(aml_equal(aml_name("BFSZ"), aml_int(0)));
+        {
+            aml_append(ifctx, aml_return(aml_buffer(0, NULL)));
+        }
+        aml_append(method, ifctx);
+
+        aml_append(method, aml_store(aml_shiftleft(aml_name("BFSZ"),
+                                                   aml_int(3)), aml_local(6)));
+        aml_append(method, aml_create_field(dsm_return,
+                            aml_int(8 * BITS_PER_BYTE), /* offset at byte 8.*/
+                            aml_local(6), "BUFF"));
+        aml_append(method, aml_return(aml_name("BUFF")));
+    }
+    aml_append(dev, method);
+
+    method = aml_method("_FIT", 0);
+    {
+        Aml *whilectx, *fit = aml_local(0), *offset = aml_local(1);
+
+        aml_append(method, aml_store(aml_buffer(0, NULL), fit));
+        aml_append(method, aml_store(aml_int(0), offset));
+
+        whilectx = aml_while(aml_int(1));
+        {
+            Aml *ifctx, *buf = aml_local(2), *bufsize = aml_local(3);
+
+            aml_append(whilectx, aml_store(aml_call1("RFIT", offset), buf));
+            aml_append(whilectx, aml_store(aml_sizeof(buf), bufsize));
+
+            /* finish fit read if no data is read out. */
+            ifctx = aml_if(aml_equal(bufsize, aml_int(0)));
+            {
+                aml_append(ifctx, aml_return(fit));
+            }
+            aml_append(whilectx, ifctx);
+
+            /* update the offset. */
+            aml_append(whilectx, aml_store(aml_add(offset, bufsize), offset));
+            /* append the data we read out to the fit buffer. */
+            aml_append(whilectx, aml_concatenate(fit, buf, fit));
+        }
+        aml_append(method, whilectx);
+    }
+    aml_append(dev, method);
 
     build_nvdimm_devices(device_list, dev);
 
-- 
1.8.3.1
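
The RFIT/_FIT pair in the patch above reads the FIT in chunks: each _DSM call returns at most one page minus the 8-byte status/length header, and _FIT keeps calling RFIT and concatenating until an empty buffer comes back. A minimal plain-C sketch of that loop follows, assuming a 4096-byte page; every sketch_* and SKETCH_* name is hypothetical and none of this is QEMU code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed stand-in for TARGET_PAGE_SIZE */
#define SKETCH_HDR_SIZE  8u     /* status (4 bytes) + length (4 bytes) */

/*
 * Copy at most one chunk of FIT data starting at offset into chunk.
 * Returns the number of bytes copied; 0 means the end of the FIT was
 * reached or the offset was out of range.
 */
static uint32_t sketch_read_fit(const uint8_t *fit, uint32_t fit_len,
                                uint32_t offset, uint8_t *chunk)
{
    uint32_t n = SKETCH_PAGE_SIZE - SKETCH_HDR_SIZE;

    if (offset > fit_len) {
        return 0;
    }
    if (n > fit_len - offset) {
        n = fit_len - offset;
    }
    if (n) {
        memcpy(chunk, fit + offset, n);
    }
    return n;
}

/*
 * Mirror of the _FIT loop: call the reader repeatedly, append each chunk,
 * and stop when a read returns no data. out must hold fit_len bytes.
 * Returns the total number of bytes gathered.
 */
static uint32_t sketch_read_whole_fit(const uint8_t *fit, uint32_t fit_len,
                                      uint8_t *out)
{
    uint8_t chunk[SKETCH_PAGE_SIZE];
    uint32_t offset = 0;

    assert(out != NULL);
    for (;;) {
        uint32_t n = sketch_read_fit(fit, fit_len, offset, chunk);

        if (n == 0) {
            return offset;
        }
        memcpy(out + offset, chunk, n);
        offset += n;
    }
}
```

An offset equal to the FIT length is treated as a valid read of zero bytes, which is what terminates the loop.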

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* [PATCH v7 35/35] nvdimm: add maintain info
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02  9:13   ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02  9:13 UTC (permalink / raw)
  To: pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake, Xiao Guangrong

Add NVDIMM maintainer

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 MAINTAINERS | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3144113..865c0cf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -907,6 +907,13 @@ M: Jiri Pirko <jiri@resnulli.us>
 S: Maintained
 F: hw/net/rocker/
 
+NVDIMM
+M: Xiao Guangrong <guangrong.xiao@linux.intel.com>
+S: Maintained
+F: hw/acpi/nvdimm.c
+F: hw/mem/nvdimm.c
+F: include/hw/mem/nvdimm.h
+
 Subsystems
 ----------
 Audio
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 00/35] implement vNVDIMM
  2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 11:51   ` Stefan Hajnoczi
  -1 siblings, 0 replies; 200+ messages in thread
From: Stefan Hajnoczi @ 2015-11-02 11:51 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

[-- Attachment #1: Type: text/plain, Size: 84 bytes --]

I have reviewed ACPI interface:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 12:19     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 12:19 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 12:13, Xiao Guangrong wrote:
> Currently, the memory region of the backend memory is directly mapped to
> the guest's address space; however, this is not true for the nvdimm device
>
> This patch lets the dimm device realize this fact and use the
> DIMMDeviceClass->get_memory_region method to get the mapped memory
> region
>
> The current code did not check the return value of get_memory_region as it
> assumed the backend memory of pc-dimm is always properly initialized;
> we make get_memory_region internally catch the case where something is
> wrong
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   hw/mem/dimm.c    |  3 ++-
>   hw/mem/pc-dimm.c | 12 +++++++++++-
>   2 files changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
> index 4a63409..498d380 100644
> --- a/hw/mem/dimm.c
> +++ b/hw/mem/dimm.c
> @@ -377,8 +377,9 @@ static void dimm_get_size(Object *obj, Visitor *v, void *opaque,
>       int64_t value;
>       MemoryRegion *mr;
>       DIMMDevice *dimm = DIMM(obj);
> +    DIMMDeviceClass *ddc = DIMM_GET_CLASS(obj);
>   
> -    mr = host_memory_backend_get_memory(dimm->hostmem, errp);
> +    mr = ddc->get_memory_region(dimm);
>       value = memory_region_size(mr);
>   
>       visit_type_int(v, &value, name, errp);
> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
> index 38323e9..e6b6a9f 100644
> --- a/hw/mem/pc-dimm.c
> +++ b/hw/mem/pc-dimm.c
> @@ -22,7 +22,17 @@
>   
>   static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
>   {
> -    return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
> +    Error *local_err = NULL;
> +    MemoryRegion *mr;
> +
> +    mr = host_memory_backend_get_memory(dimm->hostmem, &local_err);
> +
> +    /*
> +     * plug a pc-dimm device whose backend memory was not properly
> +     * initialized?
> +     */
> +    assert(!local_err && mr);
> +    return mr;
>   }

This should be squashed into the previous patch, I think.

>   
>   static void pc_dimm_class_init(ObjectClass *oc, void *data)

I've suddenly discovered that

MemoryRegion *
host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
{
     return memory_region_size(&backend->mr) ? &backend->mr : NULL;
}

- it doesn't use errp at all. In my opinion, this should be fixed
globally, by deleting the useless parameter in a separate patch. Or just
squash your function into the previous patch.

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-02 12:19     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
@ 2015-11-02 13:08       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02 13:08 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake



On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.11.2015 12:13, Xiao Guangrong wrote:
>> Currently, the memory region of the backend memory is directly mapped to
>> the guest's address space; however, this is not true for the nvdimm device
>>
>> This patch lets the dimm device realize this fact and use the
>> DIMMDeviceClass->get_memory_region method to get the mapped memory
>> region
>>
>> The current code did not check the return value of get_memory_region as it
>> assumed the backend memory of pc-dimm is always properly initialized;
>> we make get_memory_region internally catch the case where something is
>> wrong
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   hw/mem/dimm.c    |  3 ++-
>>   hw/mem/pc-dimm.c | 12 +++++++++++-
>>   2 files changed, 13 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
>> index 4a63409..498d380 100644
>> --- a/hw/mem/dimm.c
>> +++ b/hw/mem/dimm.c
>> @@ -377,8 +377,9 @@ static void dimm_get_size(Object *obj, Visitor *v, void *opaque,
>>       int64_t value;
>>       MemoryRegion *mr;
>>       DIMMDevice *dimm = DIMM(obj);
>> +    DIMMDeviceClass *ddc = DIMM_GET_CLASS(obj);
>> -    mr = host_memory_backend_get_memory(dimm->hostmem, errp);
>> +    mr = ddc->get_memory_region(dimm);
>>       value = memory_region_size(mr);
>>       visit_type_int(v, &value, name, errp);
>> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
>> index 38323e9..e6b6a9f 100644
>> --- a/hw/mem/pc-dimm.c
>> +++ b/hw/mem/pc-dimm.c
>> @@ -22,7 +22,17 @@
>>   static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
>>   {
>> -    return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
>> +    Error *local_err = NULL;
>> +    MemoryRegion *mr;
>> +
>> +    mr = host_memory_backend_get_memory(dimm->hostmem, &local_err);
>> +
>> +    /*
>> +     * plug a pc-dimm device whose backend memory was not properly
>> +     * initialized?
>> +     */
>> +    assert(!local_err && mr);
>> +    return mr;
>>   }
>
> This should be squashed into the previous patch, I think.

You mean merge this patch with 19/35 (dimm: abstract dimm device from pc-dimm)?
19/35 mostly 'moves' things around, while this one changes the core logic; it is not
a big deal. :D

>
>>   static void pc_dimm_class_init(ObjectClass *oc, void *data)
>
> I've suddenly discovered that
>
> MemoryRegion *
> host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
> {
>      return memory_region_size(&backend->mr) ? &backend->mr : NULL;
> }
>
> - it doesn't use errp at all. In my opinion, this should be fixed globally, by deleting the useless
> parameter in a separate patch. Or just squash your function into the previous patch.
>

Yup, this is a global interface, so I prefer to make a separate patch to do the
cleanup after this patchset.



^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 07/35] util: introduce qemu_file_get_page_size()
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 13:56     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 13:56 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 12:13, Xiao Guangrong wrote:
> There are three places that use the same logic to get the page size from
> a file path or file fd
>
> Windows does not support file hugepages, so it returns the normal page size
> in this case. This interface has not been used on Windows so far
>
> This patch introduces qemu_file_get_page_size() to unify the code
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   exec.c               | 33 ++++++---------------------------
>   include/qemu/osdep.h |  1 +
>   target-ppc/kvm.c     | 23 +++++------------------
>   util/oslib-posix.c   | 37 +++++++++++++++++++++++++++++++++----
>   util/oslib-win32.c   |  5 +++++
>   5 files changed, 50 insertions(+), 49 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 8af2570..9de38be 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1174,32 +1174,6 @@ void qemu_mutex_unlock_ramlist(void)
>   }
>   
>   #ifdef __linux__
> -
> -#include <sys/vfs.h>
> -
> -#define HUGETLBFS_MAGIC       0x958458f6
> -
> -static long gethugepagesize(const char *path, Error **errp)
> -{
> -    struct statfs fs;
> -    int ret;
> -
> -    do {
> -        ret = statfs(path, &fs);
> -    } while (ret != 0 && errno == EINTR);
> -
> -    if (ret != 0) {
> -        error_setg_errno(errp, errno, "failed to get page size of file %s",
> -                         path);
> -        return 0;
> -    }
> -
> -    if (fs.f_type != HUGETLBFS_MAGIC)
> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> -
> -    return fs.f_bsize;
> -}
> -
>   static void *file_ram_alloc(RAMBlock *block,
>                               ram_addr_t memory,
>                               const char *path,
> @@ -1213,11 +1187,16 @@ static void *file_ram_alloc(RAMBlock *block,
>       uint64_t hpagesize;
>       Error *local_err = NULL;
>   
> -    hpagesize = gethugepagesize(path, &local_err);
> +    hpagesize = qemu_file_get_page_size(path, &local_err);
>       if (local_err) {
>           error_propagate(errp, local_err);
>           goto error;
>       }
> +
> +    if (hpagesize == getpagesize()) {
> +        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> +    }
> +
>       block->mr->align = hpagesize;
>   
>       if (memory < hpagesize) {
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index b568424..dbc17dc 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -302,4 +302,5 @@ int qemu_read_password(char *buf, int buf_size);
>    */
>   pid_t qemu_fork(Error **errp);
>   
> +size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
>   #endif
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index ac70f08..d8760ea 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -308,28 +308,15 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct kvm_ppc_smmu_info *info)
>   
>   static long gethugepagesize(const char *mem_path)
>   {
> -    struct statfs fs;
> -    int ret;
> -
> -    do {
> -        ret = statfs(mem_path, &fs);
> -    } while (ret != 0 && errno == EINTR);
> +    Error *local_err = NULL;
> +    long size = qemu_file_get_page_size(mem_path, &local_err);
>   
> -    if (ret != 0) {
> -        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
> -                strerror(errno));
> +    if (local_err) {
> +        error_report_err(local_err);
>           exit(1);
>       }
>   
> -#define HUGETLBFS_MAGIC       0x958458f6
> -
> -    if (fs.f_type != HUGETLBFS_MAGIC) {
> -        /* Explicit mempath, but it's ordinary pages */
> -        return getpagesize();
> -    }
> -
> -    /* It's hugepage, return the huge page size */
> -    return fs.f_bsize;
> +    return size;
>   }
>   
>   static int find_max_supported_pagesize(Object *obj, void *opaque)
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 914cef5..51437ff 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
>       siglongjmp(sigjump, 1);
>   }
>   
> -static size_t fd_getpagesize(int fd)
> +static size_t fd_getpagesize(int fd, Error **errp)
>   {
>   #ifdef CONFIG_LINUX
>       struct statfs fs;
> @@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
>               ret = fstatfs(fd, &fs);
>           } while (ret != 0 && errno == EINTR);
>   
> -        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
> +        if (ret) {
> +            error_setg_errno(errp, errno, "fstatfs is failed");
> +            return 0;
> +        }
> +
> +        if (fs.f_type == HUGETLBFS_MAGIC) {
>               return fs.f_bsize;
>           }
>       }
> @@ -360,6 +365,22 @@ static size_t fd_getpagesize(int fd)
>       return getpagesize();
>   }
>   
> +size_t qemu_file_get_page_size(const char *path, Error **errp)
> +{
> +    size_t size = 0;
> +    int fd = qemu_open(path, O_RDONLY);
> +
> +    if (fd < 0) {
> +        error_setg_file_open(errp, errno, path);
> +        goto exit;
> +    }
> +
> +    size = fd_getpagesize(fd, errp);
> +    qemu_close(fd);
> +exit:
> +    return size;
> +}
> +
>   void os_mem_prealloc(int fd, char *area, size_t memory)
>   {
>       int ret;
> @@ -387,8 +408,16 @@ void os_mem_prealloc(int fd, char *area, size_t memory)
>           exit(1);
>       } else {
>           int i;
> -        size_t hpagesize = fd_getpagesize(fd);
> -        size_t numpages = DIV_ROUND_UP(memory, hpagesize);
> +        Error *local_err = NULL;
> +        size_t hpagesize = fd_getpagesize(fd, &local_err);
> +        size_t numpages;
> +
> +        if (local_err) {
> +            error_report_err(local_err);
> +            exit(1);
> +        }
> +
> +        numpages = DIV_ROUND_UP(memory, hpagesize);
>   
>           /* MAP_POPULATE silently ignores failures */
>           for (i = 0; i < numpages; i++) {
> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index 09f9e98..dada6b6 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -462,6 +462,11 @@ size_t getpagesize(void)
>       return system_info.dwPageSize;
>   }
>   
> +size_t qemu_file_get_page_size(const char *path, Error **errp)
> +{
> +    return getpagesize();
> +}
> +
>   void os_mem_prealloc(int fd, char *area, size_t memory)
>   {
>       int i;

OK for me. The only thing: some of the page-size functions return size_t,
while others return long. long is the more common practice here, but this
doesn't really matter, so with or without size_t <-> long changes:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
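
For anyone who wants to try the detection logic outside the QEMU tree, here is a minimal standalone sketch for Linux. It follows the same open/fstatfs/HUGETLBFS_MAGIC sequence as the patch, but reports failure through a plain int out-parameter instead of QEMU's Error **; that substitution, the sketch_file_page_size name and the SKETCH_HUGETLBFS_MAGIC macro are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/vfs.h>
#include <unistd.h>

#define SKETCH_HUGETLBFS_MAGIC 0x958458f6u

/*
 * Return the page size backing path: the hugetlbfs block size when the
 * file lives on hugetlbfs, otherwise the normal system page size.
 * On failure return 0 and store the errno value in *err.
 */
static size_t sketch_file_page_size(const char *path, int *err)
{
    struct statfs fs;
    int fd, ret;

    assert(path != NULL && err != NULL);
    *err = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0) {
        *err = errno;
        return 0;
    }

    /* Retry if the call is interrupted by a signal. */
    do {
        ret = fstatfs(fd, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0) {
        *err = errno;   /* save before close() can overwrite errno */
        close(fd);
        return 0;
    }
    close(fd);

    if ((unsigned int)fs.f_type == SKETCH_HUGETLBFS_MAGIC) {
        return (size_t)fs.f_bsize;
    }
    return (size_t)sysconf(_SC_PAGESIZE);
}
```

Returning size_t here matches the prototype added to osdep.h; a long-returning variant would only change the casts.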

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 07/35] util: introduce qemu_file_get_page_size()
@ 2015-11-02 13:56     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 13:56 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
	dan.j.williams, rth

On 02.11.2015 12:13, Xiao Guangrong wrote:
> There are three places use the some logic to get the page size on
> the file path or file fd
>
> Windows did not support file hugepage, so it will return normal page
> for this case. And this interface has not been used on windows so far
>   
> This patch introduces qemu_file_get_page_size() to unify the code
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   exec.c               | 33 ++++++---------------------------
>   include/qemu/osdep.h |  1 +
>   target-ppc/kvm.c     | 23 +++++------------------
>   util/oslib-posix.c   | 37 +++++++++++++++++++++++++++++++++----
>   util/oslib-win32.c   |  5 +++++
>   5 files changed, 50 insertions(+), 49 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 8af2570..9de38be 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1174,32 +1174,6 @@ void qemu_mutex_unlock_ramlist(void)
>   }
>   
>   #ifdef __linux__
> -
> -#include <sys/vfs.h>
> -
> -#define HUGETLBFS_MAGIC       0x958458f6
> -
> -static long gethugepagesize(const char *path, Error **errp)
> -{
> -    struct statfs fs;
> -    int ret;
> -
> -    do {
> -        ret = statfs(path, &fs);
> -    } while (ret != 0 && errno == EINTR);
> -
> -    if (ret != 0) {
> -        error_setg_errno(errp, errno, "failed to get page size of file %s",
> -                         path);
> -        return 0;
> -    }
> -
> -    if (fs.f_type != HUGETLBFS_MAGIC)
> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> -
> -    return fs.f_bsize;
> -}
> -
>   static void *file_ram_alloc(RAMBlock *block,
>                               ram_addr_t memory,
>                               const char *path,
> @@ -1213,11 +1187,16 @@ static void *file_ram_alloc(RAMBlock *block,
>       uint64_t hpagesize;
>       Error *local_err = NULL;
>   
> -    hpagesize = gethugepagesize(path, &local_err);
> +    hpagesize = qemu_file_get_page_size(path, &local_err);
>       if (local_err) {
>           error_propagate(errp, local_err);
>           goto error;
>       }
> +
> +    if (hpagesize == getpagesize()) {
> +        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> +    }
> +
>       block->mr->align = hpagesize;
>   
>       if (memory < hpagesize) {
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index b568424..dbc17dc 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -302,4 +302,5 @@ int qemu_read_password(char *buf, int buf_size);
>    */
>   pid_t qemu_fork(Error **errp);
>   
> +size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
>   #endif
> diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
> index ac70f08..d8760ea 100644
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -308,28 +308,15 @@ static void kvm_get_smmu_info(PowerPCCPU *cpu, struct kvm_ppc_smmu_info *info)
>   
>   static long gethugepagesize(const char *mem_path)
>   {
> -    struct statfs fs;
> -    int ret;
> -
> -    do {
> -        ret = statfs(mem_path, &fs);
> -    } while (ret != 0 && errno == EINTR);
> +    Error *local_err = NULL;
> +    long size = qemu_file_get_page_size(mem_path, local_err);
>   
> -    if (ret != 0) {
> -        fprintf(stderr, "Couldn't statfs() memory path: %s\n",
> -                strerror(errno));
> +    if (local_err) {
> +        error_report_err(local_err);
>           exit(1);
>       }
>   
> -#define HUGETLBFS_MAGIC       0x958458f6
> -
> -    if (fs.f_type != HUGETLBFS_MAGIC) {
> -        /* Explicit mempath, but it's ordinary pages */
> -        return getpagesize();
> -    }
> -
> -    /* It's hugepage, return the huge page size */
> -    return fs.f_bsize;
> +    return size;
>   }
>   
>   static int find_max_supported_pagesize(Object *obj, void *opaque)
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 914cef5..51437ff 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
>       siglongjmp(sigjump, 1);
>   }
>   
> -static size_t fd_getpagesize(int fd)
> +static size_t fd_getpagesize(int fd, Error **errp)
>   {
>   #ifdef CONFIG_LINUX
>       struct statfs fs;
> @@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
>               ret = fstatfs(fd, &fs);
>           } while (ret != 0 && errno == EINTR);
>   
> -        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
> +        if (ret) {
> +            error_setg_errno(errp, errno, "fstatfs is failed");
> +            return 0;
> +        }
> +
> +        if (fs.f_type == HUGETLBFS_MAGIC) {
>               return fs.f_bsize;
>           }
>       }
> @@ -360,6 +365,22 @@ static size_t fd_getpagesize(int fd)
>       return getpagesize();
>   }
>   
> +size_t qemu_file_get_page_size(const char *path, Error **errp)
> +{
> +    size_t size = 0;
> +    int fd = qemu_open(path, O_RDONLY);
> +
> +    if (fd < 0) {
> +        error_setg_file_open(errp, errno, path);
> +        goto exit;
> +    }
> +
> +    size = fd_getpagesize(fd, errp);
> +    qemu_close(fd);
> +exit:
> +    return size;
> +}
> +
>   void os_mem_prealloc(int fd, char *area, size_t memory)
>   {
>       int ret;
> @@ -387,8 +408,16 @@ void os_mem_prealloc(int fd, char *area, size_t memory)
>           exit(1);
>       } else {
>           int i;
> -        size_t hpagesize = fd_getpagesize(fd);
> -        size_t numpages = DIV_ROUND_UP(memory, hpagesize);
> +        Error *local_err = NULL;
> +        size_t hpagesize = fd_getpagesize(fd, &local_err);
> +        size_t numpages;
> +
> +        if (local_err) {
> +            error_report_err(local_err);
> +            exit(1);
> +        }
> +
> +        numpages = DIV_ROUND_UP(memory, hpagesize);
>   
>           /* MAP_POPULATE silently ignores failures */
>           for (i = 0; i < numpages; i++) {
> diff --git a/util/oslib-win32.c b/util/oslib-win32.c
> index 09f9e98..dada6b6 100644
> --- a/util/oslib-win32.c
> +++ b/util/oslib-win32.c
> @@ -462,6 +462,11 @@ size_t getpagesize(void)
>       return system_info.dwPageSize;
>   }
>   
> +size_t qemu_file_get_page_size(const char *path, Error **errp)
> +{
> +    return getpagesize();
> +}
> +
>   void os_mem_prealloc(int fd, char *area, size_t memory)
>   {
>       int i;

OK for me. The only thing: some of the page-size functions return size_t
while others return long. long is the more common practice here, but it
doesn't really matter, so with or without size_t <-> long changes:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
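
For reference, the lookup that fd_getpagesize() performs in this patch can
be sketched in isolation. This is only an illustration under assumptions:
file_page_size() and its int *err_out parameter are hypothetical stand-ins
for qemu_file_get_page_size() and QEMU's Error **, the code is Linux-only,
and it settles on size_t with 0 reserved for failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/vfs.h>
#include <unistd.h>

/* Same value as the HUGETLBFS_MAGIC removed by the patch; local name is ours. */
#define SKETCH_HUGETLBFS_MAGIC 0x958458f6UL

/*
 * Hypothetical helper (not QEMU code): return the page size backing
 * 'path', or 0 on failure with *err_out set to the errno value.
 */
static size_t file_page_size(const char *path, int *err_out)
{
    struct statfs fs;
    int fd, ret;

    *err_out = 0;
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        *err_out = errno;
        return 0;
    }

    /* Retry if the call is interrupted by a signal. */
    do {
        ret = fstatfs(fd, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0) {
        *err_out = errno;   /* save errno before close() can change it */
        close(fd);
        return 0;
    }
    close(fd);

    /* On hugetlbfs the block size is the huge page size. */
    if ((unsigned long)fs.f_type == SKETCH_HUGETLBFS_MAGIC) {
        return (size_t)fs.f_bsize;
    }
    /* Any other filesystem: ordinary pages. */
    return (size_t)sysconf(_SC_PAGESIZE);
}
```

A caller checks the error code rather than the returned size, much as the
patch checks local_err before using the result.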

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-02 13:08       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 14:26         ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 14:26 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 16:08, Xiao Guangrong wrote:
>
>
> On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:
>> On 02.11.2015 12:13, Xiao Guangrong wrote:
>>> Currently, the memory region of backend memory is directly mapped to
>>> guest's address space, however, it is not true for nvdimm device
>>>
>>> This patch lets dimm device realize this fact and use
>>> DIMMDeviceClass->get_memory_region method to get the mapped memory
>>> region
>>>
>>> Current code did not check the return value of get_memory_region as it
>>> assumed the backend memory of pc-dimm is always properly initialized,
>>> we make get_memory_region internally catch the case if something is
>>> wrong

But here you are not calling pc-dimm's get_memory_region; you call the
common ddc->get_memory_region, which may be nvdimm's or possibly some other
future dimm's. So why not check it here? Then pc_dimm_get_memory_region may
be left untouched (error_abort is ok, because errp is unused).

>>>
>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>> ---
>>>   hw/mem/dimm.c    |  3 ++-
>>>   hw/mem/pc-dimm.c | 12 +++++++++++-
>>>   2 files changed, 13 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/hw/mem/dimm.c b/hw/mem/dimm.c
>>> index 4a63409..498d380 100644
>>> --- a/hw/mem/dimm.c
>>> +++ b/hw/mem/dimm.c
>>> @@ -377,8 +377,9 @@ static void dimm_get_size(Object *obj, Visitor 
>>> *v, void *opaque,
>>>       int64_t value;
>>>       MemoryRegion *mr;
>>>       DIMMDevice *dimm = DIMM(obj);
>>> +    DIMMDeviceClass *ddc = DIMM_GET_CLASS(obj);
>>> -    mr = host_memory_backend_get_memory(dimm->hostmem, errp);
>>> +    mr = ddc->get_memory_region(dimm);
>>>       value = memory_region_size(mr);
>>>       visit_type_int(v, &value, name, errp);
>>> diff --git a/hw/mem/pc-dimm.c b/hw/mem/pc-dimm.c
>>> index 38323e9..e6b6a9f 100644
>>> --- a/hw/mem/pc-dimm.c
>>> +++ b/hw/mem/pc-dimm.c
>>> @@ -22,7 +22,17 @@
>>>   static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
>>>   {
>>> -    return host_memory_backend_get_memory(dimm->hostmem, 
>>> &error_abort);
>>> +    Error *local_err = NULL;
>>> +    MemoryRegion *mr;
>>> +
>>> +    mr = host_memory_backend_get_memory(dimm->hostmem, &local_err);
>>> +
>>> +    /*
>>> +     * plug a pc-dimm device whose backend memory was not properly
>>> +     * initialized?
>>> +     */
>>> +    assert(!local_err && mr);
>>> +    return mr;
>>>   }
>>
>> this should be squashed into previous patch, I think
>
> You mean merge this patch with 19/35 (dimm: abstract dimm device from
> pc-dimm)?
> The 19/35 mostly ‘moves’ the things, this one changes the core logic,
> it is not a big deal. :D

stupid me, you are right)

>
>>
>>>   static void pc_dimm_class_init(ObjectClass *oc, void *data)
>>
>> I've discovered suddenly, that
>>
>> MemoryRegion *
>> host_memory_backend_get_memory(HostMemoryBackend *backend, Error **errp)
>> {
>>      return memory_region_size(&backend->mr) ? &backend->mr : NULL;
>> }
>>
>> - it doesn't use errp at all. In my opinion, this should be fixed
>> globally, by deleting the useless parameter in a separate patch. Or just
>> squash your function into the previous patch.
>>
>
> Yup, this is a global interface, so I prefer to make a separate patch
> to do the cleanup after this patchset.
>
>

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 14:51     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 14:51 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 12:13, Xiao Guangrong wrote:
> Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
> of nvdimm can come from either a raw pmem device, e.g. /dev/pmem, or a file
> located on a DAX-enabled filesystem
>
> So this patch lets it work on any kind of path

No, this patch doesn't change any logic; it only fixes a variable name and
some error messages.

>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   exec.c | 24 ++++++++++++------------
>   1 file changed, 12 insertions(+), 12 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 9de38be..9075f4d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
>       char *c;
>       void *area;
>       int fd;
> -    uint64_t hpagesize;
> +    uint64_t pagesize;
>       Error *local_err = NULL;
>   
> -    hpagesize = qemu_file_get_page_size(path, &local_err);
> +    pagesize = qemu_file_get_page_size(path, &local_err);
>       if (local_err) {
>           error_propagate(errp, local_err);
>           goto error;
>       }
>   
> -    if (hpagesize == getpagesize()) {
> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> +    if (pagesize == getpagesize()) {
> +        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");

Why do you remove the path from the error message? It is good additional
information. What if we have several memory file backends?

>       }
>   
> -    block->mr->align = hpagesize;
> +    block->mr->align = pagesize;
>   
> -    if (memory < hpagesize) {
> +    if (memory < pagesize) {
>           error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
> -                   "or larger than huge page size 0x%" PRIx64,
> -                   memory, hpagesize);
> +                   "or larger than page size 0x%" PRIx64,
> +                   memory, pagesize);
>           goto error;
>       }
>   
> @@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,
>       fd = mkstemp(filename);
>       if (fd < 0) {
>           error_setg_errno(errp, errno,
> -                         "unable to create backing store for hugepages");
> +                         "unable to create backing store for path %s", path);
>           g_free(filename);
>           goto error;
>       }
>       unlink(filename);
>       g_free(filename);
>   
> -    memory = ROUND_UP(memory, hpagesize);
> +    memory = ROUND_UP(memory, pagesize);
>   
>       /*
>        * ftruncate is not supported by hugetlbfs in older
> @@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
>           perror("ftruncate");
>       }
>   
> -    area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
> +    area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>       if (area == MAP_FAILED) {
>           error_setg_errno(errp, errno,
> -                         "unable to map backing store for hugepages");
> +                         "unable to map backing store for path %s", path);
>           close(fd);
>           goto error;
>       }
>

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-02 14:26         ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
@ 2015-11-02 15:06           ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02 15:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake



On 11/02/2015 10:26 PM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.11.2015 16:08, Xiao Guangrong wrote:
>>
>>
>> On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> On 02.11.2015 12:13, Xiao Guangrong wrote:
>>>> Currently, the memory region of backend memory is directly mapped to
>>>> guest's address space, however, it is not true for nvdimm device
>>>>
>>>> This patch lets dimm device realize this fact and use
>>>> DIMMDeviceClass->get_memory_region method to get the mapped memory
>>>> region
>>>>
>>>> Current code did not check the return value of get_memory_region as it
>>>> assumed the backend memory of pc-dimm is always properly initialized,
>>>> we make get_memory_region internally catch the case if something is
>>>> wrong
>
> But here you are not calling pc-dimm's get_memory_region; you call the common ddc->get_memory_region,
> which may be nvdimm's or possibly some other future dimm's. So why not check it here? Then
> pc_dimm_get_memory_region may be left untouched (error_abort is ok, because errp is unused).

Hmm, because 'here' is not the only place calling ->get_memory_region, this method has
multiple callers:

$ git grep "\->get_memory_region"
hw/i386/pc.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
hw/i386/pc.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
hw/mem/dimm.c:    mr = ddc->get_memory_region(dimm);
hw/mem/nvdimm.c:    ddc->get_memory_region = nvdimm_get_memory_region;
hw/mem/pc-dimm.c:    ddc->get_memory_region = pc_dimm_get_memory_region;
hw/ppc/spapr.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);

Memory region validation for NVDIMM is likewise done inside the nvdimm device.
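
The argument for validating inside each implementation rather than in every
caller can be sketched with a small method table. This is a simplified
illustration, not QEMU code: Region, Dimm and DimmClass are hypothetical
stand-ins for MemoryRegion, DIMMDevice and DIMMDeviceClass, and
backend_get_memory() only imitates the behaviour of
host_memory_backend_get_memory() quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MemoryRegion, DIMMDevice and DIMMDeviceClass. */
typedef struct Region {
    size_t size;
} Region;

typedef struct Dimm Dimm;

typedef struct DimmClass {
    Region *(*get_memory_region)(Dimm *dimm);
} DimmClass;

struct Dimm {
    const DimmClass *klass;
    Region backend;
};

/* Imitates the quoted backend getter: NULL when the region is empty. */
static Region *backend_get_memory(Dimm *dimm)
{
    return dimm->backend.size ? &dimm->backend : NULL;
}

/* Validation lives in the implementation, so callers need not repeat it. */
static Region *plain_dimm_get_memory_region(Dimm *dimm)
{
    Region *mr = backend_get_memory(dimm);

    assert(mr);
    return mr;
}

static const DimmClass plain_dimm_class = {
    .get_memory_region = plain_dimm_get_memory_region,
};

/* One of several possible callers; it uses the result without a NULL check. */
static size_t dimm_size(Dimm *dimm)
{
    return dimm->klass->get_memory_region(dimm)->size;
}
```

With the check in the callee, adding another caller does not add another
place where the NULL case can be forgotten; the cost is that every new
implementation has to carry its own validation.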


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 15:12     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 15:12 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 12:13, Xiao Guangrong wrote:
> Currently, file_ram_alloc() only works on a directory - it creates a file
> under @path and does mmap on it
>
> This patch tries to allow it to work on file directly, if @path is a

It doesn't try to allow, it allows, as I understand)

> directory it works as before, otherwise it treats @path as the target
> file and directly allocates memory from it
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   exec.c | 80 ++++++++++++++++++++++++++++++++++++++++++------------------------
>   1 file changed, 51 insertions(+), 29 deletions(-)
>
> diff --git a/exec.c b/exec.c
> index 9075f4d..db0fdaf 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>   }
>   
>   #ifdef __linux__
> +static bool path_is_dir(const char *path)
> +{
> +    struct stat fs;
> +
> +    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
> +}
> +
> +static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
> +{
> +    char *filename;
> +    char *sanitized_name;
> +    char *c;
> +    int fd;
> +
> +    if (!path_is_dir(path)) {
> +        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
> +
> +        flags |= O_EXCL;
> +        return open(path, flags);
> +    }
> +
> +    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
> +    sanitized_name = g_strdup(memory_region_name(block->mr));
> +    for (c = sanitized_name; *c != '\0'; c++) {
> +        if (*c == '/') {
> +            *c = '_';
> +        }
> +    }
> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
> +                               sanitized_name);
> +    g_free(sanitized_name);

An empty line would be very nice here, and there was one in the master branch.

> +    fd = mkstemp(filename);
> +    if (fd >= 0) {
> +        unlink(filename);
> +        /*
> +         * ftruncate is not supported by hugetlbfs in older
> +         * hosts, so don't bother bailing out on errors.
> +         * If anything goes wrong with it under other filesystems,
> +         * mmap will fail.
> +         */
> +        if (ftruncate(fd, size)) {
> +            perror("ftruncate");
> +        }
> +    }
> +    g_free(filename);
> +
> +    return fd;
> +}
> +
>   static void *file_ram_alloc(RAMBlock *block,
>                               ram_addr_t memory,
>                               const char *path,
>                               Error **errp)
>   {
> -    char *filename;
> -    char *sanitized_name;
> -    char *c;
>       void *area;
>       int fd;
>       uint64_t pagesize;
> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>           goto error;
>       }
>   
> -    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
> -    sanitized_name = g_strdup(memory_region_name(block->mr));
> -    for (c = sanitized_name; *c != '\0'; c++) {
> -        if (*c == '/')
> -            *c = '_';
> -    }
> -
> -    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
> -                               sanitized_name);
> -    g_free(sanitized_name);
> +    memory = ROUND_UP(memory, pagesize);
>   
> -    fd = mkstemp(filename);
> +    fd = open_ram_file_path(block, path, memory);
>       if (fd < 0) {
>           error_setg_errno(errp, errno,
>                            "unable to create backing store for path %s", path);
> -        g_free(filename);
>           goto error;
>       }
> -    unlink(filename);
> -    g_free(filename);
> -
> -    memory = ROUND_UP(memory, pagesize);
> -
> -    /*
> -     * ftruncate is not supported by hugetlbfs in older
> -     * hosts, so don't bother bailing out on errors.
> -     * If anything goes wrong with it under other filesystems,
> -     * mmap will fail.
> -     */
> -    if (ftruncate(fd, memory)) {
> -        perror("ftruncate");
> -    }
>   
>       area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>       if (area == MAP_FAILED) {


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
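
The directory-versus-file decision made by open_ram_file_path() can be
sketched as a standalone POSIX program fragment. It is an illustration only:
open_backing_fd(), its bool/off_t parameters and the "backing.XXXXXX"
template are hypothetical, and the sketch omits the O_EXCL flag and the
memory-region name that the patch uses.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static bool path_is_dir(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical helper: return an fd for the backing store, or -1 with
 * errno set. A non-directory path is opened directly; for a directory
 * an unlinked temporary file of 'size' bytes is created inside it.
 */
static int open_backing_fd(const char *path, bool shared, off_t size)
{
    char tmpl[4096];
    int n, fd;

    if (!path_is_dir(path)) {
        return open(path, shared ? O_RDWR : O_RDONLY);
    }

    n = snprintf(tmpl, sizeof(tmpl), "%s/backing.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof(tmpl)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = mkstemp(tmpl);
    if (fd >= 0) {
        /* The name is only needed to create the file. */
        unlink(tmpl);
        /* As in the patch, a failed ftruncate is reported but not fatal. */
        if (ftruncate(fd, size) != 0) {
            perror("ftruncate");
        }
    }
    return fd;
}
```

Calling open_backing_fd("/tmp", true, 8192) takes the directory branch,
while passing a regular file or a device node takes the direct open() branch.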

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
  2015-11-02 14:51     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
@ 2015-11-02 15:22       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02 15:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake



On 11/02/2015 10:51 PM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.11.2015 12:13, Xiao Guangrong wrote:
>> Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
>> of nvdimm can come from either a raw pmem device, e.g. /dev/pmem, or a file
>> located on a DAX-enabled filesystem
>>
>> So this patch lets it work on any kind of path
>
> No, this patch doesn't change any logic; it only fixes a variable name and some error messages.

Yes, it is.

By 'let it work' I meant exactly "fix variable name and some error messages"... okay,
if it confused you, how about changing it to:

"This patch fixes variable name and some error messages to let it be aware of normal
path"

>
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   exec.c | 24 ++++++++++++------------
>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 9de38be..9075f4d 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
>>       char *c;
>>       void *area;
>>       int fd;
>> -    uint64_t hpagesize;
>> +    uint64_t pagesize;
>>       Error *local_err = NULL;
>> -    hpagesize = qemu_file_get_page_size(path, &local_err);
>> +    pagesize = qemu_file_get_page_size(path, &local_err);
>>       if (local_err) {
>>           error_propagate(errp, local_err);
>>           goto error;
>>       }
>> -    if (hpagesize == getpagesize()) {
>> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>> +    if (pagesize == getpagesize()) {
>> +        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
>
> Why do you remove path from error message? It is good additional information.. What if we have
> several memory file backends?

Good catch, I will change it to:
fprintf(stderr, "Memory is not allocated from HugeTlbfs on path %s.\n", path);

BTW, if there are no other big changes in the future, I will post a new version of just this patch.

>
>>       }
>> -    block->mr->align = hpagesize;
>> +    block->mr->align = pagesize;
>> -    if (memory < hpagesize) {
>> +    if (memory < pagesize) {
>>           error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
>> -                   "or larger than huge page size 0x%" PRIx64,
>> -                   memory, hpagesize);
>> +                   "or larger than page size 0x%" PRIx64,
>> +                   memory, pagesize);
>>           goto error;
>>       }
>> @@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,
>>       fd = mkstemp(filename);
>>       if (fd < 0) {
>>           error_setg_errno(errp, errno,
>> -                         "unable to create backing store for hugepages");
>> +                         "unable to create backing store for path %s", path);
>>           g_free(filename);
>>           goto error;
>>       }
>>       unlink(filename);
>>       g_free(filename);
>> -    memory = ROUND_UP(memory, hpagesize);
>> +    memory = ROUND_UP(memory, pagesize);
>>       /*
>>        * ftruncate is not supported by hugetlbfs in older
>> @@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
>>           perror("ftruncate");
>>       }
>> -    area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
>> +    area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>>       if (area == MAP_FAILED) {
>>           error_setg_errno(errp, errno,
>> -                         "unable to map backing store for hugepages");
>> +                         "unable to map backing store for path %s", path);
>>           close(fd);
>>           goto error;
>>       }
>>
>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-02 15:12     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
@ 2015-11-02 15:25       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02 15:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake



On 11/02/2015 11:12 PM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.11.2015 12:13, Xiao Guangrong wrote:
>> Currently, file_ram_alloc() only works on directory - it creates a file
>> under @path and do mmap on it
>>
>> This patch tries to allow it to work on file directly, if @path is a
>
> It isn't try to allow, it allows, as I understand)...

Err... Sorry for my English, but what is the difference between:
"This patch tries to allow it to work on file directly" and
"This patch allows it to work on file directly"

:(

>
>> directory it works as before, otherwise it treats @path as the target
>> file then directly allocate memory from it
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   exec.c | 80 ++++++++++++++++++++++++++++++++++++++++++------------------------
>>   1 file changed, 51 insertions(+), 29 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 9075f4d..db0fdaf 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>>   }
>>   #ifdef __linux__
>> +static bool path_is_dir(const char *path)
>> +{
>> +    struct stat fs;
>> +
>> +    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
>> +}
>> +
>> +static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
>> +{
>> +    char *filename;
>> +    char *sanitized_name;
>> +    char *c;
>> +    int fd;
>> +
>> +    if (!path_is_dir(path)) {
>> +        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
>> +
>> +        flags |= O_EXCL;
>> +        return open(path, flags);
>> +    }
>> +
>> +    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>> +    sanitized_name = g_strdup(memory_region_name(block->mr));
>> +    for (c = sanitized_name; *c != '\0'; c++) {
>> +        if (*c == '/') {
>> +            *c = '_';
>> +        }
>> +    }
>> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>> +                               sanitized_name);
>> +    g_free(sanitized_name);
>
> one empty line will be very nice here, and it was in master branch
>
>> +    fd = mkstemp(filename);
>> +    if (fd >= 0) {
>> +        unlink(filename);
>> +        /*
>> +         * ftruncate is not supported by hugetlbfs in older
>> +         * hosts, so don't bother bailing out on errors.
>> +         * If anything goes wrong with it under other filesystems,
>> +         * mmap will fail.
>> +         */
>> +        if (ftruncate(fd, size)) {
>> +            perror("ftruncate");
>> +        }
>> +    }
>> +    g_free(filename);
>> +
>> +    return fd;
>> +}
>> +
>>   static void *file_ram_alloc(RAMBlock *block,
>>                               ram_addr_t memory,
>>                               const char *path,
>>                               Error **errp)
>>   {
>> -    char *filename;
>> -    char *sanitized_name;
>> -    char *c;
>>       void *area;
>>       int fd;
>>       uint64_t pagesize;
>> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>>           goto error;
>>       }
>> -    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>> -    sanitized_name = g_strdup(memory_region_name(block->mr));
>> -    for (c = sanitized_name; *c != '\0'; c++) {
>> -        if (*c == '/')
>> -            *c = '_';
>> -    }
>> -
>> -    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>> -                               sanitized_name);
>> -    g_free(sanitized_name);
>> +    memory = ROUND_UP(memory, pagesize);
>> -    fd = mkstemp(filename);
>> +    fd = open_ram_file_path(block, path, memory);
>>       if (fd < 0) {
>>           error_setg_errno(errp, errno,
>>                            "unable to create backing store for path %s", path);
>> -        g_free(filename);
>>           goto error;
>>       }
>> -    unlink(filename);
>> -    g_free(filename);
>> -
>> -    memory = ROUND_UP(memory, pagesize);
>> -
>> -    /*
>> -     * ftruncate is not supported by hugetlbfs in older
>> -     * hosts, so don't bother bailing out on errors.
>> -     * If anything goes wrong with it under other filesystems,
>> -     * mmap will fail.
>> -     */
>> -    if (ftruncate(fd, memory)) {
>> -        perror("ftruncate");
>> -    }
>>       area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>>       if (area == MAP_FAILED) {
>
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Thanks for your review.
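
The directory-versus-file behaviour discussed in this patch can be sketched in isolation as follows. This is a minimal POSIX-only illustration, not the QEMU code: `open_backing_fd` and the `backing_mem.XXXXXX` template are hypothetical names, the `shared` flag stands in for `block->flags & RAM_SHARED`, and, unlike the patch, the sketch treats an `ftruncate` failure as fatal and does not pass `O_EXCL`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* True only if path exists and is a directory. */
static bool path_is_dir(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical helper (not a QEMU function): return a descriptor for
 * the backing store of a RAM block, or -1 with errno set.
 *
 * - If path is not a directory, open it directly.
 * - If path is a directory, create an unlinked temporary file in it
 *   and grow that file to size bytes.
 */
static int open_backing_fd(const char *path, off_t size, bool shared)
{
    char name[4096];
    int fd, n, saved_errno;

    assert(path != NULL);

    if (!path_is_dir(path)) {
        /* A private mapping only needs read access to the file. */
        return open(path, shared ? O_RDWR : O_RDONLY);
    }

    n = snprintf(name, sizeof(name), "%s/backing_mem.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof(name)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = mkstemp(name);
    if (fd < 0) {
        return -1;
    }
    /* The file lives on only through the open descriptor. */
    unlink(name);

    /* Assumption: fail hard here; the patch only reports the error. */
    if (ftruncate(fd, size) != 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

A caller would pass either a directory such as a hugetlbfs mount point or a file such as a pmem device node, then hand the returned descriptor to `mmap`.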


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 15:26     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 15:26 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 12:13, Xiao Guangrong wrote:
> It is used to get the size of the specified file, also qemu_fd_getlength()
> is introduced to unify the code with raw_getlength() in block/raw-posix.c
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   block/raw-posix.c    |  7 +------
>   include/qemu/osdep.h |  2 ++
>   util/osdep.c         | 31 +++++++++++++++++++++++++++++++
>   3 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 918c756..734e6dd 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1592,18 +1592,13 @@ static int64_t raw_getlength(BlockDriverState *bs)
>   {
>       BDRVRawState *s = bs->opaque;
>       int ret;
> -    int64_t size;
>   
>       ret = fd_open(bs);
>       if (ret < 0) {
>           return ret;
>       }
>   
> -    size = lseek(s->fd, 0, SEEK_END);
> -    if (size < 0) {
> -        return -errno;
> -    }
> -    return size;
> +    return qemu_fd_getlength(s->fd);
>   }
>   #endif
>   
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index dbc17dc..ca4c3fa 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -303,4 +303,6 @@ int qemu_read_password(char *buf, int buf_size);
>   pid_t qemu_fork(Error **errp);
>   
>   size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
> +int64_t qemu_fd_getlength(int fd);
> +size_t qemu_file_getlength(const char *file, Error **errp);
>   #endif
> diff --git a/util/osdep.c b/util/osdep.c
> index 0092bb6..5a61e19 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -428,3 +428,34 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
>       return readv_writev(fd, iov, iov_cnt, true);
>   }
>   #endif
> +
> +int64_t qemu_fd_getlength(int fd)
> +{
> +    int64_t size;
> +
> +    size = lseek(fd, 0, SEEK_END);
> +    if (size < 0) {
> +        return -errno;
> +    }
> +    return size;
> +}
> +
> +size_t qemu_file_getlength(const char *file, Error **errp)
> +{
> +    int64_t size;
> +    int fd = qemu_open(file, O_RDONLY);
> +
> +    if (fd < 0) {
> +        error_setg_file_open(errp, errno, file);
> +        return 0;
> +    }
> +
> +    size = qemu_fd_getlength(fd);
> +    if (size < 0) {
> +        error_setg_errno(errp, -size, "can't get size of file %s", file);
> +        size = 0;
> +    }
> +
> +    qemu_close(fd);
> +    return size;
> +}

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.
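
The `lseek`-based length lookup introduced by this patch can be tried outside QEMU with a sketch like the one below. It is an illustration under stated assumptions: `fd_getlength` and `file_getlength` are hypothetical stand-ins for `qemu_fd_getlength()` and `qemu_file_getlength()`, and the path-based variant returns a negative errno value instead of reporting through an `Error **` and returning 0.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Length of the file behind fd, or a negative errno value.
 * Note that this moves the file offset to the end of the file.
 */
static int64_t fd_getlength(int fd)
{
    off_t size = lseek(fd, 0, SEEK_END);

    if (size < 0) {
        return -errno;
    }
    return size;
}

/*
 * Length of the named file, or a negative errno value.
 * Assumption: errors are returned rather than stored in an Error
 * object as the patch does.
 */
static int64_t file_getlength(const char *file)
{
    int64_t size;
    int fd;

    assert(file != NULL);

    fd = open(file, O_RDONLY);
    if (fd < 0) {
        return -errno;
    }

    /* The result is captured before close() can touch errno. */
    size = fd_getlength(fd);
    close(fd);
    return size;
}
```

Seeking to the end also works for block devices on Linux, where `stat()` reports a size of zero, which is one reason to prefer it for raw device paths.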


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
  2015-11-02 15:22       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 15:52         ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 15:52 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 18:22, Xiao Guangrong wrote:
>
>
> On 11/02/2015 10:51 PM, Vladimir Sementsov-Ogievskiy wrote:
>> On 02.11.2015 12:13, Xiao Guangrong wrote:
>>> Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
>>> of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
>>> locates at DAX enabled filesystem
>>>
>>> So this patch let it work on any kind of path
>>
>> No, this patch doesn't change any logic, but only fix variable name and some error messages.
>
> Yes, it is.
>
> 'let it work' in my thought exactly was "fix variable name and some error messages"... okay,
> if it confused you, how about change it to:
>
> "This patch fixes variable name and some error messages to let it be aware of normal
> path"

My English is not very good, and I don't know figures of speech. To me,
"patch let it work" means that without this patch it will not work))
Your new variant is OK for me, or better (IMO): "This patch fixes
variable name and some error messages to be suitable for any kind of path"

>
>>
>>>
>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>> ---
>>>   exec.c | 24 ++++++++++++------------
>>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index 9de38be..9075f4d 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
>>>       char *c;
>>>       void *area;
>>>       int fd;
>>> -    uint64_t hpagesize;
>>> +    uint64_t pagesize;
>>>       Error *local_err = NULL;
>>> -    hpagesize = qemu_file_get_page_size(path, &local_err);
>>> +    pagesize = qemu_file_get_page_size(path, &local_err);
>>>       if (local_err) {
>>>           error_propagate(errp, local_err);
>>>           goto error;
>>>       }
>>> -    if (hpagesize == getpagesize()) {
>>> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>>> +    if (pagesize == getpagesize()) {
>>> +        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
>>
>> Why do you remove path from error message? It is good additional information.. What if we have
>> several memory file backends?
>
> Good catch, will change it to:
> fprintf(stderr, "Memory is not allocated from HugeTlbfs on path %s.\n", path);
>
> BTW, if no other big change in the further, i will post the new version just for of this patch,
>
>>
>>>       }
>>> -    block->mr->align = hpagesize;
>>> +    block->mr->align = pagesize;
>>> -    if (memory < hpagesize) {
>>> +    if (memory < pagesize) {
>>>           error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be equal to "
>>> -                   "or larger than huge page size 0x%" PRIx64,
>>> -                   memory, hpagesize);
>>> +                   "or larger than page size 0x%" PRIx64,
>>> +                   memory, pagesize);
>>>           goto error;
>>>       }
>>> @@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,
>>>       fd = mkstemp(filename);
>>>       if (fd < 0) {
>>>           error_setg_errno(errp, errno,
>>> -                         "unable to create backing store for hugepages");
>>> +                         "unable to create backing store for path %s", path);
>>>           g_free(filename);
>>>           goto error;
>>>       }
>>>       unlink(filename);
>>>       g_free(filename);
>>> -    memory = ROUND_UP(memory, hpagesize);
>>> +    memory = ROUND_UP(memory, pagesize);
>>>       /*
>>>        * ftruncate is not supported by hugetlbfs in older
>>> @@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
>>>           perror("ftruncate");
>>>       }
>>> -    area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & RAM_SHARED);
>>> +    area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>>>       if (area == MAP_FAILED) {
>>>           error_setg_errno(errp, errno,
>>> -                         "unable to map backing store for hugepages");
>>> +                         "unable to map backing store for path %s", path);
>>>           close(fd);
>>>           goto error;
>>>       }
>>>
>>

With these two fixes (any commit message variant):

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
@ 2015-11-02 15:52         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 15:52 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
	dan.j.williams, rth

On 02.11.2015 18:22, Xiao Guangrong wrote:
>
>
> On 11/02/2015 10:51 PM, Vladimir Sementsov-Ogievskiy wrote:
>> On 02.11.2015 12:13, Xiao Guangrong wrote:
>>> Currently file_ram_alloc() is designed for hugetlbfs, however, the 
>>> memory
>>> of nvdimm can come from either raw pmem device eg, /dev/pmem, or the 
>>> file
>>> locates at DAX enabled filesystem
>>>
>>> So this patch let it work on any kind of path
>>
>> No, this patch doesn't change any logic, but only fix variable name 
>> and some error messages.
>
> Yes, it is.
>
> 'let it work' in my thought exactly was "fix variable name and some 
> error messages"... okay,
> if it confused you, how about change it to:
>
> "This patch fixes variable name and some error messages to let it be 
> aware of normal
> path"

My english is not very good, I don't know figures of speech. For me 
"patch let it work" means that without this patch it will not work))
Your new variant is ok for me, or better (imo) "This patch fixes 
variable name and some error messages to be suitable for any kind of path"

>
>>
>>>
>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>> ---
>>>   exec.c | 24 ++++++++++++------------
>>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index 9de38be..9075f4d 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
>>>       char *c;
>>>       void *area;
>>>       int fd;
>>> -    uint64_t hpagesize;
>>> +    uint64_t pagesize;
>>>       Error *local_err = NULL;
>>> -    hpagesize = qemu_file_get_page_size(path, &local_err);
>>> +    pagesize = qemu_file_get_page_size(path, &local_err);
>>>       if (local_err) {
>>>           error_propagate(errp, local_err);
>>>           goto error;
>>>       }
>>> -    if (hpagesize == getpagesize()) {
>>> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>>> +    if (pagesize == getpagesize()) {
>>> +        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
>>
>> Why do you remove path from error message? It is good additional 
>> information.. What if we have
>> several memory file backends?
>
> Good catch, will change it to:
> fprintf(stderr, "Memory is not allocated from HugeTlbfs on path 
> %s.\n", path);
>
> BTW, if no other big change in the further, i will post the new 
> version just for of this patch,
>
>>
>>>       }
>>> -    block->mr->align = hpagesize;
>>> +    block->mr->align = pagesize;
>>> -    if (memory < hpagesize) {
>>> +    if (memory < pagesize) {
>>>           error_setg(errp, "memory size 0x" RAM_ADDR_FMT " must be 
>>> equal to "
>>> -                   "or larger than huge page size 0x%" PRIx64,
>>> -                   memory, hpagesize);
>>> +                   "or larger than page size 0x%" PRIx64,
>>> +                   memory, pagesize);
>>>           goto error;
>>>       }
>>> @@ -1226,14 +1226,14 @@ static void *file_ram_alloc(RAMBlock *block,
>>>       fd = mkstemp(filename);
>>>       if (fd < 0) {
>>>           error_setg_errno(errp, errno,
>>> -                         "unable to create backing store for 
>>> hugepages");
>>> +                         "unable to create backing store for path 
>>> %s", path);
>>>           g_free(filename);
>>>           goto error;
>>>       }
>>>       unlink(filename);
>>>       g_free(filename);
>>> -    memory = ROUND_UP(memory, hpagesize);
>>> +    memory = ROUND_UP(memory, pagesize);
>>>       /*
>>>        * ftruncate is not supported by hugetlbfs in older
>>> @@ -1245,10 +1245,10 @@ static void *file_ram_alloc(RAMBlock *block,
>>>           perror("ftruncate");
>>>       }
>>> -    area = qemu_ram_mmap(fd, memory, hpagesize, block->flags & 
>>> RAM_SHARED);
>>> +    area = qemu_ram_mmap(fd, memory, pagesize, block->flags & 
>>> RAM_SHARED);
>>>       if (area == MAP_FAILED) {
>>>           error_setg_errno(errp, errno,
>>> -                         "unable to map backing store for hugepages");
>>> +                         "unable to map backing store for path %s", path);
>>>           close(fd);
>>>           goto error;
>>>       }
>>>
>>

With these two fixes (any commit message variant):

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-02 15:25       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 15:58         ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 15:58 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
	dan.j.williams, rth

On 02.11.2015 18:25, Xiao Guangrong wrote:
>
>
> On 11/02/2015 11:12 PM, Vladimir Sementsov-Ogievskiy wrote:
>> On 02.11.2015 12:13, Xiao Guangrong wrote:
>>> Currently, file_ram_alloc() only works on a directory - it creates a file
>>> under @path and does mmap on it
>>>
>>> This patch tries to allow it to work on file directly, if @path is a
>>
>> It doesn't "try to allow", it allows, as I understand it...
>
> Err... Sorry for my English, but what is the difference between:
> "This patch tries to allow it to work on file directly" and
> "This patch allows it to work on file directly"
>
> :(

I'm not sure everyone is interested in this nit-picking discussion...

A allows B: if A then B
A tries to allow B: if A then _maybe_ B

Either way, it doesn't matter.

>
>>
>>> directory it works as before, otherwise it treats @path as the target
>>> file and directly allocates memory from it
>>>
>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>> ---
>>>   exec.c | 80 ++++++++++++++++++++++++++++++++++++++++++------------------------
>>>   1 file changed, 51 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index 9075f4d..db0fdaf 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>>>   }
>>>   #ifdef __linux__
>>> +static bool path_is_dir(const char *path)
>>> +{
>>> +    struct stat fs;
>>> +
>>> +    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
>>> +}
>>> +
>>> +static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
>>> +{
>>> +    char *filename;
>>> +    char *sanitized_name;
>>> +    char *c;
>>> +    int fd;
>>> +
>>> +    if (!path_is_dir(path)) {
>>> +        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
>>> +
>>> +        flags |= O_EXCL;
>>> +        return open(path, flags);
>>> +    }
>>> +
>>> +    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>>> +    sanitized_name = g_strdup(memory_region_name(block->mr));
>>> +    for (c = sanitized_name; *c != '\0'; c++) {
>>> +        if (*c == '/') {
>>> +            *c = '_';
>>> +        }
>>> +    }
>>> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>>> +                               sanitized_name);
>>> +    g_free(sanitized_name);
>>
>> An empty line would be nice here; there was one in the master branch.
>>
>>> +    fd = mkstemp(filename);
>>> +    if (fd >= 0) {
>>> +        unlink(filename);
>>> +        /*
>>> +         * ftruncate is not supported by hugetlbfs in older
>>> +         * hosts, so don't bother bailing out on errors.
>>> +         * If anything goes wrong with it under other filesystems,
>>> +         * mmap will fail.
>>> +         */
>>> +        if (ftruncate(fd, size)) {
>>> +            perror("ftruncate");
>>> +        }
>>> +    }
>>> +    g_free(filename);
>>> +
>>> +    return fd;
>>> +}
>>> +
>>>   static void *file_ram_alloc(RAMBlock *block,
>>>                               ram_addr_t memory,
>>>                               const char *path,
>>>                               Error **errp)
>>>   {
>>> -    char *filename;
>>> -    char *sanitized_name;
>>> -    char *c;
>>>       void *area;
>>>       int fd;
>>>       uint64_t pagesize;
>>> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>>>           goto error;
>>>       }
>>> -    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>>> -    sanitized_name = g_strdup(memory_region_name(block->mr));
>>> -    for (c = sanitized_name; *c != '\0'; c++) {
>>> -        if (*c == '/')
>>> -            *c = '_';
>>> -    }
>>> -
>>> -    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>>> -                               sanitized_name);
>>> -    g_free(sanitized_name);
>>> +    memory = ROUND_UP(memory, pagesize);
>>> -    fd = mkstemp(filename);
>>> +    fd = open_ram_file_path(block, path, memory);
>>>       if (fd < 0) {
>>>           error_setg_errno(errp, errno,
>>>                            "unable to create backing store for path %s", path);
>>> -        g_free(filename);
>>>           goto error;
>>>       }
>>> -    unlink(filename);
>>> -    g_free(filename);
>>> -
>>> -    memory = ROUND_UP(memory, pagesize);
>>> -
>>> -    /*
>>> -     * ftruncate is not supported by hugetlbfs in older
>>> -     * hosts, so don't bother bailing out on errors.
>>> -     * If anything goes wrong with it under other filesystems,
>>> -     * mmap will fail.
>>> -     */
>>> -    if (ftruncate(fd, memory)) {
>>> -        perror("ftruncate");
>>> -    }
>>>       area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>>>       if (area == MAP_FAILED) {
>>
>>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
> Thanks for your review.
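
The directory-versus-file behaviour described in the quoted commit message can be sketched as a standalone helper. This is only an illustration under stated assumptions, not the code from the patch: the names is_dir_sketch and open_backing_sketch and the "backing.XXXXXX" template are made up here. Unlike the patch, the sketch bails out when ftruncate() fails (the patch tolerates that for older hugetlbfs hosts), and it leaves out O_EXCL, whose behaviour without O_CREAT is, per open(2), only defined for block devices on Linux.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* True if path exists and is a directory. */
static bool is_dir_sketch(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical helper: return an fd usable as a memory backing store,
 * or -1 with errno set.
 *  - path is a directory: create an unlinked temporary file of `size` bytes
 *  - otherwise: open the existing file (or device) as is, without resizing
 */
static int open_backing_sketch(const char *path, off_t size, bool writable)
{
    char name[4096];
    int fd, n, saved_errno;

    if (!is_dir_sketch(path)) {
        return open(path, writable ? O_RDWR : O_RDONLY);
    }

    n = snprintf(name, sizeof(name), "%s/backing.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof(name)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = mkstemp(name);
    if (fd < 0) {
        return -1;
    }
    /* The file stays alive through fd only; it vanishes on close. */
    unlink(name);

    if (ftruncate(fd, size) < 0) {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}
```

The returned descriptor would then be handed to mmap(), with MAP_SHARED or MAP_PRIVATE chosen by the caller.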
>
>


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 16:11     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 16:11 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 12:13, Xiao Guangrong wrote:
> lseek can not work for all block devices as the man page says:
> | Some devices are incapable of seeking and POSIX does not specify
> | which devices must support lseek().
>
> This patch tries to add the support on Linux by using BLKGETSIZE64
> ioctl
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   util/osdep.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
>
> diff --git a/util/osdep.c b/util/osdep.c
> index 5a61e19..b20c793 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -45,6 +45,11 @@
>   extern int madvise(caddr_t, size_t, int);
>   #endif
>   
> +#ifdef CONFIG_LINUX
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
> +#endif
> +
>   #include "qemu-common.h"
>   #include "qemu/sockets.h"
>   #include "qemu/error-report.h"
> @@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
>   {
>       int64_t size;
>   
> +#ifdef CONFIG_LINUX
> +    struct stat stat_buf;
> +    if (fstat(fd, &stat_buf) < 0) {
> +        return -errno;
> +    }
> +
> +    if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
> +        /* The size of block device is larger than max int64_t? */
> +        if (size < 0) {
> +            return -EOVERFLOW;
> +        }
> +        return size;
> +    }
> +#endif
> +
>       size = lseek(fd, 0, SEEK_END);
>       if (size < 0) {
>           return -errno;

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Just a question: is there any use for stat.st_size? Is it always worse
than lseek? Does it work for block devices?

Also, "This patch tries to add...". Hmm. It sounds as if the patch is not
sure whether it will succeed. I'd prefer "This patch adds", but this is
not important.

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-02 15:06           ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 16:16             ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 16:16 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 18:06, Xiao Guangrong wrote:
>
>
> On 11/02/2015 10:26 PM, Vladimir Sementsov-Ogievskiy wrote:
>> On 02.11.2015 16:08, Xiao Guangrong wrote:
>>>
>>>
>>> On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 02.11.2015 12:13, Xiao Guangrong wrote:
>>>>> Currently, the memory region of the backend memory is directly mapped
>>>>> into the guest's address space; however, this is not true for an
>>>>> nvdimm device.
>>>>>
>>>>> This patch lets the dimm device take this into account and use the
>>>>> DIMMDeviceClass->get_memory_region method to get the mapped memory
>>>>> region.
>>>>>
>>>>> The current code did not check the return value of get_memory_region,
>>>>> as it assumed the backend memory of pc-dimm is always properly
>>>>> initialized; we make get_memory_region internally catch the case
>>>>> where something is wrong.
>>
>> But here you are not calling pc-dimm's get_memory_region; you are calling
>> the common ddc->get_memory_region, which may belong to nvdimm or possibly
>> some other future dimm. So why not check it here? Then
>> pc_dimm_get_memory_region may be left untouched (error_abort is OK,
>> because errp is unused).
>
> Hmm, because 'here' is not the only place calling ->get_memory_region;
> this method has multiple callers:
>
> $ git grep "\->get_memory_region"
> hw/i386/pc.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
> hw/i386/pc.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
> hw/mem/dimm.c:    mr = ddc->get_memory_region(dimm);
> hw/mem/nvdimm.c:    ddc->get_memory_region = nvdimm_get_memory_region;
> hw/mem/pc-dimm.c:    ddc->get_memory_region = pc_dimm_get_memory_region;
> hw/ppc/spapr.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
>
> Memory region validation is also done for NVDIMM in the nvdimm device.
>
OK, then it should be documented by a comment in dimm.h, where
DIMMDeviceClass is defined, that this function should not fail.

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device
  2015-11-02 16:11     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
@ 2015-11-02 16:21       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-02 16:21 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake



On 11/03/2015 12:11 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.11.2015 12:13, Xiao Guangrong wrote:
>> lseek can not work for all block devices as the man page says:
>> | Some devices are incapable of seeking and POSIX does not specify
>> | which devices must support lseek().
>>
>> This patch tries to add the support on Linux by using BLKGETSIZE64
>> ioctl
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   util/osdep.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
>>
>> diff --git a/util/osdep.c b/util/osdep.c
>> index 5a61e19..b20c793 100644
>> --- a/util/osdep.c
>> +++ b/util/osdep.c
>> @@ -45,6 +45,11 @@
>>   extern int madvise(caddr_t, size_t, int);
>>   #endif
>> +#ifdef CONFIG_LINUX
>> +#include <sys/ioctl.h>
>> +#include <linux/fs.h>
>> +#endif
>> +
>>   #include "qemu-common.h"
>>   #include "qemu/sockets.h"
>>   #include "qemu/error-report.h"
>> @@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
>>   {
>>       int64_t size;
>> +#ifdef CONFIG_LINUX
>> +    struct stat stat_buf;
>> +    if (fstat(fd, &stat_buf) < 0) {
>> +        return -errno;
>> +    }
>> +
>> +    if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
>> +        /* The size of block device is larger than max int64_t? */
>> +        if (size < 0) {
>> +            return -EOVERFLOW;
>> +        }
>> +        return size;
>> +    }
>> +#endif
>> +
>>       size = lseek(fd, 0, SEEK_END);
>>       if (size < 0) {
>>           return -errno;
>
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>
> Just a question: is there any use for stat.st_size? Is it always worse than lseek?

The man page says:
| The st_size field gives the size of the file (if it is a regular file or
| a symbolic link) in bytes.  The size of a symbolic link is the length of
| the pathname it contains, without a terminating null byte.

So it cannot work on a symbolic link.

> Does it work for blk?
>

I quickly checked with a small program of my own and with the 'stat' command; the answer is NO. :)

> Also, "This patch tries to add...". Hmm. It sounds as if the patch is not sure whether it will
> succeed. I'd prefer "This patch adds", but this is not important.
>

Thanks for explaining. I did not know the difference; now I get it. :)
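
For reference, the size lookup discussed in this message can be sketched as a standalone Linux helper. This is a sketch under stated assumptions, not the patched qemu_fd_getlength(): the name fd_length_sketch is made up, and using st_size for regular files is an addition here to illustrate the st_size question. For block devices it relies on the observation reported above that st_size does not give the device size, so it asks the kernel through the BLKGETSIZE64 ioctl instead.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>   /* BLKGETSIZE64, Linux only */

/*
 * Hypothetical helper: return the length in bytes of the object behind fd,
 * or a negative errno value on failure.
 */
static int64_t fd_length_sketch(int fd)
{
    struct stat st;
    uint64_t blk_size;
    off_t end;

    if (fstat(fd, &st) < 0) {
        return -errno;
    }

    if (S_ISBLK(st.st_mode)) {
        /* Block device: query the size from the block layer. */
        if (ioctl(fd, BLKGETSIZE64, &blk_size) == 0) {
            if (blk_size > (uint64_t)INT64_MAX) {
                return -EOVERFLOW;
            }
            return (int64_t)blk_size;
        }
        /* If the ioctl fails, fall through and try lseek(). */
    }

    if (S_ISREG(st.st_mode)) {
        /* Regular file: st_size is the file length. */
        return (int64_t)st.st_size;
    }

    /* Anything else: seek to the end. Note that this moves the offset. */
    end = lseek(fd, 0, SEEK_END);
    if (end < 0) {
        return -errno;
    }
    return (int64_t)end;
}
```

The unsigned temporary avoids the "size < 0" test on a value the ioctl writes as an unsigned 64-bit quantity; the overflow check is otherwise the same idea as in the patch.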


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 13/35] hostmem-file: use whole file size if possible
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 17:09     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-02 17:09 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 12:13, Xiao Guangrong wrote:
> Use the whole file size if @size is not specified which is useful
> if we want to directly pass a file to guest
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   backends/hostmem-file.c | 22 ++++++++++++++++++----
>   1 file changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
> index 9097a57..ea355c1 100644
> --- a/backends/hostmem-file.c
> +++ b/backends/hostmem-file.c
> @@ -38,15 +38,29 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>   {
>       HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
>   
> -    if (!backend->size) {
> -        error_setg(errp, "can't create backend with size 0");
> -        return;
> -    }
>       if (!fb->mem_path) {
>           error_setg(errp, "mem-path property not set");
>           return;
>       }
>   
> +    if (!backend->size) {
> +        Error *local_err = NULL;
> +
> +        /*
> +         * use the whole file size if @size is not specified.
> +         */
> +        backend->size = qemu_file_getlength(fb->mem_path, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
> +    if (!backend->size) {
> +        error_setg(errp, "can't create backend on the file whose size is 0");
> +        return;
> +    }
> +
>       backend->force_prealloc = mem_prealloc;
>       memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>                                    object_get_canonical_path(OBJECT(backend)),

Why not just

+    if (!backend->size) {
+        /*
+         * use the whole file size if @size is not specified.
+         */
+        backend->size = qemu_file_getlength(fb->mem_path, errp);
+        if (*errp) {
+            return;
+        }
+    }


What is the purpose of propagating?
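
One possible answer, sketched with mock types rather than QEMU's real Error API: under the usual QEMU convention a caller may pass NULL as errp to say it does not care about the error, in which case testing *errp would dereference a null pointer. Collecting the error in a local variable and propagating it avoids that. Every name below (MockError, mock_error_set, mock_error_propagate, mock_file_getlength, mock_backend_alloc) is a simplified stand-in invented for this illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for QEMU's Error object. */
typedef struct MockError {
    char msg[64];
} MockError;

/* Store an error in *errp, unless the caller passed NULL to ignore errors. */
static void mock_error_set(MockError **errp, const char *msg)
{
    if (errp == NULL) {
        return;
    }
    *errp = calloc(1, sizeof(**errp));
    if (*errp == NULL) {
        abort();
    }
    snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
}

/* Hand a locally collected error to the caller, or drop it if unwanted. */
static void mock_error_propagate(MockError **dst, MockError *local)
{
    if (local == NULL) {
        return;
    }
    if (dst == NULL) {
        free(local);
        return;
    }
    *dst = local;
}

/* Mock callee that always fails, standing in for a length lookup. */
static long mock_file_getlength(const char *path, MockError **errp)
{
    (void)path;
    mock_error_set(errp, "cannot get file length");
    return 0;
}

/* Returns 0 on success, -1 on failure; safe even when errp is NULL. */
static int mock_backend_alloc(const char *path, MockError **errp)
{
    MockError *local_err = NULL;
    long size = mock_file_getlength(path, &local_err);

    if (local_err != NULL) {            /* never dereferences errp */
        mock_error_propagate(errp, local_err);
        return -1;
    }
    return size > 0 ? 0 : -1;
}
```

With this shape, mock_backend_alloc(path, NULL) still detects the failure, whereas an "if (*errp)" test would crash on the NULL argument.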

-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-02 21:12     ` Paolo Bonzini
  -1 siblings, 0 replies; 200+ messages in thread
From: Paolo Bonzini @ 2015-11-02 21:12 UTC (permalink / raw)
  To: Xiao Guangrong, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake



On 02/11/2015 10:13, Xiao Guangrong wrote:
> Currently, file_ram_alloc() only works on a directory - it creates a file
> under @path and does mmap on it.
> 
> This patch tries to allow it to work on a file directly: if @path is a
> directory it works as before, otherwise it treats @path as the target
> file and directly allocates memory from it.
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  exec.c | 80 ++++++++++++++++++++++++++++++++++++++++++------------------------
>  1 file changed, 51 insertions(+), 29 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 9075f4d..db0fdaf 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>  }
>  
>  #ifdef __linux__
> +static bool path_is_dir(const char *path)
> +{
> +    struct stat fs;
> +
> +    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
> +}
> +
> +static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
> +{
> +    char *filename;
> +    char *sanitized_name;
> +    char *c;
> +    int fd;
> +
> +    if (!path_is_dir(path)) {
> +        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
> +
> +        flags |= O_EXCL;
> +        return open(path, flags);
> +    }
> +
> +    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
> +    sanitized_name = g_strdup(memory_region_name(block->mr));
> +    for (c = sanitized_name; *c != '\0'; c++) {
> +        if (*c == '/') {
> +            *c = '_';
> +        }
> +    }
> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
> +                               sanitized_name);
> +    g_free(sanitized_name);
> +    fd = mkstemp(filename);
> +    if (fd >= 0) {
> +        unlink(filename);
> +        /*
> +         * ftruncate is not supported by hugetlbfs in older
> +         * hosts, so don't bother bailing out on errors.
> +         * If anything goes wrong with it under other filesystems,
> +         * mmap will fail.
> +         */
> +        if (ftruncate(fd, size)) {
> +            perror("ftruncate");
> +        }
> +    }
> +    g_free(filename);
> +
> +    return fd;
> +}
> +
>  static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              const char *path,
>                              Error **errp)
>  {
> -    char *filename;
> -    char *sanitized_name;
> -    char *c;
>      void *area;
>      int fd;
>      uint64_t pagesize;
> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>          goto error;
>      }
>  
> -    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
> -    sanitized_name = g_strdup(memory_region_name(block->mr));
> -    for (c = sanitized_name; *c != '\0'; c++) {
> -        if (*c == '/')
> -            *c = '_';
> -    }
> -
> -    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
> -                               sanitized_name);
> -    g_free(sanitized_name);
> +    memory = ROUND_UP(memory, pagesize);
>  
> -    fd = mkstemp(filename);
> +    fd = open_ram_file_path(block, path, memory);
>      if (fd < 0) {
>          error_setg_errno(errp, errno,
>                           "unable to create backing store for path %s", path);
> -        g_free(filename);
>          goto error;
>      }
> -    unlink(filename);
> -    g_free(filename);
> -
> -    memory = ROUND_UP(memory, pagesize);
> -
> -    /*
> -     * ftruncate is not supported by hugetlbfs in older
> -     * hosts, so don't bother bailing out on errors.
> -     * If anything goes wrong with it under other filesystems,
> -     * mmap will fail.
> -     */
> -    if (ftruncate(fd, memory)) {
> -        perror("ftruncate");
> -    }
>  
>      area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>      if (area == MAP_FAILED) {
> 

I was going to send a pull request tomorrow for a similar patch,
"backends/hostmem-file: Allow to specify full pathname for backing file".

The main difference seems to be your usage of O_EXCL.  Can you explain
why you added it?

Paolo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-02 21:12     ` [Qemu-devel] " Paolo Bonzini
@ 2015-11-03  3:56       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03  3:56 UTC (permalink / raw)
  To: Paolo Bonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake



On 11/03/2015 05:12 AM, Paolo Bonzini wrote:
>
>
> On 02/11/2015 10:13, Xiao Guangrong wrote:
>> Currently, file_ram_alloc() only works on directory - it creates a file
>> under @path and do mmap on it
>>
>> This patch tries to allow it to work on file directly, if @path is a
>> directory it works as before, otherwise it treats @path as the target
>> file then directly allocate memory from it
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   exec.c | 80 ++++++++++++++++++++++++++++++++++++++++++------------------------
>>   1 file changed, 51 insertions(+), 29 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 9075f4d..db0fdaf 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>>   }
>>
>>   #ifdef __linux__
>> +static bool path_is_dir(const char *path)
>> +{
>> +    struct stat fs;
>> +
>> +    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
>> +}
>> +
>> +static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
>> +{
>> +    char *filename;
>> +    char *sanitized_name;
>> +    char *c;
>> +    int fd;
>> +
>> +    if (!path_is_dir(path)) {
>> +        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
>> +
>> +        flags |= O_EXCL;
>> +        return open(path, flags);
>> +    }
>> +
>> +    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>> +    sanitized_name = g_strdup(memory_region_name(block->mr));
>> +    for (c = sanitized_name; *c != '\0'; c++) {
>> +        if (*c == '/') {
>> +            *c = '_';
>> +        }
>> +    }
>> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>> +                               sanitized_name);
>> +    g_free(sanitized_name);
>> +    fd = mkstemp(filename);
>> +    if (fd >= 0) {
>> +        unlink(filename);
>> +        /*
>> +         * ftruncate is not supported by hugetlbfs in older
>> +         * hosts, so don't bother bailing out on errors.
>> +         * If anything goes wrong with it under other filesystems,
>> +         * mmap will fail.
>> +         */
>> +        if (ftruncate(fd, size)) {
>> +            perror("ftruncate");
>> +        }
>> +    }
>> +    g_free(filename);
>> +
>> +    return fd;
>> +}
>> +
>>   static void *file_ram_alloc(RAMBlock *block,
>>                               ram_addr_t memory,
>>                               const char *path,
>>                               Error **errp)
>>   {
>> -    char *filename;
>> -    char *sanitized_name;
>> -    char *c;
>>       void *area;
>>       int fd;
>>       uint64_t pagesize;
>> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>>           goto error;
>>       }
>>
>> -    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>> -    sanitized_name = g_strdup(memory_region_name(block->mr));
>> -    for (c = sanitized_name; *c != '\0'; c++) {
>> -        if (*c == '/')
>> -            *c = '_';
>> -    }
>> -
>> -    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>> -                               sanitized_name);
>> -    g_free(sanitized_name);
>> +    memory = ROUND_UP(memory, pagesize);
>>
>> -    fd = mkstemp(filename);
>> +    fd = open_ram_file_path(block, path, memory);
>>       if (fd < 0) {
>>           error_setg_errno(errp, errno,
>>                            "unable to create backing store for path %s", path);
>> -        g_free(filename);
>>           goto error;
>>       }
>> -    unlink(filename);
>> -    g_free(filename);
>> -
>> -    memory = ROUND_UP(memory, pagesize);
>> -
>> -    /*
>> -     * ftruncate is not supported by hugetlbfs in older
>> -     * hosts, so don't bother bailing out on errors.
>> -     * If anything goes wrong with it under other filesystems,
>> -     * mmap will fail.
>> -     */
>> -    if (ftruncate(fd, memory)) {
>> -        perror("ftruncate");
>> -    }
>>
>>       area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>>       if (area == MAP_FAILED) {
>>
>
> I was going to send tomorrow a pull request for a similar patch,
> "backends/hostmem-file: Allow to specify full pathname for backing file".
>
> The main difference seems to be your usage of O_EXCL.  Can you explain
> why you added it?

It's used when a block device is passed as the NVDIMM backend memory. As open(2) says:
  O_EXCL can be used without O_CREAT if pathname refers to a block device.  If the block device
  is in use by the system (e.g., mounted), open() fails with the error EBUSY.
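
The flag selection discussed here can be sketched as below. This is only an
illustration under stated assumptions: demo_ram_open_flags() and
demo_open_backing() are hypothetical names, the "shared" parameter stands in
for the RAM_SHARED test in the quoted patch, and the EBUSY behaviour is the
Linux-specific block-device case; for other file types O_EXCL without O_CREAT
should not be relied upon.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>

/* Pick the access mode as the quoted patch does, then add O_EXCL. */
static int demo_ram_open_flags(bool shared)
{
    int flags = shared ? O_RDWR : O_RDONLY;

    return flags | O_EXCL;
}

/*
 * Open an existing backing file or block device. On Linux, a block device
 * that is already in use (e.g. mounted) makes open() fail with EBUSY.
 */
static int demo_open_backing(const char *path, bool shared)
{
    int fd;

    assert(path != NULL);
    fd = open(path, demo_ram_open_flags(shared));
    if (fd < 0 && errno == EBUSY) {
        fprintf(stderr, "%s is in use by the system\n", path);
    }
    return fd;
}
```

The point of the flag is that the guest's NVDIMM backend is refused up front
instead of silently sharing a device that the host is already using.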

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 03/35] acpi: add aml_create_field
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-03  6:14     ` Shannon Zhao
  -1 siblings, 0 replies; 200+ messages in thread
From: Shannon Zhao @ 2015-11-03  6:14 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, dan.j.williams, rth



On 2015/11/2 17:13, Xiao Guangrong wrote:
> Implement CreateField term which is used by NVDIMM _DSM method in later patch
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  hw/acpi/aml-build.c         | 13 +++++++++++++
>  include/hw/acpi/aml-build.h |  1 +
>  2 files changed, 14 insertions(+)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index a72214d..9fe5e7b 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1151,6 +1151,19 @@ Aml *aml_sizeof(Aml *arg)
>      return var;
>  }
>  
> +/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateField */
> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
> +{
> +    Aml *var = aml_alloc();
> +    build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
> +    build_append_byte(var->buf, 0x13); /* CreateFieldOp */
> +    aml_append(var, srcbuf);
> +    aml_append(var, index);
> +    aml_append(var, len);
> +    build_append_namestring(var->buf, "%s", name);
> +    return var;
> +}
> +
>  void
>  build_header(GArray *linker, GArray *table_data,
>               AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 7296efb..7e1c43b 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -276,6 +276,7 @@ Aml *aml_touuid(const char *uuid);
>  Aml *aml_unicode(const char *str);
>  Aml *aml_derefof(Aml *arg);
>  Aml *aml_sizeof(Aml *arg);
> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
>  
Maybe this could be placed next to the existing aml_create_dword_field.

>  void
>  build_header(GArray *linker, GArray *table_data,
> 

-- 
Shannon

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 06/35] acpi: add aml_method_serialized
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-03 12:30     ` Igor Mammedov
  -1 siblings, 0 replies; 200+ messages in thread
From: Igor Mammedov @ 2015-11-03 12:30 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, vsementsov, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth

On Mon,  2 Nov 2015 17:13:08 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> It avoid explicit Mutex and will be used by NVDIMM ACPI
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  hw/acpi/aml-build.c         | 26 ++++++++++++++++++++++++--
>  include/hw/acpi/aml-build.h |  1 +
>  2 files changed, 25 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 9f792ab..8bee8b2 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -696,14 +696,36 @@ Aml *aml_while(Aml *predicate)
>  }
>  
>  /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefMethod */
> -Aml *aml_method(const char *name, int arg_count)
> +static Aml *__aml_method(const char *name, int arg_count, bool serialized)
We don't have many users of aml_method() yet, so I'd prefer a single
function rather than multiple ones.

I suggest doing something like:
typedef enum {
    AML_NONSERIALIZED = 0,
    AML_SERIALIZED = 1,
} AmlSerializeRule;

aml_method(const char *name, AmlSerializeRule rule, int synclevel);

with the current users fixed up to pass AML_NONSERIALIZED.
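
A minimal sketch of how the MethodFlags byte could be computed with such an
enum. Assumptions, not part of the suggestion above: the argument count stays
a parameter (the prototype above omits it), the sync level occupies bits 4-7
as in later ACPI revisions (the quoted patch comment follows ACPI 1.0b, where
those bits are reserved), and demo_method_flags() is a hypothetical helper
name.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    AML_NONSERIALIZED = 0,
    AML_SERIALIZED = 1,
} AmlSerializeRule;

/*
 * MethodFlags layout assumed here:
 *   bit 0-2: ArgCount (0-7)
 *   bit 3:   SerializeFlag
 *   bit 4-7: SyncLevel (0x0-0xf)
 */
static uint8_t demo_method_flags(int arg_count, AmlSerializeRule rule,
                                 int sync_level)
{
    assert(!(arg_count & ~7));
    assert(!(sync_level & ~0xF));
    return (uint8_t)(arg_count | (rule << 3) | (sync_level << 4));
}
```

For example, a serialized method with four arguments and sync level 0 would
encode as 0x0C, and existing non-serialized callers keep their current byte
because AML_NONSERIALIZED is 0.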

>  {
>      Aml *var = aml_bundle(0x14 /* MethodOp */, AML_PACKAGE);
> +    int methodflags;
> +
> +    /*
> +     * MethodFlags:
> +     *   bit 0-2: ArgCount (0-7)
> +     *   bit 3: SerializeFlag
> +     *     0: NotSerialized
> +     *     1: Serialized
> +     *   bit 4-7: reserved (must be 0)
> +     */
> +    assert(!(arg_count & ~7));
> +    methodflags = arg_count | (serialized << 3);
>      build_append_namestring(var->buf, "%s", name);
> -    build_append_byte(var->buf, arg_count); /* MethodFlags: ArgCount */
> +    build_append_byte(var->buf, methodflags);
>      return var;
>  }
>  
> +Aml *aml_method(const char *name, int arg_count)
> +{
> +    return __aml_method(name, arg_count, false);
> +}
> +
> +Aml *aml_method_serialized(const char *name, int arg_count)
> +{
> +    return __aml_method(name, arg_count, true);
> +}
> +
>  /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefDevice */
>  Aml *aml_device(const char *name_format, ...)
>  {
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index 5b8a118..00cf40e 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -263,6 +263,7 @@ Aml *aml_qword_memory(AmlDecode dec, AmlMinFixed min_fixed,
>  Aml *aml_scope(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
>  Aml *aml_device(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
>  Aml *aml_method(const char *name, int arg_count);
> +Aml *aml_method_serialized(const char *name, int arg_count);
>  Aml *aml_if(Aml *predicate);
>  Aml *aml_else(void);
>  Aml *aml_while(Aml *predicate);


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-03 12:34     ` Igor Mammedov
  -1 siblings, 0 replies; 200+ messages in thread
From: Igor Mammedov @ 2015-11-03 12:34 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Mon,  2 Nov 2015 17:13:11 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> Currently, file_ram_alloc() only works on directory - it creates a file
> under @path and do mmap on it
> 
> This patch tries to allow it to work on file directly, if @path is a
> directory it works as before, otherwise it treats @path as the target
> file then directly allocate memory from it
Paolo has just queued
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg06513.html
Perhaps you can reuse that here.
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  exec.c | 80 ++++++++++++++++++++++++++++++++++++++++++------------------------
>  1 file changed, 51 insertions(+), 29 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 9075f4d..db0fdaf 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>  }
>  
>  #ifdef __linux__
> +static bool path_is_dir(const char *path)
> +{
> +    struct stat fs;
> +
> +    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
> +}
> +
> +static int open_ram_file_path(RAMBlock *block, const char *path, size_t size)
> +{
> +    char *filename;
> +    char *sanitized_name;
> +    char *c;
> +    int fd;
> +
> +    if (!path_is_dir(path)) {
> +        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
> +
> +        flags |= O_EXCL;
> +        return open(path, flags);
> +    }
> +
> +    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
> +    sanitized_name = g_strdup(memory_region_name(block->mr));
> +    for (c = sanitized_name; *c != '\0'; c++) {
> +        if (*c == '/') {
> +            *c = '_';
> +        }
> +    }
> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
> +                               sanitized_name);
> +    g_free(sanitized_name);
> +    fd = mkstemp(filename);
> +    if (fd >= 0) {
> +        unlink(filename);
> +        /*
> +         * ftruncate is not supported by hugetlbfs in older
> +         * hosts, so don't bother bailing out on errors.
> +         * If anything goes wrong with it under other filesystems,
> +         * mmap will fail.
> +         */
> +        if (ftruncate(fd, size)) {
> +            perror("ftruncate");
> +        }
> +    }
> +    g_free(filename);
> +
> +    return fd;
> +}
> +
>  static void *file_ram_alloc(RAMBlock *block,
>                              ram_addr_t memory,
>                              const char *path,
>                              Error **errp)
>  {
> -    char *filename;
> -    char *sanitized_name;
> -    char *c;
>      void *area;
>      int fd;
>      uint64_t pagesize;
> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>          goto error;
>      }
>  
> -    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
> -    sanitized_name = g_strdup(memory_region_name(block->mr));
> -    for (c = sanitized_name; *c != '\0'; c++) {
> -        if (*c == '/')
> -            *c = '_';
> -    }
> -
> -    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
> -                               sanitized_name);
> -    g_free(sanitized_name);
> +    memory = ROUND_UP(memory, pagesize);
>  
> -    fd = mkstemp(filename);
> +    fd = open_ram_file_path(block, path, memory);
>      if (fd < 0) {
>          error_setg_errno(errp, errno,
>                           "unable to create backing store for path %s", path);
> -        g_free(filename);
>          goto error;
>      }
> -    unlink(filename);
> -    g_free(filename);
> -
> -    memory = ROUND_UP(memory, pagesize);
> -
> -    /*
> -     * ftruncate is not supported by hugetlbfs in older
> -     * hosts, so don't bother bailing out on errors.
> -     * If anything goes wrong with it under other filesystems,
> -     * mmap will fail.
> -     */
> -    if (ftruncate(fd, memory)) {
> -        perror("ftruncate");
> -    }
>  
>      area = qemu_ram_mmap(fd, memory, pagesize, block->flags & RAM_SHARED);
>      if (area == MAP_FAILED) {
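
The directory-or-file decision in the quoted hunk can be sketched outside
QEMU as follows. This is a minimal POSIX illustration, not the patch's code:
open_backing(), round_up() and the "backing.XXXXXX" template are hypothetical
names, and the sketch leaves out O_EXCL, whose behavior without O_CREAT is
only defined for block devices on Linux.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* True only if path exists and is a directory. */
static bool path_is_dir(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/* Round size up to the next multiple of align (align must be non-zero). */
static uint64_t round_up(uint64_t size, uint64_t align)
{
    assert(align > 0);
    return (size + align - 1) / align * align;
}

/*
 * Hypothetical helper: return an fd usable as a backing store.
 * Directory: create an unlinked temporary file and size it.
 * Anything else: open the existing file as it is.
 */
static int open_backing(const char *path, bool shared, off_t size)
{
    char tmpl[4096];
    int n, fd;

    if (!path_is_dir(path)) {
        return open(path, shared ? O_RDWR : O_RDONLY);
    }

    n = snprintf(tmpl, sizeof(tmpl), "%s/backing.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof(tmpl)) {
        return -1; /* path too long for the template buffer */
    }

    fd = mkstemp(tmpl);
    if (fd >= 0) {
        unlink(tmpl); /* the file lives on only through the fd */
        if (ftruncate(fd, size) != 0) {
            perror("ftruncate"); /* not fatal; a later mmap would fail */
        }
    }
    return fd;
}
```

A caller would round the requested size up to the page size first, for
example open_backing(path, true, round_up(len, 4096)), and then mmap the
returned fd.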


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-03 13:13     ` Igor Mammedov
  -1 siblings, 0 replies; 200+ messages in thread
From: Igor Mammedov @ 2015-11-03 13:13 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Mon,  2 Nov 2015 17:13:29 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> NVDIMM devices are defined in ACPI 6.0, section 9.20 "NVDIMM Devices".
> 
> There is a root device under \_SB, and the specified NVDIMM devices are
> under the root device. Each NVDIMM device has an _ADR which returns its
> handle, used to associate it with the MEMDEV structure in NFIT.
> 
> We reserve handle 0 for the root device. In this patch, we save handle,
> arg1 and arg2 to dsm memory. Arg3 is conditionally saved in a later patch.
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  hw/acpi/nvdimm.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 184 insertions(+)
> 
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index dd84e5f..53ed675 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
>      g_array_free(structures, true);
>  }
>  
> +struct NvdimmDsmIn {
> +    uint32_t handle;
> +    uint32_t revision;
> +    uint32_t function;
> +   /* the remaining size in the page is used by arg3. */
> +    uint8_t arg3[0];
> +} QEMU_PACKED;
> +typedef struct NvdimmDsmIn NvdimmDsmIn;
> +
>  static uint64_t
>  nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>  {
> @@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>  static void
>  nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
>  {
> +    fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
This hunk doesn't seem to belong in this patch.

>  }
>  
>  static const MemoryRegionOps nvdimm_dsm_ops = {
> @@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
>      memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
>  }
>  
> +#define BUILD_STA_METHOD(_dev_, _method_)                                  \
> +    do {                                                                   \
> +        _method_ = aml_method("_STA", 0);                                  \
> +        aml_append(_method_, aml_return(aml_int(0x0f)));                   \
> +        aml_append(_dev_, _method_);                                       \
> +    } while (0)
_STA doesn't have any logic here, so drop the macro and just
replace its call sites with:

aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf)));
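
For reference, 0xf sets the four low _STA bits defined by the ACPI
specification. A minimal C sketch of that bit layout follows; the enum and
function names are illustrative only and are not QEMU identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* _STA result bits as defined by the ACPI specification. */
enum {
    STA_PRESENT     = 1u << 0, /* device is present */
    STA_ENABLED     = 1u << 1, /* device is enabled and decoding resources */
    STA_SHOWN_IN_UI = 1u << 2, /* device should be shown in the UI */
    STA_FUNCTIONING = 1u << 3, /* device is functioning properly */
};

static_assert((STA_PRESENT | STA_ENABLED | STA_SHOWN_IN_UI |
               STA_FUNCTIONING) == 0x0f,
              "the four status bits together form 0x0f");

/* Hypothetical helper: the constant a logic-free _STA would return. */
static uint32_t sta_all_ok(void)
{
    return STA_PRESENT | STA_ENABLED | STA_SHOWN_IN_UI | STA_FUNCTIONING;
}
```

Since the value never changes, a named object holding the constant carries
the same information as a method that returns it.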


> +
> +#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)                \
> +    do {                                                                   \
> +        Aml *ifctx, *uuid;                                                 \
> +        _method_ = aml_method("_DSM", 4);                                  \
> +        /* check UUID if it is we expect, return the errorcode if not.*/   \
> +        uuid = aml_touuid(_uuid_);                                         \
> +        ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));             \
> +        aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));     \
> +        aml_append(method, ifctx);                                         \
> +        aml_append(method, aml_return(aml_call4("NCAL", aml_int(_handle_), \
> +                   aml_arg(1), aml_arg(2), aml_arg(3))));                  \
> +        aml_append(_dev_, _method_);                                       \
> +    } while (0)
> +
> +#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)                     \
> +    aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
> +
> +#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)                 \
> +    BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
> +
> +static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
> +{
> +    for (; device_list; device_list = device_list->next) {
> +        NVDIMMDevice *nvdimm = device_list->data;
> +        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
> +                                           NULL);
> +        uint32_t handle = nvdimm_slot_to_handle(slot);
> +        Aml *dev, *method;
> +
> +        dev = aml_device("NV%02X", slot);
> +        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
> +
> +        BUILD_STA_METHOD(dev, method);
> +
> +        /*
> +         * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
> +         * in DSM Spec Rev1.
> +         */
> +        BUILD_DSM_METHOD(dev, method,
> +                         handle /* NVDIMM Device Handle */,
> +                         "4309AC30-0D11-11E4-9191-0800200C9A66"
> +                         /* UUID for NVDIMM Devices. */);
This will add N bytes of AML per NVDIMM in the worst case.
Please drop the macro, consolidate this method into the _DSM method of the
parent scope, and then call it from here like this:
   Method(_DSM, 4)
       Return(^_DSM(Arg[0-3]))

> +
> +        aml_append(root_dev, dev);
> +    }
> +}
> +
> +static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
> +{
> +    Aml *dev, *method, *field;
> +    uint64_t page_size = TARGET_PAGE_SIZE;
> +
> +    dev = aml_device("NVDR");
> +    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
> +
> +    /* map DSM memory and IO into ACPI namespace. */
> +    aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
> +               NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
> +    aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
> +               NVDIMM_ACPI_MEM_BASE, page_size));
> +
> +    /*
> +     * DSM notifier:
> +     * @NOTI: Read it will notify QEMU that _DSM method is being
> +     *        called and the parameters can be found in NvdimmDsmIn.
> +     *        The value read from it is the buffer size of DSM output
> +     *        filled by QEMU.
> +     */
> +    field = aml_field("NPIO", AML_DWORD_ACC, AML_PRESERVE);
> +    BUILD_FIELD_UNIT_SIZE(field, sizeof(uint32_t), "NOTI");
> +    aml_append(dev, field);
> +
> +    /*
> +     * DSM input:
> +     * @HDLE: store device's handle, it's zero if the _DSM call happens
> +     *        on NVDIMM Root Device.
> +     * @REVS: store the Arg1 of _DSM call.
> +     * @FUNC: store the Arg2 of _DSM call.
> +     * @ARG3: store the Arg3 of _DSM call.
> +     *
> +     * They are RAM mapping on host so that these accesses never cause
> +     * VM-EXIT.
> +     */
> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, handle, "HDLE");
> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, revision, "REVS");
> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, function, "FUNC");
> +    BUILD_FIELD_UNIT_SIZE(field, page_size - offsetof(NvdimmDsmIn, arg3),
> +                          "ARG3");
These macros don't make the code any better: one has to jump to their
definitions every time to figure out what they are doing.
Please don't hide code behind macros; just replace them with aml_foo()
calls here and at other places in this patch.
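
To make the arithmetic hidden by the field macros explicit, here is a
standalone sketch of the bit widths they compute. It assumes a 4 KiB
TARGET_PAGE_SIZE; struct DsmIn mirrors the quoted NvdimmDsmIn, and
DSM_PAGE_SIZE, the *_BITS names and bytes_to_bits() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BYTE 8
#define DSM_PAGE_SIZE 4096 /* assumption: 4 KiB TARGET_PAGE_SIZE */

/* Same layout as the quoted NvdimmDsmIn; packed, so there is no padding. */
struct DsmIn {
    uint32_t handle;
    uint32_t revision;
    uint32_t function;
    uint8_t arg3[]; /* the rest of the page */
} __attribute__((packed));

/* Field unit widths in bits, which is what an AML Field() entry takes. */
enum {
    HDLE_BITS = sizeof(((struct DsmIn *)0)->handle) * BITS_PER_BYTE,
    REVS_BITS = sizeof(((struct DsmIn *)0)->revision) * BITS_PER_BYTE,
    FUNC_BITS = sizeof(((struct DsmIn *)0)->function) * BITS_PER_BYTE,
    ARG3_BITS = (DSM_PAGE_SIZE - offsetof(struct DsmIn, arg3)) * BITS_PER_BYTE,
};

static_assert(HDLE_BITS + REVS_BITS + FUNC_BITS + ARG3_BITS ==
              DSM_PAGE_SIZE * BITS_PER_BYTE,
              "the four fields must cover exactly one page");

/* Bytes to bits: the same conversion NCAL does with a shift left by 3. */
static uint64_t bytes_to_bits(uint64_t bytes)
{
    return bytes << 3;
}
```

Under these assumptions HDLE, REVS and FUNC are 32 bits each and ARG3 takes
the remaining 32672 bits of the page.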

> +    aml_append(dev, field);
> +
> +    /*
> +     * DSM output:
> +     * @ODAT: the buffer QEMU uses to store the result, the actual size
> +     *        filled by QEMU is the value read from NOTI.
> +     *
> +     * Since the page is reused by both input and out, the input data
> +     * will be lost after storing new result into @ODAT.
> +    */
> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
> +    BUILD_FIELD_UNIT_SIZE(field, page_size, "ODAT");
> +    aml_append(dev, field);
> +
> +    method = aml_method_serialized("NCAL", 4);
> +    {
> +        Aml *buffer_size = aml_local(0);
> +
> +        aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
> +        aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
> +        aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
> +
> +        /*
> +         * transfer control to QEMU and the buffer size filled by
> +         * QEMU is returned.
> +         */
> +        aml_append(method, aml_store(aml_name("NOTI"), buffer_size));
> +
> +        aml_append(method, aml_store(aml_shiftleft(buffer_size,
> +                                       aml_int(3)), buffer_size));
> +
> +        aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
> +                                            buffer_size , "OBUF"));
> +        aml_append(method, aml_concatenate(aml_buffer(0, NULL),
> +                                           aml_name("OBUF"), aml_local(1)));
> +        aml_append(method, aml_return(aml_local(1)));
> +    }
> +    aml_append(dev, method);
> +
> +    BUILD_STA_METHOD(dev, method);
> +
> +    /*
> +     * Chapter 3: _DSM Interface for NVDIMM Root Device - Example in DSM
> +     * Spec Rev1.
> +     */
> +    BUILD_DSM_METHOD(dev, method,
> +                     0 /* 0 is reserved for NVDIMM Root Device*/,
> +                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
> +                     /* UUID for NVDIMM Root Devices. */);
> +
> +    build_nvdimm_devices(device_list, dev);
> +
> +    aml_append(sb_scope, dev);
> +}
> +
> +static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
> +                              GArray *table_data, GArray *linker)
> +{
> +    Aml *ssdt, *sb_scope;
> +
> +    acpi_add_table(table_offsets, table_data);
> +
> +    ssdt = init_aml_allocator();
> +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
> +
> +    sb_scope = aml_scope("\\_SB");
> +    nvdimm_build_acpi_devices(device_list, sb_scope);
Is there a need for a dedicated nvdimm_build_acpi_devices()?
Is it reused somewhere else?
If it's not, then just inline it here.

> +
> +    aml_append(ssdt, sb_scope);
> +    /* copy AML table into ACPI tables blob and patch header there */
> +    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> +    build_header(linker, table_data,
> +        (void *)(table_data->data + table_data->len - ssdt->buf->len),
> +        "SSDT", ssdt->buf->len, 1);
It's not OK to have several SSDT tables with the exact same signature.
How about extending build_header(..., oem_table_id)?
You can set it to NULL to get the original behavior, but provide an
NVDIMM-specific ID for this table, for example "NVDIMM".
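
A rough standalone sketch of that idea, using the standard 36-byte ACPI
table header layout. set_oem_table_id() is a hypothetical helper, not the
real build_header(), and the "BXPC" + signature default is an assumption
about what QEMU writes when no ID is given.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ACPI system description table header (36 bytes). */
struct AcpiHeader {
    char     signature[4];
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     creator_id[4];
    uint32_t creator_revision;
} __attribute__((packed));

static_assert(sizeof(struct AcpiHeader) == 36, "ACPI header is 36 bytes");

/*
 * Hypothetical helper: NULL keeps the assumed default ("BXPC" followed by
 * the signature); otherwise copy up to 8 bytes of the ID, NUL-padded.
 */
static void set_oem_table_id(struct AcpiHeader *h, const char *sig,
                             const char *oem_table_id)
{
    memcpy(h->signature, sig, sizeof(h->signature));

    if (oem_table_id == NULL) {
        memcpy(h->oem_table_id, "BXPC", 4);
        memcpy(h->oem_table_id + 4, sig, 4);
    } else {
        size_t n = strlen(oem_table_id);

        if (n > sizeof(h->oem_table_id)) {
            n = sizeof(h->oem_table_id);
        }
        memset(h->oem_table_id, 0, sizeof(h->oem_table_id));
        memcpy(h->oem_table_id, oem_table_id, n);
    }
}
```

With this, two tables can share the "SSDT" signature and still be told apart
by their OEM table ID.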

> +    free_aml_allocator();
> +}
> +
>  void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
>                         GArray *linker)
>  {
> @@ -414,5 +597,6 @@ void nvdimm_build_acpi(GArray *table_offsets, GArray *table_data,
>      }
>  
>      nvdimm_build_nfit(device_list, table_offsets, table_data, linker);
> +    nvdimm_build_ssdt(device_list, table_offsets, table_data, linker);
>      g_slist_free(device_list);
>  }


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 06/35] acpi: add aml_method_serialized
  2015-11-03 12:30     ` Igor Mammedov
@ 2015-11-03 13:27       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03 13:27 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, vsementsov, ehabkost, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, dan.j.williams, rth



On 11/03/2015 08:30 PM, Igor Mammedov wrote:
> On Mon,  2 Nov 2015 17:13:08 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> It avoids an explicit Mutex and will be used by NVDIMM ACPI
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   hw/acpi/aml-build.c         | 26 ++++++++++++++++++++++++--
>>   include/hw/acpi/aml-build.h |  1 +
>>   2 files changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index 9f792ab..8bee8b2 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -696,14 +696,36 @@ Aml *aml_while(Aml *predicate)
>>   }
>>
>>   /* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefMethod */
>> -Aml *aml_method(const char *name, int arg_count)
>> +static Aml *__aml_method(const char *name, int arg_count, bool serialized)
> We don't have many users of aml_method() yet, so I'd prefer to have a
> single function rather than multiple ones.
>
> I suggest doing something like:
> typedef enum {
>      AML_NONSERIALIZED = 0,
>      AML_SERIALIZED = 1,
> } AmlSerializeRule;
>
> aml_method(const char *name, AmlSerializeRule rule, int synclevel);
>
> with current users fixed up with AML_NONSERIALIZED argument.

Okay. It looks good to me, will follow it.
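
For context, the serialize rule ends up in the MethodFlags byte of the AML
DefMethod encoding: bits 0-2 hold the argument count, bit 3 the serialize
flag and bits 4-7 the sync level. A minimal sketch of that packing, where
method_flags() is a hypothetical name and not part of aml-build.c:

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    AML_NONSERIALIZED = 0,
    AML_SERIALIZED = 1,
} AmlSerializeRule;

/*
 * Pack the DefMethod MethodFlags byte:
 *   bits 0-2: argument count (0-7)
 *   bit  3  : serialize flag
 *   bits 4-7: sync level (0-15)
 */
static uint8_t method_flags(int arg_count, AmlSerializeRule rule,
                            int sync_level)
{
    assert(arg_count >= 0 && arg_count <= 7);
    assert(sync_level >= 0 && sync_level <= 15);

    return (uint8_t)(arg_count | (rule << 3) | (sync_level << 4));
}
```

A serialized four-argument method such as NCAL would get 0x0c here, while
the existing non-serialized callers keep a flags byte equal to their
argument count.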


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-03 12:34     ` [Qemu-devel] " Igor Mammedov
@ 2015-11-03 13:32       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03 13:32 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/03/2015 08:34 PM, Igor Mammedov wrote:
> On Mon,  2 Nov 2015 17:13:11 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> Currently, file_ram_alloc() only works on directory - it creates a file
>> under @path and do mmap on it
>>
>> This patch tries to allow it to work on file directly, if @path is a
>> directory it works as before, otherwise it treats @path as the target
>> file then directly allocate memory from it
> Paolo has just queued
> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg06513.html
> perhaps that's what you can reuse here.

Yep, Paolo has told me about that; I will update this patchset after his
pull request.

BTW, which tree should this patchset be based on for future development:
Paolo's, Michael's, or even the upstream QEMU tree?

Thanks!

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-03  3:56       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-03 13:55         ` Paolo Bonzini
  -1 siblings, 0 replies; 200+ messages in thread
From: Paolo Bonzini @ 2015-11-03 13:55 UTC (permalink / raw)
  To: Xiao Guangrong, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, vsementsov, eblake



On 03/11/2015 04:56, Xiao Guangrong wrote:
> 
> 
> On 11/03/2015 05:12 AM, Paolo Bonzini wrote:
>>
>>
>> On 02/11/2015 10:13, Xiao Guangrong wrote:
>>> Currently, file_ram_alloc() only works on directory - it creates a file
>>> under @path and do mmap on it
>>>
>>> This patch tries to allow it to work on file directly, if @path is a
>>> directory it works as before, otherwise it treats @path as the target
>>> file then directly allocate memory from it
>>>
>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>> ---
>>>   exec.c | 80
>>> ++++++++++++++++++++++++++++++++++++++++++------------------------
>>>   1 file changed, 51 insertions(+), 29 deletions(-)
>>>
>>> diff --git a/exec.c b/exec.c
>>> index 9075f4d..db0fdaf 100644
>>> --- a/exec.c
>>> +++ b/exec.c
>>> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>>>   }
>>>
>>>   #ifdef __linux__
>>> +static bool path_is_dir(const char *path)
>>> +{
>>> +    struct stat fs;
>>> +
>>> +    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
>>> +}
>>> +
>>> +static int open_ram_file_path(RAMBlock *block, const char *path,
>>> size_t size)
>>> +{
>>> +    char *filename;
>>> +    char *sanitized_name;
>>> +    char *c;
>>> +    int fd;
>>> +
>>> +    if (!path_is_dir(path)) {
>>> +        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
>>> +
>>> +        flags |= O_EXCL;
>>> +        return open(path, flags);
>>> +    }
>>> +
>>> +    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>>> +    sanitized_name = g_strdup(memory_region_name(block->mr));
>>> +    for (c = sanitized_name; *c != '\0'; c++) {
>>> +        if (*c == '/') {
>>> +            *c = '_';
>>> +        }
>>> +    }
>>> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>>> +                               sanitized_name);
>>> +    g_free(sanitized_name);
>>> +    fd = mkstemp(filename);
>>> +    if (fd >= 0) {
>>> +        unlink(filename);
>>> +        /*
>>> +         * ftruncate is not supported by hugetlbfs in older
>>> +         * hosts, so don't bother bailing out on errors.
>>> +         * If anything goes wrong with it under other filesystems,
>>> +         * mmap will fail.
>>> +         */
>>> +        if (ftruncate(fd, size)) {
>>> +            perror("ftruncate");
>>> +        }
>>> +    }
>>> +    g_free(filename);
>>> +
>>> +    return fd;
>>> +}
>>> +
>>>   static void *file_ram_alloc(RAMBlock *block,
>>>                               ram_addr_t memory,
>>>                               const char *path,
>>>                               Error **errp)
>>>   {
>>> -    char *filename;
>>> -    char *sanitized_name;
>>> -    char *c;
>>>       void *area;
>>>       int fd;
>>>       uint64_t pagesize;
>>> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>>>           goto error;
>>>       }
>>>
>>> -    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>>> -    sanitized_name = g_strdup(memory_region_name(block->mr));
>>> -    for (c = sanitized_name; *c != '\0'; c++) {
>>> -        if (*c == '/')
>>> -            *c = '_';
>>> -    }
>>> -
>>> -    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>>> -                               sanitized_name);
>>> -    g_free(sanitized_name);
>>> +    memory = ROUND_UP(memory, pagesize);
>>>
>>> -    fd = mkstemp(filename);
>>> +    fd = open_ram_file_path(block, path, memory);
>>>       if (fd < 0) {
>>>           error_setg_errno(errp, errno,
>>>                            "unable to create backing store for path
>>> %s", path);
>>> -        g_free(filename);
>>>           goto error;
>>>       }
>>> -    unlink(filename);
>>> -    g_free(filename);
>>> -
>>> -    memory = ROUND_UP(memory, pagesize);
>>> -
>>> -    /*
>>> -     * ftruncate is not supported by hugetlbfs in older
>>> -     * hosts, so don't bother bailing out on errors.
>>> -     * If anything goes wrong with it under other filesystems,
>>> -     * mmap will fail.
>>> -     */
>>> -    if (ftruncate(fd, memory)) {
>>> -        perror("ftruncate");
>>> -    }
>>>
>>>       area = qemu_ram_mmap(fd, memory, pagesize, block->flags &
>>> RAM_SHARED);
>>>       if (area == MAP_FAILED) {
>>>
>>
>> I was going to send tomorrow a pull request for a similar patch,
>> "backends/hostmem-file: Allow to specify full pathname for backing file".
>>
>> The main difference seems to be your usage of O_EXCL.  Can you explain
>> why you added it?
> 
> It' used if we pass a block device as a NVDIMM backend memory:
>  O_EXCL can be used without O_CREAT if pathname refers to a block
> device.  If the block device
>  is in use by the system (e.g., mounted), open() fails with the error EBUSY

That makes sense, but I think it's better to be consistent with the
handling of block devices.  Block devices do not use O_EXCL when QEMU
opens them; I guess in principle it would also be possible to share a
single pmem backend between multiple guests.

Paolo
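
The flag choice under discussion can be sketched as below. This is only an
illustration under stated assumptions: path_is_dir() mirrors the helper in
the quoted patch, while backing_open_flags() and its "exclusive" parameter
are hypothetical names introduced here to show what dropping O_EXCL would
change.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>

/* Same check as path_is_dir() in the quoted patch. */
static bool path_is_dir(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical helper: choose open(2) flags for an existing backing file.
 * On Linux, O_EXCL without O_CREAT is only defined for block devices,
 * where it makes open() fail with EBUSY if the device is already in use.
 */
static int backing_open_flags(bool shared, bool exclusive)
{
    int flags = shared ? O_RDWR : O_RDONLY;

    if (exclusive) {
        flags |= O_EXCL;
    }
    return flags;
}
```

Calling backing_open_flags(shared, false) corresponds to the behaviour
Paolo describes for block devices, where the same backend could in
principle be opened by more than one guest.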

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices
  2015-11-03 13:13     ` [Qemu-devel] " Igor Mammedov
@ 2015-11-03 14:22       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03 14:22 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/03/2015 09:13 PM, Igor Mammedov wrote:
> On Mon,  2 Nov 2015 17:13:29 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> NVDIMM devices is defined in ACPI 6.0 9.20 NVDIMM Devices
>>
>> There is a root device under \_SB and specified NVDIMM devices are under the
>> root device. Each NVDIMM device has _ADR which returns its handle used to
>> associate MEMDEV structure in NFIT
>>
>> We reserve handle 0 for root device. In this patch, we save handle, handle,
>> arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   hw/acpi/nvdimm.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 184 insertions(+)
>>
>> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
>> index dd84e5f..53ed675 100644
>> --- a/hw/acpi/nvdimm.c
>> +++ b/hw/acpi/nvdimm.c
>> @@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
>>       g_array_free(structures, true);
>>   }
>>
>> +struct NvdimmDsmIn {
>> +    uint32_t handle;
>> +    uint32_t revision;
>> +    uint32_t function;
>> +   /* the remaining size in the page is used by arg3. */
>> +    uint8_t arg3[0];
>> +} QEMU_PACKED;
>> +typedef struct NvdimmDsmIn NvdimmDsmIn;
>> +
>>   static uint64_t
>>   nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>>   {
>> @@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>>   static void
>>   nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
>>   {
>> +    fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
> it doesn't seem like this hunk belongs here

Er, we have changed the logic:
- others:
   1) the buffer length is directly got from IO read rather than got
      from dsm memory
[ This was documented in v5's changelog. ]

So the IO write has been replaced by an IO read, and nvdimm_dsm_write() should
never be triggered.

>
>>   }
>>
>>   static const MemoryRegionOps nvdimm_dsm_ops = {
>> @@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
>>       memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
>>   }
>>
>> +#define BUILD_STA_METHOD(_dev_, _method_)                                  \
>> +    do {                                                                   \
>> +        _method_ = aml_method("_STA", 0);                                  \
>> +        aml_append(_method_, aml_return(aml_int(0x0f)));                   \
>> +        aml_append(_dev_, _method_);                                       \
>> +    } while (0)
> _STA doesn't have any logic here so drop macro and just
> replace its call sites with:

Okay, I just wanted to save a few lines of code. I will drop this macro.

>
> aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf));

_STA is required to be a method with zero arguments, but this statement just
defines an object. Is that okay?

>
>
>> +
>> +#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)                \
>> +    do {                                                                   \
>> +        Aml *ifctx, *uuid;                                                 \
>> +        _method_ = aml_method("_DSM", 4);                                  \
>> +        /* check UUID if it is we expect, return the errorcode if not.*/   \
>> +        uuid = aml_touuid(_uuid_);                                         \
>> +        ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));             \
>> +        aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));     \
>> +        aml_append(method, ifctx);                                         \
>> +        aml_append(method, aml_return(aml_call4("NCAL", aml_int(_handle_), \
>> +                   aml_arg(1), aml_arg(2), aml_arg(3))));                  \
>> +        aml_append(_dev_, _method_);                                       \
>> +    } while (0)
>> +
>> +#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)                     \
>> +    aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
>> +
>> +#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)                 \
>> +    BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
>> +
>> +static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
>> +{
>> +    for (; device_list; device_list = device_list->next) {
>> +        NVDIMMDevice *nvdimm = device_list->data;
>> +        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
>> +                                           NULL);
>> +        uint32_t handle = nvdimm_slot_to_handle(slot);
>> +        Aml *dev, *method;
>> +
>> +        dev = aml_device("NV%02X", slot);
>> +        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
>> +
>> +        BUILD_STA_METHOD(dev, method);
>> +
>> +        /*
>> +         * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
>> +         * in DSM Spec Rev1.
>> +         */
>> +        BUILD_DSM_METHOD(dev, method,
>> +                         handle /* NVDIMM Device Handle */,
>> +                         "4309AC30-0D11-11E4-9191-0800200C9A66"
>> +                         /* UUID for NVDIMM Devices. */);
> this will add N-bytes * #NVDIMMS in worst case.
> Please drop macro and just consolidate this method into _DSM method of parent scope
> and then call it from here like this:
>     Method(_DSM, 4)
>         Return(^_DSM(Arg[0-3]))

The parent _DSM cannot be called directly, because the parent's _DSM requires a
different UUID. The UUID is not saved in dsm memory, so UUID verification has
to be done in AML.

This macro, BUILD_DSM_METHOD(), builds the device's _DSM method and checks
whether the UUID is valid. If not, it returns error code 1 (Not Supported);
otherwise it calls the common method NCAL, which saves the input parameters
into dsm memory and triggers an IO exit. So it seems no bytes are wasted. No?
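
The control flow described above can be modelled in plain C as follows.
This is a sketch and not AML: dsm_model(), NcalFn and ncal_echo_handle()
are hypothetical names, the UUIDs are treated as raw 16-byte buffers, and
the return value is simplified to an integer, whereas the real NCAL
returns a buffer.

```c
#include <stdint.h>
#include <string.h>

#define DSM_NOT_SUPPORTED 1u
#define UUID_LEN 16

/* Stand-in for the common NCAL method; hypothetical signature. */
typedef uint32_t (*NcalFn)(uint32_t handle, uint32_t revision,
                           uint32_t function);

/*
 * Model of a per-device _DSM: verify Arg0 (the UUID) locally, then
 * forward the remaining arguments plus the device handle to NCAL.
 */
static uint32_t dsm_model(const uint8_t expected_uuid[UUID_LEN],
                          const uint8_t arg0_uuid[UUID_LEN],
                          uint32_t handle, uint32_t revision,
                          uint32_t function, NcalFn ncal)
{
    if (memcmp(arg0_uuid, expected_uuid, UUID_LEN) != 0) {
        return DSM_NOT_SUPPORTED;
    }
    return ncal(handle, revision, function);
}

/* Example NCAL stand-in that simply echoes the handle it was given. */
static uint32_t ncal_echo_handle(uint32_t handle, uint32_t revision,
                                 uint32_t function)
{
    (void)revision;
    (void)function;
    return handle;
}
```

The point of the model is that each device carries only the UUID check and
one call; everything that touches dsm memory lives once, in NCAL.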

>
>> +
>> +        aml_append(root_dev, dev);
>> +    }
>> +}
>> +
>> +static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
>> +{
>> +    Aml *dev, *method, *field;
>> +    uint64_t page_size = TARGET_PAGE_SIZE;
>> +
>> +    dev = aml_device("NVDR");
>> +    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
>> +
>> +    /* map DSM memory and IO into ACPI namespace. */
>> +    aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
>> +               NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
>> +    aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
>> +               NVDIMM_ACPI_MEM_BASE, page_size));
>> +
>> +    /*
>> +     * DSM notifier:
>> +     * @NOTI: Read it will notify QEMU that _DSM method is being
>> +     *        called and the parameters can be found in NvdimmDsmIn.
>> +     *        The value read from it is the buffer size of DSM output
>> +     *        filled by QEMU.
>> +     */
>> +    field = aml_field("NPIO", AML_DWORD_ACC, AML_PRESERVE);
>> +    BUILD_FIELD_UNIT_SIZE(field, sizeof(uint32_t), "NOTI");
>> +    aml_append(dev, field);
>> +
>> +    /*
>> +     * DSM input:
>> +     * @HDLE: store device's handle, it's zero if the _DSM call happens
>> +     *        on NVDIMM Root Device.
>> +     * @REVS: store the Arg1 of _DSM call.
>> +     * @FUNC: store the Arg2 of _DSM call.
>> +     * @ARG3: store the Arg3 of _DSM call.
>> +     *
>> +     * They are RAM mapping on host so that these accesses never cause
>> +     * VM-EXIT.
>> +     */
>> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, handle, "HDLE");
>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, revision, "REVS");
>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, function, "FUNC");
>> +    BUILD_FIELD_UNIT_SIZE(field, page_size - offsetof(NvdimmDsmIn, arg3),
>> +                          "ARG3");
> These macros don't make code any better and one has to jump to their
> definition every time one sees it to figure out what it's doing.
> Please don't hide code behind macros and just replace them with aml_foo()
> here and at other places in this patch.
>

Okay, will follow your way. :)
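
For reference, the widths that the macros compute, and that open-coded
aml_named_field() calls would need to pass, can be checked with a small
standalone sketch. The struct mirrors NvdimmDsmIn from the quoted patch
(using a C99 flexible array member); nvdimm_arg3_bits() is a hypothetical
helper, and the 4096-byte page used in the note below is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Layout of the DSM input page, mirroring the quoted patch. */
struct NvdimmDsmIn {
    uint32_t handle;
    uint32_t revision;
    uint32_t function;
    uint8_t arg3[];   /* the remainder of the page */
} __attribute__((packed));

/* Hypothetical helper: width in bits of the ARG3 field unit. */
static size_t nvdimm_arg3_bits(size_t page_size)
{
    return (page_size - offsetof(struct NvdimmDsmIn, arg3)) * BITS_PER_BYTE;
}
```

HDLE, REVS and FUNC are each sizeof(uint32_t) * BITS_PER_BYTE = 32 bits
wide; with a 4096-byte page, ARG3 covers the remaining 4084 bytes.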

>> +    aml_append(dev, field);
>> +
>> +    /*
>> +     * DSM output:
>> +     * @ODAT: the buffer QEMU uses to store the result, the actual size
>> +     *        filled by QEMU is the value read from NOT1.
>> +     *
>> +     * Since the page is reused by both input and out, the input data
>> +     * will be lost after storing new result into @ODAT.
>> +    */
>> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
>> +    BUILD_FIELD_UNIT_SIZE(field, page_size, "ODAT");
>> +    aml_append(dev, field);
>> +
>> +    method = aml_method_serialized("NCAL", 4);
>> +    {
>> +        Aml *buffer_size = aml_local(0);
>> +
>> +        aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
>> +        aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
>> +        aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
>> +
>> +        /*
>> +         * transfer control to QEMU and the buffer size filled by
>> +         * QEMU is returned.
>> +         */
>> +        aml_append(method, aml_store(aml_name("NOTI"), buffer_size));
>> +
>> +        aml_append(method, aml_store(aml_shiftleft(buffer_size,
>> +                                       aml_int(3)), buffer_size));
>> +
>> +        aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
>> +                                            buffer_size , "OBUF"));
>> +        aml_append(method, aml_concatenate(aml_buffer(0, NULL),
>> +                                           aml_name("OBUF"), aml_local(1)));
>> +        aml_append(method, aml_return(aml_local(1)));
>> +    }
>> +    aml_append(dev, method);
>> +
>> +    BUILD_STA_METHOD(dev, method);
>> +
>> +    /*
>> +     * Chapter 3: _DSM Interface for NVDIMM Root Device - Example in DSM
>> +     * Spec Rev1.
>> +     */
>> +    BUILD_DSM_METHOD(dev, method,
>> +                     0 /* 0 is reserved for NVDIMM Root Device*/,
>> +                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
>> +                     /* UUID for NVDIMM Root Devices. */);
>> +
>> +    build_nvdimm_devices(device_list, dev);
>> +
>> +    aml_append(sb_scope, dev);
>> +}
>> +
>> +static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
>> +                              GArray *table_data, GArray *linker)
>> +{
>> +    Aml *ssdt, *sb_scope;
>> +
>> +    acpi_add_table(table_offsets, table_data);
>> +
>> +    ssdt = init_aml_allocator();
>> +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
>> +
>> +    sb_scope = aml_scope("\\_SB");
>> +    nvdimm_build_acpi_devices(device_list, sb_scope);
> is there need for dedicated nvdimm_build_acpi_devices()?
> Is it reused somewhere else?
> If it's not then just inline it here.

Building NVDIMM devices is complex work, so I designed
nvdimm_build_acpi_devices() to build the NVDIMM root device and then
call build_nvdimm_devices() to build the child devices. As you
can see, nvdimm_build_acpi_devices() is a big function.

The split is just meant to make the code clearer. If you really
hate this, I will drop nvdimm_build_acpi_devices(), no problem. :)

>
>> +
>> +    aml_append(ssdt, sb_scope);
>> +    /* copy AML table into ACPI tables blob and patch header there */
>> +    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
>> +    build_header(linker, table_data,
>> +        (void *)(table_data->data + table_data->len - ssdt->buf->len),
>> +        "SSDT", ssdt->buf->len, 1);
> It's not ok to have several SSDT tables with exact same signature.
> how about extending build_header(..., oem_table_id)?
> You can set it to NULL to get original behavior but provide NVDIMM
> specific id for this table. for example "NVDIMM"
>

Ah, I just noticed the ACPI spec says:
| each secondary system description table listed in the RSDT/XSDT with a unique OEM Table ID is
| loaded.
You are right.

Okay, I will extend this function; thanks for your suggestion.
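
A rough sketch of the extension, assuming the standard 36-byte ACPI table
header layout. set_oem_table_id() is a hypothetical stand-in for the part
of build_header() that would change, and the "DFLT" default prefix is a
placeholder, not the value QEMU actually uses.

```c
#include <stdint.h>
#include <string.h>

/* Standard 36-byte ACPI system description table header. */
struct AcpiTableHeader {
    char signature[4];
    uint32_t length;
    uint8_t revision;
    uint8_t checksum;
    char oem_id[6];
    char oem_table_id[8];
    uint32_t oem_revision;
    char creator_id[4];
    uint32_t creator_revision;
} __attribute__((packed));

/*
 * Hypothetical helper: a non-NULL id is copied (truncated to 8 bytes and
 * NUL-padded); NULL keeps a default built from a placeholder prefix plus
 * the table signature.
 */
static void set_oem_table_id(struct AcpiTableHeader *h, const char *sig,
                             const char *id)
{
    if (id) {
        size_t n = strlen(id);

        if (n > sizeof(h->oem_table_id)) {
            n = sizeof(h->oem_table_id);
        }
        memset(h->oem_table_id, 0, sizeof(h->oem_table_id));
        memcpy(h->oem_table_id, id, n);
    } else {
        memcpy(h->oem_table_id, "DFLT", 4);
        memcpy(h->oem_table_id + 4, sig, 4);
    }
}
```

The NVDIMM SSDT would then pass "NVDIMM" while every other caller passes
NULL, so the two SSDTs no longer share the same OEM Table ID.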


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices
@ 2015-11-03 14:22       ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03 14:22 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, pbonzini, dan.j.williams, rth



On 11/03/2015 09:13 PM, Igor Mammedov wrote:
> On Mon,  2 Nov 2015 17:13:29 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> NVDIMM devices is defined in ACPI 6.0 9.20 NVDIMM Devices
>>
>> There is a root device under \_SB and specified NVDIMM devices are under the
>> root device. Each NVDIMM device has _ADR which returns its handle used to
>> associate MEMDEV structure in NFIT
>>
>> We reserve handle 0 for root device. In this patch, we save handle, handle,
>> arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   hw/acpi/nvdimm.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 184 insertions(+)
>>
>> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
>> index dd84e5f..53ed675 100644
>> --- a/hw/acpi/nvdimm.c
>> +++ b/hw/acpi/nvdimm.c
>> @@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
>>       g_array_free(structures, true);
>>   }
>>
>> +struct NvdimmDsmIn {
>> +    uint32_t handle;
>> +    uint32_t revision;
>> +    uint32_t function;
>> +   /* the remaining size in the page is used by arg3. */
>> +    uint8_t arg3[0];
>> +} QEMU_PACKED;
>> +typedef struct NvdimmDsmIn NvdimmDsmIn;
>> +
>>   static uint64_t
>>   nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>>   {
>> @@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>>   static void
>>   nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
>>   {
>> +    fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
> it doesn't seem like this hunk belongs here

Er, we have changed the logic:
- others:
   1) the buffer length is directly got from IO read rather than got
      from dsm memory
[ This was documented in v5's changelog. ]

So the IO write has been replaced by an IO read, and nvdimm_dsm_write() should
never be triggered.

>
>>   }
>>
>>   static const MemoryRegionOps nvdimm_dsm_ops = {
>> @@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
>>       memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
>>   }
>>
>> +#define BUILD_STA_METHOD(_dev_, _method_)                                  \
>> +    do {                                                                   \
>> +        _method_ = aml_method("_STA", 0);                                  \
>> +        aml_append(_method_, aml_return(aml_int(0x0f)));                   \
>> +        aml_append(_dev_, _method_);                                       \
>> +    } while (0)
> _STA doesn't have any logic here so drop macro and just
> replace its call sites with:

Okay, I just wanted to save a few lines of code. I will drop this macro.

>
> aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf));

_STA is required to be a method with zero arguments, but this statement just
defines an object. Is that okay?

>
>
>> +
>> +#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)                \
>> +    do {                                                                   \
>> +        Aml *ifctx, *uuid;                                                 \
>> +        _method_ = aml_method("_DSM", 4);                                  \
>> +        /* check UUID if it is we expect, return the errorcode if not.*/   \
>> +        uuid = aml_touuid(_uuid_);                                         \
>> +        ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));             \
>> +        aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));     \
>> +        aml_append(method, ifctx);                                         \
>> +        aml_append(method, aml_return(aml_call4("NCAL", aml_int(_handle_), \
>> +                   aml_arg(1), aml_arg(2), aml_arg(3))));                  \
>> +        aml_append(_dev_, _method_);                                       \
>> +    } while (0)
>> +
>> +#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)                     \
>> +    aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
>> +
>> +#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)                 \
>> +    BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
>> +
>> +static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
>> +{
>> +    for (; device_list; device_list = device_list->next) {
>> +        NVDIMMDevice *nvdimm = device_list->data;
>> +        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
>> +                                           NULL);
>> +        uint32_t handle = nvdimm_slot_to_handle(slot);
>> +        Aml *dev, *method;
>> +
>> +        dev = aml_device("NV%02X", slot);
>> +        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
>> +
>> +        BUILD_STA_METHOD(dev, method);
>> +
>> +        /*
>> +         * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
>> +         * in DSM Spec Rev1.
>> +         */
>> +        BUILD_DSM_METHOD(dev, method,
>> +                         handle /* NVDIMM Device Handle */,
>> +                         "4309AC30-0D11-11E4-9191-0800200C9A66"
>> +                         /* UUID for NVDIMM Devices. */);
> this will add N-bytes * #NVDIMMS in worst case.
> Please drop macro and just consolidate this method into _DSM method of parent scope
> and then call it from here like this:
>     Method(_DSM, 4)
>         Return(^_DSM(Arg[0-3]))

The parent _DSM cannot be called directly, because the parent's _DSM requires a
different UUID. The UUID is not saved in dsm memory, so UUID verification has
to be done in AML.

This macro, BUILD_DSM_METHOD(), builds the device's _DSM method and checks
whether the UUID is valid. If not, it returns error code 1 (Not Supported);
otherwise it calls the common method NCAL, which saves the input parameters
into dsm memory and triggers an IO exit. So it seems no bytes are wasted. No?

>
>> +
>> +        aml_append(root_dev, dev);
>> +    }
>> +}
>> +
>> +static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
>> +{
>> +    Aml *dev, *method, *field;
>> +    uint64_t page_size = TARGET_PAGE_SIZE;
>> +
>> +    dev = aml_device("NVDR");
>> +    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
>> +
>> +    /* map DSM memory and IO into ACPI namespace. */
>> +    aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
>> +               NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
>> +    aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
>> +               NVDIMM_ACPI_MEM_BASE, page_size));
>> +
>> +    /*
>> +     * DSM notifier:
>> +     * @NOTI: Reading it notifies QEMU that the _DSM method is being
>> +     *        called and the parameters can be found in NvdimmDsmIn.
>> +     *        The value read from it is the buffer size of DSM output
>> +     *        filled by QEMU.
>> +     */
>> +    field = aml_field("NPIO", AML_DWORD_ACC, AML_PRESERVE);
>> +    BUILD_FIELD_UNIT_SIZE(field, sizeof(uint32_t), "NOTI");
>> +    aml_append(dev, field);
>> +
>> +    /*
>> +     * DSM input:
>> +     * @HDLE: store device's handle, it's zero if the _DSM call happens
>> +     *        on NVDIMM Root Device.
>> +     * @REVS: store the Arg1 of _DSM call.
>> +     * @FUNC: store the Arg2 of _DSM call.
>> +     * @ARG3: store the Arg3 of _DSM call.
>> +     *
>> +     * They are RAM mapping on host so that these accesses never cause
>> +     * VM-EXIT.
>> +     */
>> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, handle, "HDLE");
>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, revision, "REVS");
>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, function, "FUNC");
>> +    BUILD_FIELD_UNIT_SIZE(field, page_size - offsetof(NvdimmDsmIn, arg3),
>> +                          "ARG3");
> These macros don't make code any better and one has to jump to their
> definition every time one sees it to figure out what it's doing.
> Please don't hide code behind macros and just replace them with aml_foo()
> here and at other places in this patch.
>

Okay, will follow your way. :)

>> +    aml_append(dev, field);
>> +
>> +    /*
>> +     * DSM output:
>> +     * @ODAT: the buffer QEMU uses to store the result, the actual size
>> +     *        filled by QEMU is the value read from NOTI.
>> +     *
>> +     * Since the page is reused by both input and output, the input data
>> +     * will be lost after storing new result into @ODAT.
>> +    */
>> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
>> +    BUILD_FIELD_UNIT_SIZE(field, page_size, "ODAT");
>> +    aml_append(dev, field);
>> +
>> +    method = aml_method_serialized("NCAL", 4);
>> +    {
>> +        Aml *buffer_size = aml_local(0);
>> +
>> +        aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
>> +        aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
>> +        aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
>> +
>> +        /*
>> +         * transfer control to QEMU and the buffer size filled by
>> +         * QEMU is returned.
>> +         */
>> +        aml_append(method, aml_store(aml_name("NOTI"), buffer_size));
>> +
>> +        aml_append(method, aml_store(aml_shiftleft(buffer_size,
>> +                                       aml_int(3)), buffer_size));
>> +
>> +        aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
>> +                                            buffer_size , "OBUF"));
>> +        aml_append(method, aml_concatenate(aml_buffer(0, NULL),
>> +                                           aml_name("OBUF"), aml_local(1)));
>> +        aml_append(method, aml_return(aml_local(1)));
>> +    }
>> +    aml_append(dev, method);
>> +
>> +    BUILD_STA_METHOD(dev, method);
>> +
>> +    /*
>> +     * Chapter 3: _DSM Interface for NVDIMM Root Device - Example in DSM
>> +     * Spec Rev1.
>> +     */
>> +    BUILD_DSM_METHOD(dev, method,
>> +                     0 /* 0 is reserved for NVDIMM Root Device*/,
>> +                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
>> +                     /* UUID for NVDIMM Root Devices. */);
>> +
>> +    build_nvdimm_devices(device_list, dev);
>> +
>> +    aml_append(sb_scope, dev);
>> +}
>> +
>> +static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
>> +                              GArray *table_data, GArray *linker)
>> +{
>> +    Aml *ssdt, *sb_scope;
>> +
>> +    acpi_add_table(table_offsets, table_data);
>> +
>> +    ssdt = init_aml_allocator();
>> +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
>> +
>> +    sb_scope = aml_scope("\\_SB");
>> +    nvdimm_build_acpi_devices(device_list, sb_scope);
> is there need for dedicated nvdimm_build_acpi_devices()?
> Is it reused somewhere else?
> If it's not then just inline it here.

Building the NVDIMM devices is complex work, so I designed
nvdimm_build_acpi_devices() to build the NVDIMM root device and then
call build_nvdimm_devices() to build the child devices. As you can
see, nvdimm_build_acpi_devices() is a big function.

The split is only there to keep the code clear. If you really
dislike it, I will drop nvdimm_build_acpi_devices(), no problem. :)

>
>> +
>> +    aml_append(ssdt, sb_scope);
>> +    /* copy AML table into ACPI tables blob and patch header there */
>> +    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
>> +    build_header(linker, table_data,
>> +        (void *)(table_data->data + table_data->len - ssdt->buf->len),
>> +        "SSDT", ssdt->buf->len, 1);
> It's not ok to have several SSDT tables with exact same signature.
> how about extending build_header(..., oem_table_id)?
> You can set it to NULL to get original behavior but provide NVDIMM
> specific id for this table. for example "NVDIMM"
>

Ah, I just noticed that the ACPI spec says:
| each secondary system description table listed in the RSDT/XSDT with a unique OEM Table ID is
| loaded.
You are right.

Okay, I will extend this function. Thanks for your suggestion.
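
A minimal sketch of how the extended build_header() could fill the OEM Table ID
field. Everything here is an assumption for illustration and not the actual
patch: the helper name set_oem_table_id(), the cut-down header struct, and the
"BXPC" + signature default used when NULL is passed.

```c
#include <stdint.h>
#include <string.h>

/* Standard ACPI system description table header layout (36 bytes). */
struct acpi_table_header {
    char     signature[4];
    uint32_t length;
    uint8_t  revision;
    uint8_t  checksum;
    char     oem_id[6];
    char     oem_table_id[8];
    uint32_t oem_revision;
    char     asl_compiler_id[4];
    uint32_t asl_compiler_revision;
};

/*
 * Hypothetical helper: fill the 8-byte OEM Table ID.
 * A NULL oem_table_id keeps a default built from an assumed 4-byte
 * prefix plus the table signature; otherwise the given ID is copied,
 * truncated to 8 bytes and zero padded.
 */
static void set_oem_table_id(struct acpi_table_header *h, const char *sig,
                             const char *oem_table_id)
{
    memset(h->oem_table_id, 0, sizeof(h->oem_table_id));

    if (oem_table_id) {
        size_t n = strlen(oem_table_id);

        if (n > sizeof(h->oem_table_id)) {
            n = sizeof(h->oem_table_id);
        }
        memcpy(h->oem_table_id, oem_table_id, n);
    } else {
        memcpy(h->oem_table_id, "BXPC", 4);   /* assumed default prefix */
        memcpy(h->oem_table_id + 4, sig, 4);  /* sig must be 4 bytes long */
    }
}
```

With this shape, the NVDIMM SSDT would pass "NVDIMM" while existing callers
pass NULL and keep their current ID.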

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 09/35] exec: allow file_ram_alloc to work on file
  2015-11-03 13:55         ` [Qemu-devel] " Paolo Bonzini
@ 2015-11-03 14:26           ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03 14:26 UTC (permalink / raw)
  To: Paolo Bonzini, imammedo
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, dan.j.williams, rth



On 11/03/2015 09:55 PM, Paolo Bonzini wrote:
>
>
> On 03/11/2015 04:56, Xiao Guangrong wrote:
>>
>>
>> On 11/03/2015 05:12 AM, Paolo Bonzini wrote:
>>>
>>>
>>> On 02/11/2015 10:13, Xiao Guangrong wrote:
>>>> Currently, file_ram_alloc() only works on directory - it creates a file
>>>> under @path and do mmap on it
>>>>
>>>> This patch tries to allow it to work on file directly, if @path is a
>>>> directory it works as before, otherwise it treats @path as the target
>>>> file then directly allocate memory from it
>>>>
>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>> ---
>>>>    exec.c | 80
>>>> ++++++++++++++++++++++++++++++++++++++++++------------------------
>>>>    1 file changed, 51 insertions(+), 29 deletions(-)
>>>>
>>>> diff --git a/exec.c b/exec.c
>>>> index 9075f4d..db0fdaf 100644
>>>> --- a/exec.c
>>>> +++ b/exec.c
>>>> @@ -1174,14 +1174,60 @@ void qemu_mutex_unlock_ramlist(void)
>>>>    }
>>>>
>>>>    #ifdef __linux__
>>>> +static bool path_is_dir(const char *path)
>>>> +{
>>>> +    struct stat fs;
>>>> +
>>>> +    return stat(path, &fs) == 0 && S_ISDIR(fs.st_mode);
>>>> +}
>>>> +
>>>> +static int open_ram_file_path(RAMBlock *block, const char *path,
>>>> size_t size)
>>>> +{
>>>> +    char *filename;
>>>> +    char *sanitized_name;
>>>> +    char *c;
>>>> +    int fd;
>>>> +
>>>> +    if (!path_is_dir(path)) {
>>>> +        int flags = (block->flags & RAM_SHARED) ? O_RDWR : O_RDONLY;
>>>> +
>>>> +        flags |= O_EXCL;
>>>> +        return open(path, flags);
>>>> +    }
>>>> +
>>>> +    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>>>> +    sanitized_name = g_strdup(memory_region_name(block->mr));
>>>> +    for (c = sanitized_name; *c != '\0'; c++) {
>>>> +        if (*c == '/') {
>>>> +            *c = '_';
>>>> +        }
>>>> +    }
>>>> +    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>>>> +                               sanitized_name);
>>>> +    g_free(sanitized_name);
>>>> +    fd = mkstemp(filename);
>>>> +    if (fd >= 0) {
>>>> +        unlink(filename);
>>>> +        /*
>>>> +         * ftruncate is not supported by hugetlbfs in older
>>>> +         * hosts, so don't bother bailing out on errors.
>>>> +         * If anything goes wrong with it under other filesystems,
>>>> +         * mmap will fail.
>>>> +         */
>>>> +        if (ftruncate(fd, size)) {
>>>> +            perror("ftruncate");
>>>> +        }
>>>> +    }
>>>> +    g_free(filename);
>>>> +
>>>> +    return fd;
>>>> +}
>>>> +
>>>>    static void *file_ram_alloc(RAMBlock *block,
>>>>                                ram_addr_t memory,
>>>>                                const char *path,
>>>>                                Error **errp)
>>>>    {
>>>> -    char *filename;
>>>> -    char *sanitized_name;
>>>> -    char *c;
>>>>        void *area;
>>>>        int fd;
>>>>        uint64_t pagesize;
>>>> @@ -1212,38 +1258,14 @@ static void *file_ram_alloc(RAMBlock *block,
>>>>            goto error;
>>>>        }
>>>>
>>>> -    /* Make name safe to use with mkstemp by replacing '/' with '_'. */
>>>> -    sanitized_name = g_strdup(memory_region_name(block->mr));
>>>> -    for (c = sanitized_name; *c != '\0'; c++) {
>>>> -        if (*c == '/')
>>>> -            *c = '_';
>>>> -    }
>>>> -
>>>> -    filename = g_strdup_printf("%s/qemu_back_mem.%s.XXXXXX", path,
>>>> -                               sanitized_name);
>>>> -    g_free(sanitized_name);
>>>> +    memory = ROUND_UP(memory, pagesize);
>>>>
>>>> -    fd = mkstemp(filename);
>>>> +    fd = open_ram_file_path(block, path, memory);
>>>>        if (fd < 0) {
>>>>            error_setg_errno(errp, errno,
>>>>                             "unable to create backing store for path
>>>> %s", path);
>>>> -        g_free(filename);
>>>>            goto error;
>>>>        }
>>>> -    unlink(filename);
>>>> -    g_free(filename);
>>>> -
>>>> -    memory = ROUND_UP(memory, pagesize);
>>>> -
>>>> -    /*
>>>> -     * ftruncate is not supported by hugetlbfs in older
>>>> -     * hosts, so don't bother bailing out on errors.
>>>> -     * If anything goes wrong with it under other filesystems,
>>>> -     * mmap will fail.
>>>> -     */
>>>> -    if (ftruncate(fd, memory)) {
>>>> -        perror("ftruncate");
>>>> -    }
>>>>
>>>>        area = qemu_ram_mmap(fd, memory, pagesize, block->flags &
>>>> RAM_SHARED);
>>>>        if (area == MAP_FAILED) {
>>>>
>>>
>>> I was going to send tomorrow a pull request for a similar patch,
>>> "backends/hostmem-file: Allow to specify full pathname for backing file".
>>>
>>> The main difference seems to be your usage of O_EXCL.  Can you explain
>>> why you added it?
>>
>> It's used if we pass a block device as NVDIMM backend memory:
>>   O_EXCL can be used without O_CREAT if pathname refers to a block
>>   device.  If the block device is in use by the system (e.g., mounted),
>>   open() fails with the error EBUSY
>
> That makes sense, but I think it's better to be consistent with the
> handling of block devices.  Block devices do not use O_EXCL when QEMU
> opens them; I guess in principle it would also be possible to share a
> single pmem backend between multiple guests.

Yup. Will make a separate patch to do this. :)
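
For reference, a minimal sketch of the non-directory case with O_EXCL dropped.
open_backing_path() is a hypothetical name, not the function in the patch, and
the RAM_SHARED flag is reduced to a plain bool here; the read-only open for the
non-shared case mirrors what the patch does.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>

static bool path_is_dir(const char *path)
{
    struct stat st;

    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

/*
 * Hypothetical helper: open an existing file or block device as the
 * backing store. Returns a file descriptor, or -1 with errno set.
 * No O_EXCL, so a block device that is already in use can still be
 * opened, which matches how QEMU treats other block devices.
 */
static int open_backing_path(const char *path, bool shared)
{
    int flags = shared ? O_RDWR : O_RDONLY;

    if (path_is_dir(path)) {
        /* Directories take the mkstemp() route, not shown here. */
        errno = EISDIR;
        return -1;
    }
    return open(path, flags);
}
```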

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-02 16:16             ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
@ 2015-11-03 14:47               ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03 14:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake



On 11/03/2015 12:16 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.11.2015 18:06, Xiao Guangrong wrote:
>>
>>
>> On 11/02/2015 10:26 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> On 02.11.2015 16:08, Xiao Guangrong wrote:
>>>>
>>>>
>>>> On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> On 02.11.2015 12:13, Xiao Guangrong wrote:
>>>>>> Currently, the memory region of backed memory is directly mapped to
>>>>>> guest's address space, however, it is not true for nvdimm device
>>>>>>
>>>>>> This patch let dimm device realize this fact and use
>>>>>> DIMMDeviceClass->get_memory_region method to get the mapped memory
>>>>>> region
>>>>>>
>>>>>> Current code did not check the return value of get_memory_region as it
>>>>>> assumed the backend memory of pc-dimm is always properly initialized,
>>>>>> we make get_memory_region internally catch the case if something is
>>>>>> wrong
>>>
>>> but here you call not pc-dimm's get_memory_region, but common ddc->get_memory_region, which may be
>>> nvdimm or possibly other future dimm, so, why not check it here? And then pc_dimm_get_memory_region
>>> may be left untouched (error_abort is ok, because errp is unused).
>>
>> Hmm, because 'here' is not the only place calling ->get_memory_region, this method has
>> multiple callers:
>>
>> $ git grep "\->get_memory_region"
>> hw/i386/pc.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
>> hw/i386/pc.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
>> hw/mem/dimm.c:    mr = ddc->get_memory_region(dimm);
>> hw/mem/nvdimm.c:    ddc->get_memory_region = nvdimm_get_memory_region;
>> hw/mem/pc-dimm.c:    ddc->get_memory_region = pc_dimm_get_memory_region;
>> hw/ppc/spapr.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
>>
>> memory region validation is also done for NVDIMM in nvdimm device.
>>
> Ok, then it should be documented by a comment in dimm.h, where DIMMDeviceClass is defined, that this
> function should not fail
>

Okay, how about this comment:

     /*
      * Get the memory region that will be mapped into the guest's address
      * space. It is called after the dimm device has been realized, so it
      * never fails.
      */
     MemoryRegion *(*get_memory_region)(DIMMDevice *dimm);

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 13/35] hostmem-file: use whole file size if possible
  2015-11-02 17:09     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
@ 2015-11-03 14:51       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03 14:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake



On 11/03/2015 01:09 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 02.11.2015 12:13, Xiao Guangrong wrote:
>> Use the whole file size if @size is not specified which is useful
>> if we want to directly pass a file to guest
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   backends/hostmem-file.c | 22 ++++++++++++++++++----
>>   1 file changed, 18 insertions(+), 4 deletions(-)
>>
>> diff --git a/backends/hostmem-file.c b/backends/hostmem-file.c
>> index 9097a57..ea355c1 100644
>> --- a/backends/hostmem-file.c
>> +++ b/backends/hostmem-file.c
>> @@ -38,15 +38,29 @@ file_backend_memory_alloc(HostMemoryBackend *backend, Error **errp)
>>   {
>>       HostMemoryBackendFile *fb = MEMORY_BACKEND_FILE(backend);
>> -    if (!backend->size) {
>> -        error_setg(errp, "can't create backend with size 0");
>> -        return;
>> -    }
>>       if (!fb->mem_path) {
>>           error_setg(errp, "mem-path property not set");
>>           return;
>>       }
>> +    if (!backend->size) {
>> +        Error *local_err = NULL;
>> +
>> +        /*
>> +         * use the whole file size if @size is not specified.
>> +         */
>> +        backend->size = qemu_file_getlength(fb->mem_path, &local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            return;
>> +        }
>> +    }
>> +
>> +    if (!backend->size) {
>> +        error_setg(errp, "can't create backend on the file whose size is 0");
>> +        return;
>> +    }
>> +
>>       backend->force_prealloc = mem_prealloc;
>>       memory_region_init_ram_from_file(&backend->mr, OBJECT(backend),
>>                                    object_get_canonical_path(OBJECT(backend)),
>
> why not just

It looks like this is a common style used throughout the QEMU code.

>
> +    if (!backend->size) {
> +        /*
> +         * use the whole file size if @size is not specified.
> +         */
> +        backend->size = qemu_file_getlength(fb->mem_path, errp);
> +        if (*errp) {
> +            return;
> +        }
> +    }
>
>

But I think your way is better. :)

> what the purpose of propagating?
>
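
On the propagation question, a small self-contained sketch of the local_err
pattern. The toy_* functions are simplified stand-ins written for this example,
not QEMU's real error API. The point it illustrates, as far as I understand the
convention, is that a caller may pass a NULL errp to ignore errors, in which
case testing *errp directly would dereference NULL, while a local Error pointer
can always be tested.

```c
#include <stdlib.h>
#include <string.h>

typedef struct Error {
    char msg[64];
} Error;

/* Toy stand-in for error_setg(): does nothing if the caller passed NULL. */
static void toy_error_set(Error **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    *errp = calloc(1, sizeof(Error));
    if (!*errp) {
        abort();
    }
    strncpy((*errp)->msg, msg, sizeof((*errp)->msg) - 1);
}

/* Toy stand-in for error_propagate(): hand over or drop the local error. */
static void toy_error_propagate(Error **dst, Error *local)
{
    if (!local) {
        return;
    }
    if (dst && !*dst) {
        *dst = local;
    } else {
        free(local);
    }
}

/* Hypothetical callee: fails on an empty path, else reports a fixed size. */
static long toy_file_length(const char *path, Error **errp)
{
    if (path[0] == '\0') {
        toy_error_set(errp, "empty path");
        return 0;
    }
    return 4096;
}

/* Pattern from the patch: works whether or not errp is NULL. */
static int alloc_with_local_err(const char *path, Error **errp)
{
    Error *local_err = NULL;
    long size = toy_file_length(path, &local_err);

    if (local_err) {
        toy_error_propagate(errp, local_err);
        return -1;
    }
    return size > 0 ? 0 : -1;
}
```

Calling alloc_with_local_err("", NULL) still reports failure through the return
value; the shorter form with "if (*errp)" is only safe when every caller is
known to pass a non-NULL errp.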

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 03/35] acpi: add aml_create_field
  2015-11-03  6:14     ` [Qemu-devel] " Shannon Zhao
  (?)
@ 2015-11-03 14:52     ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-03 14:52 UTC (permalink / raw)
  To: Shannon Zhao, pbonzini, imammedo
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, dan.j.williams, rth



On 11/03/2015 02:14 PM, Shannon Zhao wrote:
>
>
> On 2015/11/2 17:13, Xiao Guangrong wrote:
>> Implement CreateField term which is used by NVDIMM _DSM method in later patch
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   hw/acpi/aml-build.c         | 13 +++++++++++++
>>   include/hw/acpi/aml-build.h |  1 +
>>   2 files changed, 14 insertions(+)
>>
>> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
>> index a72214d..9fe5e7b 100644
>> --- a/hw/acpi/aml-build.c
>> +++ b/hw/acpi/aml-build.c
>> @@ -1151,6 +1151,19 @@ Aml *aml_sizeof(Aml *arg)
>>       return var;
>>   }
>>
>> +/* ACPI 1.0b: 16.2.5.2 Named Objects Encoding: DefCreateField */
>> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name)
>> +{
>> +    Aml *var = aml_alloc();
>> +    build_append_byte(var->buf, 0x5B); /* ExtOpPrefix */
>> +    build_append_byte(var->buf, 0x13); /* CreateFieldOp */
>> +    aml_append(var, srcbuf);
>> +    aml_append(var, index);
>> +    aml_append(var, len);
>> +    build_append_namestring(var->buf, "%s", name);
>> +    return var;
>> +}
>> +
>>   void
>>   build_header(GArray *linker, GArray *table_data,
>>                AcpiTableHeader *h, const char *sig, int len, uint8_t rev)
>> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
>> index 7296efb..7e1c43b 100644
>> --- a/include/hw/acpi/aml-build.h
>> +++ b/include/hw/acpi/aml-build.h
>> @@ -276,6 +276,7 @@ Aml *aml_touuid(const char *uuid);
>>   Aml *aml_unicode(const char *str);
>>   Aml *aml_derefof(Aml *arg);
>>   Aml *aml_sizeof(Aml *arg);
>> +Aml *aml_create_field(Aml *srcbuf, Aml *index, Aml *len, const char *name);
>>
> Maybe this could be moved together with existing aml_create_dword_field.

Not bad, will do. :)

>
>>   void
>>   build_header(GArray *linker, GArray *table_data,
>>
>

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-03 23:00     ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-03 23:00 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Mon, Nov 02, 2015 at 05:13:10PM +0800, Xiao Guangrong wrote:
> Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
> of nvdimm can come from either a raw pmem device, e.g. /dev/pmem, or a file
> located on a DAX-enabled filesystem
> 
> So this patch lets it work on any kind of path
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  exec.c | 24 ++++++++++++------------
>  1 file changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/exec.c b/exec.c
> index 9de38be..9075f4d 100644
> --- a/exec.c
> +++ b/exec.c
> @@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
>      char *c;
>      void *area;
>      int fd;
> -    uint64_t hpagesize;
> +    uint64_t pagesize;
>      Error *local_err = NULL;
>  
> -    hpagesize = qemu_file_get_page_size(path, &local_err);
> +    pagesize = qemu_file_get_page_size(path, &local_err);
>      if (local_err) {
>          error_propagate(errp, local_err);
>          goto error;
>      }
>  
> -    if (hpagesize == getpagesize()) {
> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> +    if (pagesize == getpagesize()) {
> +        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");

If the point of this patch is to allow file_ram_alloc() to not be
specific to hugetlbfs anymore, this warning can simply go away.

(And in case you really want to keep the warning, I don't see the
point of the changes you made to it.)

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-03 23:21     ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-03 23:21 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
[...]
> +size_t qemu_file_getlength(const char *file, Error **errp)
> +{
> +    int64_t size;
[...]
> +    return size;

Can you guarantee that SIZE_MAX >= INT64_MAX on all platforms supported
by QEMU?
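
To make the concern concrete, here is a minimal sketch (not code from the
patch) of a range-checked conversion. It assumes the length is computed as
int64_t, as in the quoted hunk; the helper name length_to_size() is
hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): convert an int64_t length to
 * size_t only when the value is representable. On hosts where size_t is
 * 32 bits wide, SIZE_MAX < INT64_MAX and a plain "return size;" from a
 * function returning size_t would silently truncate.
 */
static bool length_to_size(int64_t len, size_t *out)
{
    if (len < 0) {
        return false;               /* negative values are error codes */
    }
    if ((uint64_t)len > SIZE_MAX) {
        return false;               /* does not fit in size_t */
    }
    *out = (size_t)len;
    return true;
}
```

On a 64-bit host the second check can never fire, which is why the problem
only shows up on 32-bit builds.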

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
  2015-11-03 23:00     ` [Qemu-devel] " Eduardo Habkost
@ 2015-11-04  3:12       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-04  3:12 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/04/2015 07:00 AM, Eduardo Habkost wrote:
> On Mon, Nov 02, 2015 at 05:13:10PM +0800, Xiao Guangrong wrote:
>> Currently file_ram_alloc() is designed for hugetlbfs, however, the memory
>> of nvdimm can come from either raw pmem device eg, /dev/pmem, or the file
>> locates at DAX enabled filesystem
>>
>> So this patch let it work on any kind of path
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   exec.c | 24 ++++++++++++------------
>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/exec.c b/exec.c
>> index 9de38be..9075f4d 100644
>> --- a/exec.c
>> +++ b/exec.c
>> @@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
>>       char *c;
>>       void *area;
>>       int fd;
>> -    uint64_t hpagesize;
>> +    uint64_t pagesize;
>>       Error *local_err = NULL;
>>
>> -    hpagesize = qemu_file_get_page_size(path, &local_err);
>> +    pagesize = qemu_file_get_page_size(path, &local_err);
>>       if (local_err) {
>>           error_propagate(errp, local_err);
>>           goto error;
>>       }
>>
>> -    if (hpagesize == getpagesize()) {
>> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>> +    if (pagesize == getpagesize()) {
>> +        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
>
> If the point of this patch is to allow file_ram_alloc() to not be
> specific to hugetlbfs anymore, this warning can simply go away.
>
> (And in case if you really want to keep the warning, I don't see the
> point of the changes you made to it.)
>

Here is the history of why we did it this way:
https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02862.html

Q:
| What this *actually* is trying to warn against is that
| mapping a regular file (as opposed to hugetlbfs)
| means transparent huge pages don't work.

| So I don't think we should drop this warning completely.
| Either let's add the nvdimm magic, or simply check the
| page size.

A:
| Check the page size sounds good, will check:
| if (pagesize != getpagesize()) {
|        ...print something...
|}

| I agree with you that showing the info is needed, however,
| 'Warning' might scare some users, how about drop this word or
| just show “Memory is not allocated from HugeTlbfs”?
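
For reference, the page-size check discussed above can be sketched as
follows. This is a Linux-only illustration, not the code in the series:
file_page_size() is a hypothetical stand-in for qemu_file_get_page_size(),
the hugetlbfs magic value is defined locally, and sysconf(_SC_PAGESIZE) is
used in place of getpagesize().

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/vfs.h>
#include <unistd.h>

#define HUGETLBFS_MAGIC_SKETCH 0x958458f6U   /* hugetlbfs f_type on Linux */

/*
 * Hypothetical stand-in for qemu_file_get_page_size(): return the huge
 * page size if @path lives on hugetlbfs (statfs() reports it in f_bsize),
 * the normal page size for any other path, and 0 on error.
 */
static uint64_t file_page_size(const char *path)
{
    struct statfs fs;

    if (statfs(path, &fs) != 0) {
        return 0;
    }
    if ((uint32_t)fs.f_type == HUGETLBFS_MAGIC_SKETCH) {
        return (uint64_t)fs.f_bsize;
    }
    return (uint64_t)sysconf(_SC_PAGESIZE);
}

/* Print the informational message; returns 1 if it was printed. */
static int note_if_not_huge(const char *path)
{
    uint64_t pagesize = file_page_size(path);

    if (pagesize != 0 && pagesize == (uint64_t)sysconf(_SC_PAGESIZE)) {
        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
        return 1;
    }
    return 0;
}
```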

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-03 23:21     ` [Qemu-devel] " Eduardo Habkost
@ 2015-11-04  3:17       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-04  3:17 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: vsementsov, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
	imammedo, pbonzini, dan.j.williams, rth



On 11/04/2015 07:21 AM, Eduardo Habkost wrote:
> On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
> [...]
>> +size_t qemu_file_getlength(const char *file, Error **errp)
>> +{
>> +    int64_t size;
> [...]
>> +    return size;
>
> Can you guarantee that SIZE_MAX >= INT64_MAX on all platforms supported
> by QEMU?
>

Actually, this function is factored out of the common function raw_getlength()
in raw-posix.c, whose return value is int64_t.

And I think int64_t is large enough for block devices.
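
One way to read this is that the helper could keep int64_t as its return
type, like raw_getlength(). A minimal sketch under that assumption follows;
file_getlength() is a hypothetical name, the negative-errno convention is an
assumption, and the real helper in the series takes an Error ** argument
instead.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical variant of the helper: report the length as int64_t and
 * return a negative errno value on failure, so no narrowing to size_t
 * happens inside the function.
 */
static int64_t file_getlength(const char *file)
{
    int fd = open(file, O_RDONLY);
    off_t end;
    int64_t ret;

    if (fd < 0) {
        return -errno;
    }
    end = lseek(fd, 0, SEEK_END);
    /* Capture errno before close() can overwrite it. */
    ret = end < 0 ? -errno : (int64_t)end;
    close(fd);
    return ret;
}
```

Callers that need a size_t would then do the range check themselves.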

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices
  2015-11-03 14:22       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-04  8:56         ` Igor Mammedov
  -1 siblings, 0 replies; 200+ messages in thread
From: Igor Mammedov @ 2015-11-04  8:56 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Tue, 3 Nov 2015 22:22:40 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> 
> 
> On 11/03/2015 09:13 PM, Igor Mammedov wrote:
> > On Mon,  2 Nov 2015 17:13:29 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >> NVDIMM devices is defined in ACPI 6.0 9.20 NVDIMM Devices
> >>
> >> There is a root device under \_SB and specified NVDIMM devices are under the
> >> root device. Each NVDIMM device has _ADR which returns its handle used to
> >> associate MEMDEV structure in NFIT
> >>
> >> We reserve handle 0 for root device. In this patch, we save handle, handle,
> >> arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch
> >>
> >> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >> ---
> >>   hw/acpi/nvdimm.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 184 insertions(+)
> >>
> >> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> >> index dd84e5f..53ed675 100644
> >> --- a/hw/acpi/nvdimm.c
> >> +++ b/hw/acpi/nvdimm.c
> >> @@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
> >>       g_array_free(structures, true);
> >>   }
> >>
> >> +struct NvdimmDsmIn {
> >> +    uint32_t handle;
> >> +    uint32_t revision;
> >> +    uint32_t function;
> >> +   /* the remaining size in the page is used by arg3. */
> >> +    uint8_t arg3[0];
> >> +} QEMU_PACKED;
> >> +typedef struct NvdimmDsmIn NvdimmDsmIn;
> >> +
> >>   static uint64_t
> >>   nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
> >>   {
> >> @@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
> >>   static void
> >>   nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
> >>   {
> >> +    fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
> > it doesn't seem like this hunk belongs here
> 
> Er, we have changed the logic:
> - others:
>    1) the buffer length is directly got from IO read rather than got
>       from dsm memory
> [ This has documented in v5's changelog. ]
> 
> So, the IO write is replaced by IO read, nvdimm_dsm_write() should not be
> triggered.
> 
> >
> >>   }
> >>
> >>   static const MemoryRegionOps nvdimm_dsm_ops = {
> >> @@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
> >>       memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
> >>   }
> >>
> >> +#define BUILD_STA_METHOD(_dev_, _method_)                                  \
> >> +    do {                                                                   \
> >> +        _method_ = aml_method("_STA", 0);                                  \
> >> +        aml_append(_method_, aml_return(aml_int(0x0f)));                   \
> >> +        aml_append(_dev_, _method_);                                       \
> >> +    } while (0)
> > _STA doesn't have any logic here so drop macro and just
> > replace its call sites with:
> 
> Okay, I was just wanting to save some code lines. I will drop this macro.
> 
> >
> > aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf));
> 
> _STA is required as a method with zero argument but this statement just
> define a object. It is okay?
The spec doesn't say that it must be a method; it says that OSPM will evaluate
the _STA object and that the result must be a combination of the defined flags.
AML-wise, calling a method with 0 arguments and referencing a named variable
are the same thing: both end up being just a namestring.

Also note that _STA here returns 0xF, and the spec says that if _STA is missing
OSPM shall assume an implicit value of 0xF, so you can just drop the _STA
object here altogether.
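
As an illustration of the point about namestrings, here are the two
definitions written out as raw AML bytes. This is a hand-encoded sketch based
on the ACPI AML opcode table, not output from QEMU's aml_*() helpers, and the
array names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Name(_STA, 0x0F): NameOp, NameSeg "_STA", BytePrefix, value. */
static const uint8_t sta_name[] = {
    0x08, '_', 'S', 'T', 'A', 0x0A, 0x0F,
};

/*
 * Method(_STA, 0) { Return(0x0F) }: MethodOp, PkgLength (which counts
 * itself and everything after it), NameSeg, MethodFlags, ReturnOp,
 * BytePrefix, value.
 */
static const uint8_t sta_method[] = {
    0x14, 0x09, '_', 'S', 'T', 'A', 0x00, 0xA4, 0x0A, 0x0F,
};

/* What the evaluating side emits in either case: only the NameSeg. */
static const uint8_t sta_reference[] = { '_', 'S', 'T', 'A' };
```

Whichever form defines the object, the reference to it is the same four
bytes.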

> 
> >
> >
> >> +
> >> +#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)                \
> >> +    do {                                                                   \
> >> +        Aml *ifctx, *uuid;                                                 \
> >> +        _method_ = aml_method("_DSM", 4);                                  \
> >> +        /* check UUID if it is we expect, return the errorcode if not.*/   \
> >> +        uuid = aml_touuid(_uuid_);                                         \
> >> +        ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));             \
> >> +        aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));     \
> >> +        aml_append(method, ifctx);                                         \
> >> +        aml_append(method, aml_return(aml_call4("NCAL", aml_int(_handle_), \
> >> +                   aml_arg(1), aml_arg(2), aml_arg(3))));                  \
> >> +        aml_append(_dev_, _method_);                                       \
> >> +    } while (0)
> >> +
> >> +#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)                     \
> >> +    aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
> >> +
> >> +#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)                 \
> >> +    BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
> >> +
> >> +static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
> >> +{
> >> +    for (; device_list; device_list = device_list->next) {
> >> +        NVDIMMDevice *nvdimm = device_list->data;
> >> +        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
> >> +                                           NULL);
> >> +        uint32_t handle = nvdimm_slot_to_handle(slot);
> >> +        Aml *dev, *method;
> >> +
> >> +        dev = aml_device("NV%02X", slot);
> >> +        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
> >> +
> >> +        BUILD_STA_METHOD(dev, method);
> >> +
> >> +        /*
> >> +         * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
> >> +         * in DSM Spec Rev1.
> >> +         */
> >> +        BUILD_DSM_METHOD(dev, method,
> >> +                         handle /* NVDIMM Device Handle */,
> >> +                         "4309AC30-0D11-11E4-9191-0800200C9A66"
> >> +                         /* UUID for NVDIMM Devices. */);
> > this will add N-bytes * #NVDIMMS in worst case.
> > Please drop macro and just consolidate this method into _DSM method of parent scope
> > and then call it from here like this:
> >     Method(_DSM, 4)
> >         Return(^_DSM(Arg[0-3]))
> 
> Parent _DSM can not be directly called as _DSM in parent requires different UUID.
> UUID is not saved in dsm memory so that UUID verification should be done in AML.
> 
> This macro, BUILD_DSM_METHOD(), build its _DSM call and check if UUID is valid, if
> not, it returns error code 1 (Not Supoorted), otherwise it call the common method
> NCAL which saves input parameters into dsm memory and trigger IO exit. It seems no
> byte is wasted. No?
Add an extra arg to NCAL, let's say IS_ROOT.
NCAL will check for the root UUID if IS_ROOT is true and
check for the nvdimm device UUID if it's false.
That will save roughly 150 bytes per nvdimm, or more than 2Kb in the case of 256 NVDIMMs.
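
The suggested control flow can be modelled in plain C as below. This is only
a sketch of the logic, not AML and not QEMU code: ncal_check_uuid() and the
macro names are hypothetical, while the two UUID strings and the "Not
Supported" value of 1 are taken from the quoted patch.

```c
#include <stdbool.h>
#include <string.h>

#define NVDIMM_ROOT_UUID   "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
#define NVDIMM_DEVICE_UUID "4309AC30-0D11-11E4-9191-0800200C9A66"
#define DSM_NOT_SUPPORTED  1

/*
 * C model of the shared method: a single UUID check selected by is_root,
 * instead of one check emitted into every NVDIMM device's _DSM.
 * Returns 0 when the UUID matches; the real method would then store the
 * arguments into DSM memory and read NOTI.
 */
static int ncal_check_uuid(bool is_root, const char *uuid)
{
    const char *expected = is_root ? NVDIMM_ROOT_UUID : NVDIMM_DEVICE_UUID;

    return strcmp(uuid, expected) == 0 ? 0 : DSM_NOT_SUPPORTED;
}
```

Each per-device _DSM then shrinks to a single call that passes its handle and
IS_ROOT = false. Taking the estimate of about 150 bytes per device at face
value, 256 devices would account for roughly 38 KB of AML.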

> 
> >
> >> +
> >> +        aml_append(root_dev, dev);
> >> +    }
> >> +}
> >> +
> >> +static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
> >> +{
> >> +    Aml *dev, *method, *field;
> >> +    uint64_t page_size = TARGET_PAGE_SIZE;
> >> +
> >> +    dev = aml_device("NVDR");
> >> +    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
> >> +
> >> +    /* map DSM memory and IO into ACPI namespace. */
> >> +    aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
> >> +               NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
> >> +    aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
> >> +               NVDIMM_ACPI_MEM_BASE, page_size));
> >> +
> >> +    /*
> >> +     * DSM notifier:
> >> +     * @NOTI: Read it will notify QEMU that _DSM method is being
> >> +     *        called and the parameters can be found in NvdimmDsmIn.
> >> +     *        The value read from it is the buffer size of DSM output
> >> +     *        filled by QEMU.
> >> +     */
> >> +    field = aml_field("NPIO", AML_DWORD_ACC, AML_PRESERVE);
> >> +    BUILD_FIELD_UNIT_SIZE(field, sizeof(uint32_t), "NOTI");
> >> +    aml_append(dev, field);
> >> +
> >> +    /*
> >> +     * DSM input:
> >> +     * @HDLE: store device's handle, it's zero if the _DSM call happens
> >> +     *        on NVDIMM Root Device.
> >> +     * @REVS: store the Arg1 of _DSM call.
> >> +     * @FUNC: store the Arg2 of _DSM call.
> >> +     * @ARG3: store the Arg3 of _DSM call.
> >> +     *
> >> +     * They are RAM mapping on host so that these accesses never cause
> >> +     * VM-EXIT.
> >> +     */
> >> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
> >> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, handle, "HDLE");
> >> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, revision, "REVS");
> >> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, function, "FUNC");
> >> +    BUILD_FIELD_UNIT_SIZE(field, page_size - offsetof(NvdimmDsmIn, arg3),
> >> +                          "ARG3");
> > These macros don't make code any better and one has to jump to their
> > definition every time one sees it to figure out what it's doing.
> > Please don't hide code behind macros and just replace them with aml_foo()
> > here and at other places in this patch.
> >
> 
> Okay, will follow your way. :)
> 
> >> +    aml_append(dev, field);
> >> +
> >> +    /*
> >> +     * DSM output:
> >> +     * @ODAT: the buffer QEMU uses to store the result, the actual size
> >> +     *        filled by QEMU is the value read from NOT1.
> >> +     *
> >> +     * Since the page is reused by both input and out, the input data
> >> +     * will be lost after storing new result into @ODAT.
> >> +    */
> >> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
> >> +    BUILD_FIELD_UNIT_SIZE(field, page_size, "ODAT");
> >> +    aml_append(dev, field);
> >> +
> >> +    method = aml_method_serialized("NCAL", 4);
Why is the method declared with 4 arguments when only arg[0-2] are used?

> >> +    {
> >> +        Aml *buffer_size = aml_local(0);
> >> +
> >> +        aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
> >> +        aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
> >> +        aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
> >> +
> >> +        /*
> >> +         * transfer control to QEMU and the buffer size filled by
> >> +         * QEMU is returned.
> >> +         */
> >> +        aml_append(method, aml_store(aml_name("NOTI"), buffer_size));
> >> +
> >> +        aml_append(method, aml_store(aml_shiftleft(buffer_size,
> >> +                                       aml_int(3)), buffer_size));
> >> +
> >> +        aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
> >> +                                            buffer_size , "OBUF"));
> >> +        aml_append(method, aml_concatenate(aml_buffer(0, NULL),
> >> +                                           aml_name("OBUF"), aml_local(1)));
> >> +        aml_append(method, aml_return(aml_local(1)));
> >> +    }
> >> +    aml_append(dev, method);
> >> +
> >> +    BUILD_STA_METHOD(dev, method);
> >> +
> >> +    /*
> >> +     * Chapter 3: _DSM Interface for NVDIMM Root Device - Example in DSM
> >> +     * Spec Rev1.
> >> +     */
> >> +    BUILD_DSM_METHOD(dev, method,
> >> +                     0 /* 0 is reserved for NVDIMM Root Device*/,
Is 'handle' equal to the slot number?


> >> +                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
> >> +                     /* UUID for NVDIMM Root Devices. */);
> >> +
> >> +    build_nvdimm_devices(device_list, dev);
> >> +
> >> +    aml_append(sb_scope, dev);
> >> +}
> >> +
> >> +static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
> >> +                              GArray *table_data, GArray *linker)
> >> +{
> >> +    Aml *ssdt, *sb_scope;
> >> +
> >> +    acpi_add_table(table_offsets, table_data);
> >> +
> >> +    ssdt = init_aml_allocator();
> >> +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
> >> +
> >> +    sb_scope = aml_scope("\\_SB");
> >> +    nvdimm_build_acpi_devices(device_list, sb_scope);
> > is there need for dedicated nvdimm_build_acpi_devices()?
> > Is it reused somewhere else?
> > If it's not then just inline it here.
> 
> Since building NVDIMM devices is a complex work so i designed to
> let nvdimm_build_acpi_devices() build NVDIMM root device then
> it calls build_nvdimm_devices() to build children devices. You
> can see nvdimm_build_acpi_devices is a big function.
> 
> That proposal just wants to make the code clear. If you really
> hate this, i will drop nvdimm_build_acpi_devices, no problem. :)
Seeing how much this function will add to nvdimm_build_acpi_devices,
I don't see the point of having a separate nvdimm_build_acpi_devices().


> 
> >
> >> +
> >> +    aml_append(ssdt, sb_scope);
> >> +    /* copy AML table into ACPI tables blob and patch header there */
> >> +    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> >> +    build_header(linker, table_data,
> >> +        (void *)(table_data->data + table_data->len - ssdt->buf->len),
> >> +        "SSDT", ssdt->buf->len, 1);
> > It's not ok to have several SSDT tables with exact same signature.
> > how about extending build_header(..., oem_table_id)?
> > You can set it to NULL to get original behavior but provide NVDIMM
> > specific id for this table. for example "NVDIMM"
> >
> 
> Ah, i just noticed the ACPI spec says:
> | each secondary system description table listed in the RSDT/XSDT with a unique OEM Table ID is
> | loaded.
> You are right.
> 
> Okay, i will extend this function, thanks for your suggestion.
> 


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices
@ 2015-11-04  8:56         ` Igor Mammedov
  0 siblings, 0 replies; 200+ messages in thread
From: Igor Mammedov @ 2015-11-04  8:56 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, pbonzini, dan.j.williams, rth

On Tue, 3 Nov 2015 22:22:40 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> 
> 
> On 11/03/2015 09:13 PM, Igor Mammedov wrote:
> > On Mon,  2 Nov 2015 17:13:29 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >> NVDIMM devices is defined in ACPI 6.0 9.20 NVDIMM Devices
> >>
> >> There is a root device under \_SB and specified NVDIMM devices are under the
> >> root device. Each NVDIMM device has _ADR which returns its handle used to
> >> associate MEMDEV structure in NFIT
> >>
> >> We reserve handle 0 for root device. In this patch, we save handle, handle,
> >> arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch
> >>
> >> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >> ---
> >>   hw/acpi/nvdimm.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>   1 file changed, 184 insertions(+)
> >>
> >> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> >> index dd84e5f..53ed675 100644
> >> --- a/hw/acpi/nvdimm.c
> >> +++ b/hw/acpi/nvdimm.c
> >> @@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
> >>       g_array_free(structures, true);
> >>   }
> >>
> >> +struct NvdimmDsmIn {
> >> +    uint32_t handle;
> >> +    uint32_t revision;
> >> +    uint32_t function;
> >> +   /* the remaining size in the page is used by arg3. */
> >> +    uint8_t arg3[0];
> >> +} QEMU_PACKED;
> >> +typedef struct NvdimmDsmIn NvdimmDsmIn;
> >> +
> >>   static uint64_t
> >>   nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
> >>   {
> >> @@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
> >>   static void
> >>   nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
> >>   {
> >> +    fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
> > it doesn't seem like this hunk belongs here
> 
> Er, we have changed the logic:
> - others:
>    1) the buffer length is directly got from IO read rather than got
>       from dsm memory
> [ This has documented in v5's changelog. ]
> 
> So, the IO write is replaced by IO read, nvdimm_dsm_write() should not be
> triggered.
> 
> >
> >>   }
> >>
> >>   static const MemoryRegionOps nvdimm_dsm_ops = {
> >> @@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
> >>       memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
> >>   }
> >>
> >> +#define BUILD_STA_METHOD(_dev_, _method_)                                  \
> >> +    do {                                                                   \
> >> +        _method_ = aml_method("_STA", 0);                                  \
> >> +        aml_append(_method_, aml_return(aml_int(0x0f)));                   \
> >> +        aml_append(_dev_, _method_);                                       \
> >> +    } while (0)
> > _STA doesn't have any logic here so drop macro and just
> > replace its call sites with:
> 
> Okay, I was just wanting to save some code lines. I will drop this macro.
> 
> >
> > aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf));
> 
> _STA is required as a method with zero argument but this statement just
> define a object. It is okay?
The spec doesn't say that it must be a method; it says that OSPM will evaluate
the _STA object and that the result must be a combination of the defined flags.
AML-wise, calling a method with 0 arguments and referencing a named variable
are the same thing: both end up being just a namestring.

Also note that _STA here returns 0xF, and the spec says that if _STA is missing
OSPM shall assume an implicit value of 0xF, so you can just drop the _STA
object here altogether.

> 
> >
> >
> >> +
> >> +#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)                \
> >> +    do {                                                                   \
> >> +        Aml *ifctx, *uuid;                                                 \
> >> +        _method_ = aml_method("_DSM", 4);                                  \
> >> +        /* check UUID if it is we expect, return the errorcode if not.*/   \
> >> +        uuid = aml_touuid(_uuid_);                                         \
> >> +        ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));             \
> >> +        aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));     \
> >> +        aml_append(method, ifctx);                                         \
> >> +        aml_append(method, aml_return(aml_call4("NCAL", aml_int(_handle_), \
> >> +                   aml_arg(1), aml_arg(2), aml_arg(3))));                  \
> >> +        aml_append(_dev_, _method_);                                       \
> >> +    } while (0)
> >> +
> >> +#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)                     \
> >> +    aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
> >> +
> >> +#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)                 \
> >> +    BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
> >> +
> >> +static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
> >> +{
> >> +    for (; device_list; device_list = device_list->next) {
> >> +        NVDIMMDevice *nvdimm = device_list->data;
> >> +        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
> >> +                                           NULL);
> >> +        uint32_t handle = nvdimm_slot_to_handle(slot);
> >> +        Aml *dev, *method;
> >> +
> >> +        dev = aml_device("NV%02X", slot);
> >> +        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
> >> +
> >> +        BUILD_STA_METHOD(dev, method);
> >> +
> >> +        /*
> >> +         * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
> >> +         * in DSM Spec Rev1.
> >> +         */
> >> +        BUILD_DSM_METHOD(dev, method,
> >> +                         handle /* NVDIMM Device Handle */,
> >> +                         "4309AC30-0D11-11E4-9191-0800200C9A66"
> >> +                         /* UUID for NVDIMM Devices. */);
> > this will add N-bytes * #NVDIMMS in worst case.
> > Please drop macro and just consolidate this method into _DSM method of parent scope
> > and then call it from here like this:
> >     Method(_DSM, 4)
> >         Return(^_DSM(Arg[0-3]))
> 
> Parent _DSM can not be directly called as _DSM in parent requires different UUID.
> UUID is not saved in dsm memory so that UUID verification should be done in AML.
> 
> This macro, BUILD_DSM_METHOD(), builds the _DSM method and checks whether the
> UUID is valid. If not, it returns error code 1 (Not Supported); otherwise it
> calls the common method NCAL, which saves the input parameters into dsm memory
> and triggers an IO exit. It seems no byte is wasted. No?
Add an extra argument to NCAL, say IS_ROOT.
NCAL would check for the root UUID if IS_ROOT is true and
for the NVDIMM device UUID if it is false.
That would save roughly 150 bytes per NVDIMM, or more than 2 KB in the case of 256 NVDIMMs.
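
A rough standalone C model of that idea, for illustration only. It is not
AML and not QEMU code; the function name ncal_check_uuid() and the status
enum are made up here, and only the two UUID strings and the error code 1
come from the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* UUIDs as quoted in the patch under review. */
#define NVDIMM_ROOT_UUID   "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
#define NVDIMM_DEVICE_UUID "4309AC30-0D11-11E4-9191-0800200C9A66"

/* Hypothetical status names; 1 is "Not Supported" in the patch. */
enum { DSM_STATUS_OK = 0, DSM_STATUS_NOT_SUPPORTED = 1 };

/*
 * Model of a shared NCAL-style entry point: pick the expected UUID from
 * is_root, then compare once. Each per-device _DSM would only forward
 * its arguments with is_root set to false.
 */
static int ncal_check_uuid(const char *uuid, bool is_root)
{
    const char *expected = is_root ? NVDIMM_ROOT_UUID : NVDIMM_DEVICE_UUID;

    assert(uuid != NULL);
    return strcmp(uuid, expected) == 0 ? DSM_STATUS_OK
                                       : DSM_STATUS_NOT_SUPPORTED;
}
```

With the comparison living in one place, the per-device methods shrink to a
single forwarding call, which is where the per-NVDIMM saving would come from.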

> 
> >
> >> +
> >> +        aml_append(root_dev, dev);
> >> +    }
> >> +}
> >> +
> >> +static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
> >> +{
> >> +    Aml *dev, *method, *field;
> >> +    uint64_t page_size = TARGET_PAGE_SIZE;
> >> +
> >> +    dev = aml_device("NVDR");
> >> +    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
> >> +
> >> +    /* map DSM memory and IO into ACPI namespace. */
> >> +    aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
> >> +               NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
> >> +    aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
> >> +               NVDIMM_ACPI_MEM_BASE, page_size));
> >> +
> >> +    /*
> >> +     * DSM notifier:
> >> +     * @NOTI: Read it will notify QEMU that _DSM method is being
> >> +     *        called and the parameters can be found in NvdimmDsmIn.
> >> +     *        The value read from it is the buffer size of DSM output
> >> +     *        filled by QEMU.
> >> +     */
> >> +    field = aml_field("NPIO", AML_DWORD_ACC, AML_PRESERVE);
> >> +    BUILD_FIELD_UNIT_SIZE(field, sizeof(uint32_t), "NOTI");
> >> +    aml_append(dev, field);
> >> +
> >> +    /*
> >> +     * DSM input:
> >> +     * @HDLE: store device's handle, it's zero if the _DSM call happens
> >> +     *        on NVDIMM Root Device.
> >> +     * @REVS: store the Arg1 of _DSM call.
> >> +     * @FUNC: store the Arg2 of _DSM call.
> >> +     * @ARG3: store the Arg3 of _DSM call.
> >> +     *
> >> +     * They are RAM mapping on host so that these accesses never cause
> >> +     * VM-EXIT.
> >> +     */
> >> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
> >> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, handle, "HDLE");
> >> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, revision, "REVS");
> >> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, function, "FUNC");
> >> +    BUILD_FIELD_UNIT_SIZE(field, page_size - offsetof(NvdimmDsmIn, arg3),
> >> +                          "ARG3");
> > These macros don't make code any better and one has to jump to their
> > definition every time one sees it to figure out what it's doing.
> > Please don't hide code behind macros and just replace them with aml_foo()
> > here and at other places in this patch.
> >
> 
> Okay, will follow your way. :)
> 
> >> +    aml_append(dev, field);
> >> +
> >> +    /*
> >> +     * DSM output:
> >> +     * @ODAT: the buffer QEMU uses to store the result, the actual size
> >> +     *        filled by QEMU is the value read from NOTI.
> >> +     *
> >> +     * Since the page is reused by both input and output, the input data
> >> +     * will be lost after storing new result into @ODAT.
> >> +    */
> >> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
> >> +    BUILD_FIELD_UNIT_SIZE(field, page_size, "ODAT");
> >> +    aml_append(dev, field);
> >> +
> >> +    method = aml_method_serialized("NCAL", 4);
Why is the method declared with 4 arguments when only Arg[0-2] are used?

> >> +    {
> >> +        Aml *buffer_size = aml_local(0);
> >> +
> >> +        aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
> >> +        aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
> >> +        aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
> >> +
> >> +        /*
> >> +         * transfer control to QEMU and the buffer size filled by
> >> +         * QEMU is returned.
> >> +         */
> >> +        aml_append(method, aml_store(aml_name("NOTI"), buffer_size));
> >> +
> >> +        aml_append(method, aml_store(aml_shiftleft(buffer_size,
> >> +                                       aml_int(3)), buffer_size));
> >> +
> >> +        aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
> >> +                                            buffer_size , "OBUF"));
> >> +        aml_append(method, aml_concatenate(aml_buffer(0, NULL),
> >> +                                           aml_name("OBUF"), aml_local(1)));
> >> +        aml_append(method, aml_return(aml_local(1)));
> >> +    }
> >> +    aml_append(dev, method);
> >> +
> >> +    BUILD_STA_METHOD(dev, method);
> >> +
> >> +    /*
> >> +     * Chapter 3: _DSM Interface for NVDIMM Root Device - Example in DSM
> >> +     * Spec Rev1.
> >> +     */
> >> +    BUILD_DSM_METHOD(dev, method,
> >> +                     0 /* 0 is reserved for NVDIMM Root Device*/,
Is 'handle' equal to the slot number?


> >> +                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
> >> +                     /* UUID for NVDIMM Root Devices. */);
> >> +
> >> +    build_nvdimm_devices(device_list, dev);
> >> +
> >> +    aml_append(sb_scope, dev);
> >> +}
> >> +
> >> +static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
> >> +                              GArray *table_data, GArray *linker)
> >> +{
> >> +    Aml *ssdt, *sb_scope;
> >> +
> >> +    acpi_add_table(table_offsets, table_data);
> >> +
> >> +    ssdt = init_aml_allocator();
> >> +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
> >> +
> >> +    sb_scope = aml_scope("\\_SB");
> >> +    nvdimm_build_acpi_devices(device_list, sb_scope);
> > is there need for dedicated nvdimm_build_acpi_devices()?
> > Is it reused somewhere else?
> > If it's not then just inline it here.
> 
> Since building NVDIMM devices is complex work, I designed
> nvdimm_build_acpi_devices() to build the NVDIMM root device and then
> call build_nvdimm_devices() to build the child devices. You
> can see nvdimm_build_acpi_devices is a big function.
> 
> That proposal just aims to make the code clear. If you really
> hate this, I will drop nvdimm_build_acpi_devices, no problem. :)
Seeing how much this function will add to nvdimm_build_acpi_devices,
I don't see the point in having a separate nvdimm_build_acpi_devices().


> 
> >
> >> +
> >> +    aml_append(ssdt, sb_scope);
> >> +    /* copy AML table into ACPI tables blob and patch header there */
> >> +    g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> >> +    build_header(linker, table_data,
> >> +        (void *)(table_data->data + table_data->len - ssdt->buf->len),
> >> +        "SSDT", ssdt->buf->len, 1);
> > It's not ok to have several SSDT tables with exact same signature.
> > how about extending build_header(..., oem_table_id)?
> > You can set it to NULL to get original behavior but provide NVDIMM
> > specific id for this table. for example "NVDIMM"
> >
> 
> Ah, i just noticed the ACPI spec says:
> | each secondary system description table listed in the RSDT/XSDT with a unique OEM Table ID is
> | loaded.
> You are right.
> 
> Okay, i will extend this function, thanks for your suggestion.
> 


* Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
  2015-11-04  3:12       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-04 12:40         ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-04 12:40 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Wed, Nov 04, 2015 at 11:12:41AM +0800, Xiao Guangrong wrote:
> On 11/04/2015 07:00 AM, Eduardo Habkost wrote:
> >On Mon, Nov 02, 2015 at 05:13:10PM +0800, Xiao Guangrong wrote:
> >>Currently file_ram_alloc() is designed for hugetlbfs; however, the memory
> >>of nvdimm can come from either a raw pmem device, e.g. /dev/pmem, or a file
> >>located on a DAX-enabled filesystem.
> >>
> >>So this patch lets it work on any kind of path.
> >>
> >>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >>---
> >>  exec.c | 24 ++++++++++++------------
> >>  1 file changed, 12 insertions(+), 12 deletions(-)
> >>
> >>diff --git a/exec.c b/exec.c
> >>index 9de38be..9075f4d 100644
> >>--- a/exec.c
> >>+++ b/exec.c
> >>@@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
> >>      char *c;
> >>      void *area;
> >>      int fd;
> >>-    uint64_t hpagesize;
> >>+    uint64_t pagesize;
> >>      Error *local_err = NULL;
> >>
> >>-    hpagesize = qemu_file_get_page_size(path, &local_err);
> >>+    pagesize = qemu_file_get_page_size(path, &local_err);
> >>      if (local_err) {
> >>          error_propagate(errp, local_err);
> >>          goto error;
> >>      }
> >>
> >>-    if (hpagesize == getpagesize()) {
> >>-        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
> >>+    if (pagesize == getpagesize()) {
> >>+        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
> >
> >If the point of this patch is to allow file_ram_alloc() to not be
> >specific to hugetlbfs anymore, this warning can simply go away.
> >
> >(And in case if you really want to keep the warning, I don't see the
> >point of the changes you made to it.)
> >
> 
> This is the history why we did it like this:
> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02862.html

The rule I am trying to follow is simple: if there are valid use cases
(e.g. nvdimm, tmpfs) where the warning will be printed every single time
QEMU runs, the warning is not appropriate.

If you really want to keep a warning, please make it not be printed on
all other valid use cases (nvdimm and tmpfs). Personally, I don't think
adding those additional checks would be worth the trouble, that's why I
suggest removing the warning.

> 
> Q:
> | What this *actually* is trying to warn against is that
> | mapping a regular file (as opposed to hugetlbfs)
> | means transparent huge pages don't work.

I don't think the author of that warning even thought about transparent
huge pages (did THP even exist when it was written?). I believe they
just assumed that the only reason for using -mem-path would be hugetlbfs
and wanted to warn about it. That assumption isn't true anymore.

> 
> | So I don't think we should drop this warning completely.
> | Either let's add the nvdimm magic, or simply check the
> | page size.
> 
> A:
> | Check the page size sounds good, will check:
> | if (pagesize != getpagesize()) {
> |        ...print something...
> |}
> 
> | I agree with you that showing the info is needed, however,
> | 'Warning' might scare some users, how about drop this word or
> | just show “Memory is not allocated from HugeTlbfs”?

With "warning:", I know it may be OK to do what I am doing and the
software is just trying to warn me. If there's no "warning:", I don't
even know if something is really broken in my config, or if it's just a
warning, and I would be very confused.

-- 
Eduardo


* Re: [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices
  2015-11-04  8:56         ` [Qemu-devel] " Igor Mammedov
@ 2015-11-04 14:11           ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-04 14:11 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/04/2015 04:56 PM, Igor Mammedov wrote:
> On Tue, 3 Nov 2015 22:22:40 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 11/03/2015 09:13 PM, Igor Mammedov wrote:
>>> On Mon,  2 Nov 2015 17:13:29 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>> NVDIMM devices are defined in ACPI 6.0 9.20 NVDIMM Devices
>>>>
>>>> There is a root device under \_SB and specified NVDIMM devices are under the
>>>> root device. Each NVDIMM device has _ADR which returns its handle used to
>>>> associate MEMDEV structure in NFIT
>>>>
>>>> We reserve handle 0 for root device. In this patch, we save handle,
>>>> arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch
>>>>
>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>> ---
>>>>    hw/acpi/nvdimm.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 184 insertions(+)
>>>>
>>>> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
>>>> index dd84e5f..53ed675 100644
>>>> --- a/hw/acpi/nvdimm.c
>>>> +++ b/hw/acpi/nvdimm.c
>>>> @@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
>>>>        g_array_free(structures, true);
>>>>    }
>>>>
>>>> +struct NvdimmDsmIn {
>>>> +    uint32_t handle;
>>>> +    uint32_t revision;
>>>> +    uint32_t function;
>>>> +   /* the remaining size in the page is used by arg3. */
>>>> +    uint8_t arg3[0];
>>>> +} QEMU_PACKED;
>>>> +typedef struct NvdimmDsmIn NvdimmDsmIn;
>>>> +
>>>>    static uint64_t
>>>>    nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>>>>    {
>>>> @@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>>>>    static void
>>>>    nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
>>>>    {
>>>> +    fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
>>> it doesn't seem like this hunk belongs here
>>
>> Er, we have changed the logic:
>> - others:
>>     1) the buffer length is directly got from IO read rather than got
>>        from dsm memory
>> [ This was documented in v5's changelog. ]
>>
>> So, the IO write is replaced by IO read, nvdimm_dsm_write() should not be
>> triggered.
>>
>>>
>>>>    }
>>>>
>>>>    static const MemoryRegionOps nvdimm_dsm_ops = {
>>>> @@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
>>>>        memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
>>>>    }
>>>>
>>>> +#define BUILD_STA_METHOD(_dev_, _method_)                                  \
>>>> +    do {                                                                   \
>>>> +        _method_ = aml_method("_STA", 0);                                  \
>>>> +        aml_append(_method_, aml_return(aml_int(0x0f)));                   \
>>>> +        aml_append(_dev_, _method_);                                       \
>>>> +    } while (0)
>>> _STA doesn't have any logic here so drop macro and just
>>> replace its call sites with:
>>
>> Okay, I was just wanting to save some code lines. I will drop this macro.
>>
>>>
>>> aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf));
>>
>> _STA is required to be a method with zero arguments, but this statement just
>> defines an object. Is that okay?
> Spec doesn't say that it must be method, it says that it will evaluate _STA object
> and result must be a combination of defined flags.
> AML wise calling a method with 0 arguments and referencing named variable
> is the same thing, both end up being just a namestring.

I just tested it, it works.

>
> Also note that _STA here return 0xF, and spec says that if _STA is missing
> OSPM shall assume its implicit value being 0xF, so you can just drop _STA
> object here altogether.

Actually, it will be needed for NVDIMM hotplug, but it is okay with me
to drop it at present. Let's introduce it when we implement hotplug.

>
>>
>>>
>>>
>>>> +
>>>> +#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)                \
>>>> +    do {                                                                   \
>>>> +        Aml *ifctx, *uuid;                                                 \
>>>> +        _method_ = aml_method("_DSM", 4);                                  \
>>>> +        /* check UUID if it is we expect, return the errorcode if not.*/   \
>>>> +        uuid = aml_touuid(_uuid_);                                         \
>>>> +        ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));             \
>>>> +        aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));     \
>>>> +        aml_append(method, ifctx);                                         \
>>>> +        aml_append(method, aml_return(aml_call4("NCAL", aml_int(_handle_), \
>>>> +                   aml_arg(1), aml_arg(2), aml_arg(3))));                  \
>>>> +        aml_append(_dev_, _method_);                                       \
>>>> +    } while (0)
>>>> +
>>>> +#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)                     \
>>>> +    aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
>>>> +
>>>> +#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)                 \
>>>> +    BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
>>>> +
>>>> +static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
>>>> +{
>>>> +    for (; device_list; device_list = device_list->next) {
>>>> +        NVDIMMDevice *nvdimm = device_list->data;
>>>> +        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
>>>> +                                           NULL);
>>>> +        uint32_t handle = nvdimm_slot_to_handle(slot);
>>>> +        Aml *dev, *method;
>>>> +
>>>> +        dev = aml_device("NV%02X", slot);
>>>> +        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
>>>> +
>>>> +        BUILD_STA_METHOD(dev, method);
>>>> +
>>>> +        /*
>>>> +         * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
>>>> +         * in DSM Spec Rev1.
>>>> +         */
>>>> +        BUILD_DSM_METHOD(dev, method,
>>>> +                         handle /* NVDIMM Device Handle */,
>>>> +                         "4309AC30-0D11-11E4-9191-0800200C9A66"
>>>> +                         /* UUID for NVDIMM Devices. */);
>>> this will add N-bytes * #NVDIMMS in worst case.
>>> Please drop macro and just consolidate this method into _DSM method of parent scope
>>> and then call it from here like this:
>>>      Method(_DSM, 4)
>>>          Return(^_DSM(Arg[0-3]))
>>
>> Parent _DSM can not be directly called as _DSM in parent requires different UUID.
>> UUID is not saved in dsm memory so that UUID verification should be done in AML.
>>
>> This macro, BUILD_DSM_METHOD(), builds the _DSM method and checks whether the
>> UUID is valid. If not, it returns error code 1 (Not Supported); otherwise it
>> calls the common method NCAL, which saves the input parameters into dsm memory
>> and triggers an IO exit. It seems no byte is wasted. No?
> add an extra arg to NCAL lets say IS_ROOT
> NCAL will check for root UUID if IS_ROOT true and
> check for nvdimm device UUID if it's false.
> That roughly will save ~150bytes per nvdimm or more than 2Kb in case of 256 NVDIMMs.

Okay, sounds good to me.

>
>>
>>>
>>>> +
>>>> +        aml_append(root_dev, dev);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
>>>> +{
>>>> +    Aml *dev, *method, *field;
>>>> +    uint64_t page_size = TARGET_PAGE_SIZE;
>>>> +
>>>> +    dev = aml_device("NVDR");
>>>> +    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
>>>> +
>>>> +    /* map DSM memory and IO into ACPI namespace. */
>>>> +    aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
>>>> +               NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
>>>> +    aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
>>>> +               NVDIMM_ACPI_MEM_BASE, page_size));
>>>> +
>>>> +    /*
>>>> +     * DSM notifier:
>>>> +     * @NOTI: Read it will notify QEMU that _DSM method is being
>>>> +     *        called and the parameters can be found in NvdimmDsmIn.
>>>> +     *        The value read from it is the buffer size of DSM output
>>>> +     *        filled by QEMU.
>>>> +     */
>>>> +    field = aml_field("NPIO", AML_DWORD_ACC, AML_PRESERVE);
>>>> +    BUILD_FIELD_UNIT_SIZE(field, sizeof(uint32_t), "NOTI");
>>>> +    aml_append(dev, field);
>>>> +
>>>> +    /*
>>>> +     * DSM input:
>>>> +     * @HDLE: store device's handle, it's zero if the _DSM call happens
>>>> +     *        on NVDIMM Root Device.
>>>> +     * @REVS: store the Arg1 of _DSM call.
>>>> +     * @FUNC: store the Arg2 of _DSM call.
>>>> +     * @ARG3: store the Arg3 of _DSM call.
>>>> +     *
>>>> +     * They are RAM mapping on host so that these accesses never cause
>>>> +     * VM-EXIT.
>>>> +     */
>>>> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
>>>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, handle, "HDLE");
>>>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, revision, "REVS");
>>>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, function, "FUNC");
>>>> +    BUILD_FIELD_UNIT_SIZE(field, page_size - offsetof(NvdimmDsmIn, arg3),
>>>> +                          "ARG3");
>>> These macros don't make code any better and one has to jump to their
>>> definition every time one sees it to figure out what it's doing.
>>> Please don't hide code behind macros and just replace them with aml_foo()
>>> here and at other places in this patch.
>>>
>>
>> Okay, will follow your way. :)
>>
>>>> +    aml_append(dev, field);
>>>> +
>>>> +    /*
>>>> +     * DSM output:
>>>> +     * @ODAT: the buffer QEMU uses to store the result, the actual size
>>>> +     *        filled by QEMU is the value read from NOTI.
>>>> +     *
>>>> +     * Since the page is reused by both input and output, the input data
>>>> +     * will be lost after storing new result into @ODAT.
>>>> +    */
>>>> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
>>>> +    BUILD_FIELD_UNIT_SIZE(field, page_size, "ODAT");
>>>> +    aml_append(dev, field);
>>>> +
>>>> +    method = aml_method_serialized("NCAL", 4);
> Why method is called with 4 arguments but only arg[0-2] are used?

The arg3 is used in the later patch:
[PATCH v7 28/35] nvdimm acpi: save arg3 for NVDIMM device _DSM method
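
To make the size of the ARG3 area concrete, here is a small standalone
sketch of the layout arithmetic. It mirrors the NvdimmDsmIn struct from this
patch, but the names dsm_in, dsm_arg3_bytes() and dsm_bytes_to_bits() are
only illustrative, and the 4096-byte page in the usage note is an assumption
standing in for TARGET_PAGE_SIZE:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the NvdimmDsmIn layout from the patch; packed, so no padding. */
struct dsm_in {
    uint32_t handle;
    uint32_t revision;
    uint32_t function;
    uint8_t arg3[];   /* the remaining bytes of the page */
} __attribute__((packed));

/* Bytes left for Arg3 once the three 32-bit header fields are placed. */
static size_t dsm_arg3_bytes(size_t page_size)
{
    assert(page_size >= offsetof(struct dsm_in, arg3));
    return page_size - offsetof(struct dsm_in, arg3);
}

/* AML field lengths are given in bits, hence the shift by 3 in NCAL. */
static size_t dsm_bytes_to_bits(size_t bytes)
{
    return bytes << 3;
}
```

With a 4096-byte page, the header takes 12 bytes and dsm_arg3_bytes(4096)
leaves 4084 bytes for Arg3.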

>
>>>> +    {
>>>> +        Aml *buffer_size = aml_local(0);
>>>> +
>>>> +        aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
>>>> +        aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
>>>> +        aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
>>>> +
>>>> +        /*
>>>> +         * transfer control to QEMU and the buffer size filled by
>>>> +         * QEMU is returned.
>>>> +         */
>>>> +        aml_append(method, aml_store(aml_name("NOTI"), buffer_size));
>>>> +
>>>> +        aml_append(method, aml_store(aml_shiftleft(buffer_size,
>>>> +                                       aml_int(3)), buffer_size));
>>>> +
>>>> +        aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
>>>> +                                            buffer_size , "OBUF"));
>>>> +        aml_append(method, aml_concatenate(aml_buffer(0, NULL),
>>>> +                                           aml_name("OBUF"), aml_local(1)));
>>>> +        aml_append(method, aml_return(aml_local(1)));
>>>> +    }
>>>> +    aml_append(dev, method);
>>>> +
>>>> +    BUILD_STA_METHOD(dev, method);
>>>> +
>>>> +    /*
>>>> +     * Chapter 3: _DSM Interface for NVDIMM Root Device - Example in DSM
>>>> +     * Spec Rev1.
>>>> +     */
>>>> +    BUILD_DSM_METHOD(dev, method,
>>>> +                     0 /* 0 is reserved for NVDIMM Root Device*/,
> Does 'handle' equal to slot number?

handle = slot + 1, to reserve 0 for NVDIMM root device:

/*
  * handle is used to uniquely associate nfit_memdev structure with NVDIMM
  * ACPI device - nfit_memdev.nfit_handle matches with the value returned
  * by ACPI device _ADR method.
  *
  * We generate the handle with the slot id of NVDIMM device and reserve
  * 0 for NVDIMM root device.
  */
static uint32_t nvdimm_slot_to_handle(int slot)
{
     return slot + 1;
}
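
A quick standalone illustration of that mapping and of the "NV%02X" device
naming used in build_nvdimm_devices(). The helpers below are a sketch with
made-up names (handle_to_slot() and slot_to_acpi_name() do not exist in the
patch), and the 0..255 slot range is an assumption based on the two hex
digits in the name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same rule as nvdimm_slot_to_handle(): handle 0 stays reserved for root. */
static uint32_t slot_to_handle(int slot)
{
    assert(slot >= 0);
    return (uint32_t)slot + 1;
}

/* Hypothetical inverse, valid only for non-root handles. */
static int handle_to_slot(uint32_t handle)
{
    assert(handle != 0);
    return (int)(handle - 1);
}

/* Formats the ACPI device name from the slot: two upper-case hex digits. */
static void slot_to_acpi_name(int slot, char name[5])
{
    assert(slot >= 0 && slot <= 0xFF);
    snprintf(name, 5, "NV%02X", (unsigned)slot);
}
```

So slot 0 gets handle 1 and the name NV00, while slot 10 gets handle 11 and
the name NV0A.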

>
>
>>>> +                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
>>>> +                     /* UUID for NVDIMM Root Devices. */);
>>>> +
>>>> +    build_nvdimm_devices(device_list, dev);
>>>> +
>>>> +    aml_append(sb_scope, dev);
>>>> +}
>>>> +
>>>> +static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
>>>> +                              GArray *table_data, GArray *linker)
>>>> +{
>>>> +    Aml *ssdt, *sb_scope;
>>>> +
>>>> +    acpi_add_table(table_offsets, table_data);
>>>> +
>>>> +    ssdt = init_aml_allocator();
>>>> +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
>>>> +
>>>> +    sb_scope = aml_scope("\\_SB");
>>>> +    nvdimm_build_acpi_devices(device_list, sb_scope);
>>> is there need for dedicated nvdimm_build_acpi_devices()?
>>> Is it reused somewhere else?
>>> If it's not then just inline it here.
>>
>> Since building NVDIMM devices is complex work, I designed
>> nvdimm_build_acpi_devices() to build the NVDIMM root device and then
>> call build_nvdimm_devices() to build the child devices. You
>> can see nvdimm_build_acpi_devices is a big function.
>>
>> That proposal just aims to make the code clear. If you really
>> hate this, I will drop nvdimm_build_acpi_devices, no problem. :)
> seeing how much this function will add to nvdimm_build_acpi_devices,
> I don't see point in having a separate nvdimm_build_acpi_devices().
>

Well, let's happily drop it.



* Re: [Qemu-devel] [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices
@ 2015-11-04 14:11           ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-04 14:11 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, pbonzini, dan.j.williams, rth



On 11/04/2015 04:56 PM, Igor Mammedov wrote:
> On Tue, 3 Nov 2015 22:22:40 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 11/03/2015 09:13 PM, Igor Mammedov wrote:
>>> On Mon,  2 Nov 2015 17:13:29 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>> NVDIMM devices are defined in ACPI 6.0 9.20 NVDIMM Devices
>>>>
>>>> There is a root device under \_SB and specified NVDIMM devices are under the
>>>> root device. Each NVDIMM device has _ADR which returns its handle used to
>>>> associate MEMDEV structure in NFIT
>>>>
>>>> We reserve handle 0 for root device. In this patch, we save handle,
>>>> arg1 and arg2 to dsm memory. Arg3 is conditionally saved in later patch
>>>>
>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>> ---
>>>>    hw/acpi/nvdimm.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    1 file changed, 184 insertions(+)
>>>>
>>>> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
>>>> index dd84e5f..53ed675 100644
>>>> --- a/hw/acpi/nvdimm.c
>>>> +++ b/hw/acpi/nvdimm.c
>>>> @@ -368,6 +368,15 @@ static void nvdimm_build_nfit(GSList *device_list, GArray *table_offsets,
>>>>        g_array_free(structures, true);
>>>>    }
>>>>
>>>> +struct NvdimmDsmIn {
>>>> +    uint32_t handle;
>>>> +    uint32_t revision;
>>>> +    uint32_t function;
>>>> +   /* the remaining size in the page is used by arg3. */
>>>> +    uint8_t arg3[0];
>>>> +} QEMU_PACKED;
>>>> +typedef struct NvdimmDsmIn NvdimmDsmIn;
>>>> +
>>>>    static uint64_t
>>>>    nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>>>>    {
>>>> @@ -377,6 +386,7 @@ nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
>>>>    static void
>>>>    nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
>>>>    {
>>>> +    fprintf(stderr, "BUG: we never write DSM notification IO Port.\n");
>>> it doesn't seem like this hunk belongs here
>>
>> Er, we have changed the logic:
>> - others:
>>     1) the buffer length is directly got from IO read rather than got
>>        from dsm memory
>> [ This was documented in v5's changelog. ]
>>
>> So, the IO write is replaced by IO read, nvdimm_dsm_write() should not be
>> triggered.
>>
>>>
>>>>    }
>>>>
>>>>    static const MemoryRegionOps nvdimm_dsm_ops = {
>>>> @@ -402,6 +412,179 @@ void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
>>>>        memory_region_add_subregion(io, NVDIMM_ACPI_IO_BASE, &state->io_mr);
>>>>    }
>>>>
>>>> +#define BUILD_STA_METHOD(_dev_, _method_)                                  \
>>>> +    do {                                                                   \
>>>> +        _method_ = aml_method("_STA", 0);                                  \
>>>> +        aml_append(_method_, aml_return(aml_int(0x0f)));                   \
>>>> +        aml_append(_dev_, _method_);                                       \
>>>> +    } while (0)
>>> _STA doesn't have any logic here so drop macro and just
>>> replace its call sites with:
>>
>> Okay, I just wanted to save a few lines of code. I will drop this macro.
>>
>>>
>>> aml_append(foo_dev, aml_name_decl("_STA", aml_int(0xf)));
>>
>> _STA is required to be a method with zero arguments, but this statement just
>> defines an object. Is that okay?
> The spec doesn't say that it must be a method; it says that OSPM will evaluate
> the _STA object and the result must be a combination of the defined flags.
> AML-wise, calling a method with 0 arguments and referencing a named variable
> are the same thing: both end up being just a namestring.

I just tested it, it works.

>
> Also note that _STA here returns 0xF, and the spec says that if _STA is missing
> OSPM shall assume an implicit value of 0xF, so you can just drop the _STA
> object here altogether.

Actually, it will be needed for NVDIMM hotplug, but it is okay with me
to drop it for now. Let's introduce it when we implement hotplug.

>
>>
>>>
>>>
>>>> +
>>>> +#define BUILD_DSM_METHOD(_dev_, _method_, _handle_, _uuid_)                \
>>>> +    do {                                                                   \
>>>> +        Aml *ifctx, *uuid;                                                 \
>>>> +        _method_ = aml_method("_DSM", 4);                                  \
>>>> +        /* check the UUID is the one we expect; return an error if not. */ \
>>>> +        uuid = aml_touuid(_uuid_);                                         \
>>>> +        ifctx = aml_if(aml_lnot(aml_equal(aml_arg(0), uuid)));             \
>>>> +        aml_append(ifctx, aml_return(aml_int(1 /* Not Supported */)));     \
>>>> +        aml_append(_method_, ifctx);                                       \
>>>> +        aml_append(_method_, aml_return(aml_call4("NCAL",                  \
>>>> +                   aml_int(_handle_), aml_arg(1), aml_arg(2),              \
>>>> +                   aml_arg(3))));                                          \
>>>> +        aml_append(_dev_, _method_);                                       \
>>>> +    } while (0)
>>>> +
>>>> +#define BUILD_FIELD_UNIT_SIZE(_field_, _byte_, _name_)                     \
>>>> +    aml_append(_field_, aml_named_field(_name_, (_byte_) * BITS_PER_BYTE))
>>>> +
>>>> +#define BUILD_FIELD_UNIT_STRUCT(_field_, _s_, _f_, _name_)                 \
>>>> +    BUILD_FIELD_UNIT_SIZE(_field_, sizeof(typeof_field(_s_, _f_)), _name_)
>>>> +
>>>> +static void build_nvdimm_devices(GSList *device_list, Aml *root_dev)
>>>> +{
>>>> +    for (; device_list; device_list = device_list->next) {
>>>> +        NVDIMMDevice *nvdimm = device_list->data;
>>>> +        int slot = object_property_get_int(OBJECT(nvdimm), DIMM_SLOT_PROP,
>>>> +                                           NULL);
>>>> +        uint32_t handle = nvdimm_slot_to_handle(slot);
>>>> +        Aml *dev, *method;
>>>> +
>>>> +        dev = aml_device("NV%02X", slot);
>>>> +        aml_append(dev, aml_name_decl("_ADR", aml_int(handle)));
>>>> +
>>>> +        BUILD_STA_METHOD(dev, method);
>>>> +
>>>> +        /*
>>>> +         * Chapter 4: _DSM Interface for NVDIMM Device (non-root) - Example
>>>> +         * in DSM Spec Rev1.
>>>> +         */
>>>> +        BUILD_DSM_METHOD(dev, method,
>>>> +                         handle /* NVDIMM Device Handle */,
>>>> +                         "4309AC30-0D11-11E4-9191-0800200C9A66"
>>>> +                         /* UUID for NVDIMM Devices. */);
>>> this will add N-bytes * #NVDIMMS in worst case.
>>> Please drop macro and just consolidate this method into _DSM method of parent scope
>>> and then call it from here like this:
>>>      Method(_DSM, 4)
>>>          Return(^_DSM(Arg[0-3]))
>>
>> The parent _DSM cannot be called directly because the parent's _DSM requires a
>> different UUID. The UUID is not saved in dsm memory, so UUID verification has
>> to be done in AML.
>>
>> This macro, BUILD_DSM_METHOD(), builds the _DSM method and checks whether the
>> UUID is valid. If not, it returns error code 1 (Not Supported); otherwise it
>> calls the common method NCAL, which saves the input parameters into dsm memory
>> and triggers an IO exit. It seems no byte is wasted. No?
> Add an extra arg to NCAL, let's say IS_ROOT.
> NCAL will check for the root UUID if IS_ROOT is true and
> for the nvdimm device UUID if it's false.
> That will save roughly 150 bytes per nvdimm, or more than 2Kb in case of 256 NVDIMMs.

Okay, good to me.
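
For illustration only, here is a rough sketch of the IS_ROOT idea in plain C
rather than AML. The helper name ncal_check_uuid() and the string comparison
are made up for this example; the real check would be emitted as AML inside
NCAL. Only the two UUID strings come from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* UUIDs taken from the patch quoted above. */
#define NVDIMM_ROOT_UUID   "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
#define NVDIMM_DEVICE_UUID "4309AC30-0D11-11E4-9191-0800200C9A66"

/*
 * Hypothetical helper: pick the expected UUID based on is_root and
 * compare it with the one passed to _DSM.
 * Returns 0 on a match, 1 (Not Supported) otherwise.
 */
static int ncal_check_uuid(bool is_root, const char *uuid)
{
    const char *expected = is_root ? NVDIMM_ROOT_UUID : NVDIMM_DEVICE_UUID;

    return strcmp(uuid, expected) == 0 ? 0 : 1;
}
```

With a single check like this, each per-device _DSM only has to forward its
arguments plus the IS_ROOT flag, instead of carrying its own copy of the UUID
comparison.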

>
>>
>>>
>>>> +
>>>> +        aml_append(root_dev, dev);
>>>> +    }
>>>> +}
>>>> +
>>>> +static void nvdimm_build_acpi_devices(GSList *device_list, Aml *sb_scope)
>>>> +{
>>>> +    Aml *dev, *method, *field;
>>>> +    uint64_t page_size = TARGET_PAGE_SIZE;
>>>> +
>>>> +    dev = aml_device("NVDR");
>>>> +    aml_append(dev, aml_name_decl("_HID", aml_string("ACPI0012")));
>>>> +
>>>> +    /* map DSM memory and IO into ACPI namespace. */
>>>> +    aml_append(dev, aml_operation_region("NPIO", AML_SYSTEM_IO,
>>>> +               NVDIMM_ACPI_IO_BASE, NVDIMM_ACPI_IO_LEN));
>>>> +    aml_append(dev, aml_operation_region("NRAM", AML_SYSTEM_MEMORY,
>>>> +               NVDIMM_ACPI_MEM_BASE, page_size));
>>>> +
>>>> +    /*
>>>> +     * DSM notifier:
>>>> +     * @NOTI: Reading it notifies QEMU that the _DSM method is being
>>>> +     *        called and that the parameters can be found in NvdimmDsmIn.
>>>> +     *        The value read from it is the size of the DSM output
>>>> +     *        buffer filled by QEMU.
>>>> +     */
>>>> +    field = aml_field("NPIO", AML_DWORD_ACC, AML_PRESERVE);
>>>> +    BUILD_FIELD_UNIT_SIZE(field, sizeof(uint32_t), "NOTI");
>>>> +    aml_append(dev, field);
>>>> +
>>>> +    /*
>>>> +     * DSM input:
>>>> +     * @HDLE: store device's handle, it's zero if the _DSM call happens
>>>> +     *        on NVDIMM Root Device.
>>>> +     * @REVS: store the Arg1 of _DSM call.
>>>> +     * @FUNC: store the Arg2 of _DSM call.
>>>> +     * @ARG3: store the Arg3 of _DSM call.
>>>> +     *
>>>> +     * They are backed by RAM mapped on the host, so these accesses never
>>>> +     * cause a VM-EXIT.
>>>> +     */
>>>> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
>>>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, handle, "HDLE");
>>>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, revision, "REVS");
>>>> +    BUILD_FIELD_UNIT_STRUCT(field, NvdimmDsmIn, function, "FUNC");
>>>> +    BUILD_FIELD_UNIT_SIZE(field, page_size - offsetof(NvdimmDsmIn, arg3),
>>>> +                          "ARG3");
>>> These macros don't make code any better and one has to jump to their
>>> definition every time one sees it to figure out what it's doing.
>>> Please don't hide code behind macros and just replace them with aml_foo()
>>> here and at other places in this patch.
>>>
>>
>> Okay, will follow your way. :)
>>
>>>> +    aml_append(dev, field);
>>>> +
>>>> +    /*
>>>> +     * DSM output:
>>>> +     * @ODAT: the buffer QEMU uses to store the result, the actual size
>>>> +     *        filled by QEMU is the value read from NOTI.
>>>> +     *
>>>> +     * Since the page is reused by both input and output, the input data
>>>> +     * will be lost after storing the new result into @ODAT.
>>>> +     */
>>>> +    field = aml_field("NRAM", AML_DWORD_ACC, AML_PRESERVE);
>>>> +    BUILD_FIELD_UNIT_SIZE(field, page_size, "ODAT");
>>>> +    aml_append(dev, field);
>>>> +
>>>> +    method = aml_method_serialized("NCAL", 4);
> Why is the method declared with 4 arguments when only arg[0-2] are used?

Arg3 is used in a later patch:
[PATCH v7 28/35] nvdimm acpi: save arg3 for NVDIMM device _DSM method

>
>>>> +    {
>>>> +        Aml *buffer_size = aml_local(0);
>>>> +
>>>> +        aml_append(method, aml_store(aml_arg(0), aml_name("HDLE")));
>>>> +        aml_append(method, aml_store(aml_arg(1), aml_name("REVS")));
>>>> +        aml_append(method, aml_store(aml_arg(2), aml_name("FUNC")));
>>>> +
>>>> +        /*
>>>> +         * transfer control to QEMU and the buffer size filled by
>>>> +         * QEMU is returned.
>>>> +         */
>>>> +        aml_append(method, aml_store(aml_name("NOTI"), buffer_size));
>>>> +
>>>> +        aml_append(method, aml_store(aml_shiftleft(buffer_size,
>>>> +                                       aml_int(3)), buffer_size));
>>>> +
>>>> +        aml_append(method, aml_create_field(aml_name("ODAT"), aml_int(0),
>>>> +                                            buffer_size , "OBUF"));
>>>> +        aml_append(method, aml_concatenate(aml_buffer(0, NULL),
>>>> +                                           aml_name("OBUF"), aml_local(1)));
>>>> +        aml_append(method, aml_return(aml_local(1)));
>>>> +    }
>>>> +    aml_append(dev, method);
>>>> +
>>>> +    BUILD_STA_METHOD(dev, method);
>>>> +
>>>> +    /*
>>>> +     * Chapter 3: _DSM Interface for NVDIMM Root Device - Example in DSM
>>>> +     * Spec Rev1.
>>>> +     */
>>>> +    BUILD_DSM_METHOD(dev, method,
>>>> +                     0 /* 0 is reserved for NVDIMM Root Device*/,
> Is 'handle' equal to the slot number?

handle = slot + 1, to reserve 0 for NVDIMM root device:

/*
  * handle is used to uniquely associate nfit_memdev structure with NVDIMM
  * ACPI device - nfit_memdev.nfit_handle matches with the value returned
  * by ACPI device _ADR method.
  *
  * We generate the handle with the slot id of NVDIMM device and reserve
  * 0 for NVDIMM root device.
  */
static uint32_t nvdimm_slot_to_handle(int slot)
{
     return slot + 1;
}

>
>
>>>> +                     "2F10E7A4-9E91-11E4-89D3-123B93F75CBA"
>>>> +                     /* UUID for NVDIMM Root Devices. */);
>>>> +
>>>> +    build_nvdimm_devices(device_list, dev);
>>>> +
>>>> +    aml_append(sb_scope, dev);
>>>> +}
>>>> +
>>>> +static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
>>>> +                              GArray *table_data, GArray *linker)
>>>> +{
>>>> +    Aml *ssdt, *sb_scope;
>>>> +
>>>> +    acpi_add_table(table_offsets, table_data);
>>>> +
>>>> +    ssdt = init_aml_allocator();
>>>> +    acpi_data_push(ssdt->buf, sizeof(AcpiTableHeader));
>>>> +
>>>> +    sb_scope = aml_scope("\\_SB");
>>>> +    nvdimm_build_acpi_devices(device_list, sb_scope);
>>> is there need for dedicated nvdimm_build_acpi_devices()?
>>> Is it reused somewhere else?
>>> If it's not then just inline it here.
>>
>> Building NVDIMM devices is complex work, so I designed
>> nvdimm_build_acpi_devices() to build the NVDIMM root device and then
>> call build_nvdimm_devices() to build the child devices. You
>> can see nvdimm_build_acpi_devices is a big function.
>>
>> That proposal is just meant to keep the code clear. If you really
>> hate this, I will drop nvdimm_build_acpi_devices, no problem. :)
> seeing how much this function will add to nvdimm_build_acpi_devices,
> I don't see the point in having a separate nvdimm_build_acpi_devices().
>

Well, let's happily drop it.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path
  2015-11-04 12:40         ` [Qemu-devel] " Eduardo Habkost
@ 2015-11-04 14:22           ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-04 14:22 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/04/2015 08:40 PM, Eduardo Habkost wrote:
> On Wed, Nov 04, 2015 at 11:12:41AM +0800, Xiao Guangrong wrote:
>> On 11/04/2015 07:00 AM, Eduardo Habkost wrote:
>>> On Mon, Nov 02, 2015 at 05:13:10PM +0800, Xiao Guangrong wrote:
>>>> Currently file_ram_alloc() is designed for hugetlbfs. However, the memory
>>>> of nvdimm can come from either a raw pmem device, e.g. /dev/pmem, or a file
>>>> located on a DAX-enabled filesystem
>>>>
>>>> So this patch lets it work on any kind of path
>>>>
>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>> ---
>>>>   exec.c | 24 ++++++++++++------------
>>>>   1 file changed, 12 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/exec.c b/exec.c
>>>> index 9de38be..9075f4d 100644
>>>> --- a/exec.c
>>>> +++ b/exec.c
>>>> @@ -1184,25 +1184,25 @@ static void *file_ram_alloc(RAMBlock *block,
>>>>       char *c;
>>>>       void *area;
>>>>       int fd;
>>>> -    uint64_t hpagesize;
>>>> +    uint64_t pagesize;
>>>>       Error *local_err = NULL;
>>>>
>>>> -    hpagesize = qemu_file_get_page_size(path, &local_err);
>>>> +    pagesize = qemu_file_get_page_size(path, &local_err);
>>>>       if (local_err) {
>>>>           error_propagate(errp, local_err);
>>>>           goto error;
>>>>       }
>>>>
>>>> -    if (hpagesize == getpagesize()) {
>>>> -        fprintf(stderr, "Warning: path not on HugeTLBFS: %s\n", path);
>>>> +    if (pagesize == getpagesize()) {
>>>> +        fprintf(stderr, "Memory is not allocated from HugeTlbfs.\n");
>>>
>>> If the point of this patch is to allow file_ram_alloc() to not be
>>> specific to hugetlbfs anymore, this warning can simply go away.
>>>
>>> (And in case if you really want to keep the warning, I don't see the
>>> point of the changes you made to it.)
>>>
>>
>> Here is the history of why we did it this way:
>> https://lists.gnu.org/archive/html/qemu-devel/2015-10/msg02862.html
>
> The rule I am trying to follow is simple: if there are valid use cases
> (e.g. nvdimm, tmpfs) where the warning will be printed every single time
> QEMU runs, the warning is not appropriate.
>
> If you really want to keep a warning, please make it not be printed on
> all other valid use cases (nvdimm and tmpfs). Personally, I don't think
> adding those additional checks would be worth the trouble, that's why I
> suggest removing the warning.
>
>>
>> Q:
>> | What this *actually* is trying to warn against is that
>> | mapping a regular file (as opposed to hugetlbfs)
>> | means transparent huge pages don't work.
>
> I don't think the author of that warning even thought about transparent
> huge pages (did THP even exist when it was written?). I believe they
> just assumed that the only reason for using -mem-path would be hugetlbfs
> and wanted to warn about it. That assumption isn't true anymore.

Michael, what do you think?

If Michael doesn't object, I will drop this. :)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-04  3:17       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-04 14:44         ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-04 14:44 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Wed, Nov 04, 2015 at 11:17:09AM +0800, Xiao Guangrong wrote:
> 
> 
> On 11/04/2015 07:21 AM, Eduardo Habkost wrote:
> >On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
> >[...]
> >>+size_t qemu_file_getlength(const char *file, Error **errp)
> >>+{
> >>+    int64_t size;
> >[...]
> >>+    return size;
> >
> >Can you guarantee that SIZE_MAX >= INT64_MAX on all platforms supported
> >by QEMU?
> >
> 
> Actually, this function is abstracted from the common function, raw_getlength(),
> in raw-posix.c whose return value is int64_t.
> 
> And i think int64_t is large enough for block devices.

int64_t should be enough, but I don't know if size_t is large enough on
all platforms.

I believe it's going to be one of these cases:

* If you are absolutely sure SIZE_MAX >= INT64_MAX on all platforms,
  please explain why (and maybe add a QEMU_BUILD_BUG_ON?). (I don't
  think this will be the case)
* If SIZE_MAX < INT64_MAX is possible but you believe
  size <= SIZE_MAX is always true here, please explain why (and maybe
  add an assert()).
* Otherwise, we need to set an appropriate error if size > SIZE_MAX
  or change the type of qemu_file_getlength(). What about making it
  uint64_t?
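
To make the last option concrete, here is a minimal sketch of both variants in
plain C. file_length_checked() and file_length_u64() are hypothetical names,
and a plain int return code stands in for QEMU's Error **; none of this is
taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Variant A (hypothetical): keep size_t, but refuse lengths that do not
 * fit into it. Returns 0 on success, a negative errno value otherwise.
 */
static int file_length_checked(int64_t size, size_t *out)
{
    if (size < 0) {
        return -EINVAL;
    }
    if ((uint64_t)size > SIZE_MAX) {
        return -EOVERFLOW;      /* only reachable where size_t is < 64 bits */
    }
    *out = (size_t)size;
    return 0;
}

/*
 * Variant B (hypothetical): hand back uint64_t, which can hold every
 * non-negative int64_t on any platform, so no upper range check is needed.
 */
static int file_length_u64(int64_t size, uint64_t *out)
{
    if (size < 0) {
        return -EINVAL;
    }
    *out = (uint64_t)size;
    return 0;
}
```

Variant B avoids the platform-dependent branch entirely, which is why changing
the return type looks like the simpler fix.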

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-04 14:44         ` [Qemu-devel] " Eduardo Habkost
@ 2015-11-04 14:44           ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-04 14:44 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/04/2015 10:44 PM, Eduardo Habkost wrote:
> On Wed, Nov 04, 2015 at 11:17:09AM +0800, Xiao Guangrong wrote:
>>
>>
>> On 11/04/2015 07:21 AM, Eduardo Habkost wrote:
>>> On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
>>> [...]
>>>> +size_t qemu_file_getlength(const char *file, Error **errp)
>>>> +{
>>>> +    int64_t size;
>>> [...]
>>>> +    return size;
>>>
>>> Can you guarantee that SIZE_MAX >= INT64_MAX on all platforms supported
>>> by QEMU?
>>>
>>
>> Actually, this function is abstracted from the common function, raw_getlength(),
>> in raw-posix.c whose return value is int64_t.
>>
>> And i think int64_t is large enough for block devices.
>
> int64_t should be enough, but I don't know if size_t is large enough on
> all platforms.
>
> I believe it's going to be either one of those cases:
>
> * If you are absolutely sure SIZE_MAX >= INT64_MAX on all platforms,
>    please explain why (and maybe add a QEMU_BUILD_BUG_ON?). (I don't
>    think this will be the case)
> * If SIZE_MAX < INT64_MAX is possible but you believe
>    size <= SIZE_MAX is always true here, please explain why (and maybe
>    add an assert()).
> * Otherwise, we need to set an appropriate error if size > SIZE_MAX
>    or change the type of qemu_file_getlength(). What about making it
>    uint64_t?
>

That sounds better. I will change the return type from size_t to uint64_t.

Thank you for pointing it out, Eduardo!




^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-03 14:47               ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-05  8:53                 ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-05  8:53 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 03.11.2015 17:47, Xiao Guangrong wrote:
>
>
> On 11/03/2015 12:16 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 02.11.2015 18:06, Xiao Guangrong wrote:
>>>
>>>
>>> On 11/02/2015 10:26 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 02.11.2015 16:08, Xiao Guangrong wrote:
>>>>>
>>>>>
>>>>> On 11/02/2015 08:19 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> On 02.11.2015 12:13, Xiao Guangrong wrote:
>>>>>>> Currently, the memory region of the backend memory is directly mapped to
>>>>>>> the guest's address space; however, that is not true for the nvdimm device
>>>>>>>
>>>>>>> This patch lets the dimm device realize this fact and use the
>>>>>>> DIMMDeviceClass->get_memory_region method to get the mapped memory
>>>>>>> region
>>>>>>>
>>>>>>> Current code did not check the return value of get_memory_region as it
>>>>>>> assumed the backend memory of pc-dimm is always properly initialized,
>>>>>>> we make get_memory_region internally catch the case if something is
>>>>>>> wrong
>>>>
>>>> but here you call not pc-dimm's get_memory_region, but the common
>>>> ddc->get_memory_region, which may be nvdimm or possibly another future
>>>> dimm, so why not check it here? And then pc_dimm_get_memory_region
>>>> may be left untouched (error_abort is ok, because errp is unused).
>>>
>>> Hmm, because 'here' is not the only place calling ->get_memory_region;
>>> this method has multiple callers:
>>>
>>> $ git grep "\->get_memory_region"
>>> hw/i386/pc.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
>>> hw/i386/pc.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
>>> hw/mem/dimm.c:    mr = ddc->get_memory_region(dimm);
>>> hw/mem/nvdimm.c:    ddc->get_memory_region = nvdimm_get_memory_region;
>>> hw/mem/pc-dimm.c:    ddc->get_memory_region = pc_dimm_get_memory_region;
>>> hw/ppc/spapr.c:    MemoryRegion *mr = ddc->get_memory_region(dimm);
>>>
>>> memory region validation is also done for NVDIMM in nvdimm device.
>>>
>> Ok, then it should be documented by a comment in dimm.h, where
>> DIMMDeviceClass is defined, that this function should not fail
>>
>
> Okay, how about this comment:
>
>     /*
>      * get the memory region which will be mapped into guest's address
>      * space. It is called after dimm device realized so it is never
>      * failed.
>      */
>     MemoryRegion *(*get_memory_region)(DIMMDevice *dimm);

if you don't mind:
s/it is never failed/it should never fail and is assumed to return a valid
non-NULL address

I'm OK with this if others don't mind, but personally I prefer explicit
error handling for such functions.
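
As a rough illustration of what explicit error handling could look like, here
is a small standalone sketch. The types and names (Region, Dimm,
dimm_get_region) are made up for the example and are not the QEMU API; a
string out-parameter stands in for Error **.

```c
#include <stddef.h>

/* Hypothetical stand-ins for MemoryRegion and DIMMDevice. */
typedef struct Region {
    size_t size;
} Region;

typedef struct Dimm {
    Region *backend;    /* NULL if the backend was never set up */
} Dimm;

/*
 * Hypothetical accessor: instead of assuming success, it reports a
 * failure through *errmsg and returns NULL, so each caller decides
 * how to handle it.
 */
static Region *dimm_get_region(Dimm *dimm, const char **errmsg)
{
    if (dimm->backend == NULL) {
        *errmsg = "backend memory is not initialized";
        return NULL;
    }
    *errmsg = NULL;
    return dimm->backend;
}
```

The cost is that every caller has to check the result, which is the trade-off
against documenting the callback as "never fails".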




-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-05  9:58     ` Igor Mammedov
  -1 siblings, 0 replies; 200+ messages in thread
From: Igor Mammedov @ 2015-11-05  9:58 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Mon,  2 Nov 2015 17:13:27 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
                               ^^ missing one 0???

> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> for detailed design
> 
> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
> that controls if nvdimm support is enabled, it is true on default and
> it is false on 2.4 and its earlier version to keep compatibility
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[...]

> @@ -33,6 +33,15 @@
>   */
>  #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
>  
> +/*
> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
                                 ^^^ missing 0 or value in define below has an extra 0

> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> + * for detailed design.
> + */
> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
It still maps RAM at an arbitrary place;
that's the reason why the VMGenID patches haven't been merged for
several months.
I'm not against using direct mapping (more exactly, I'm for it),
but we should first reach consensus on when and how to use it.

I wouldn't use addresses below 4G, as they may be used by firmware or
legacy hardware, and I won't bet that 0xFF000000ULL won't conflict
with anything.
An alternative place to allocate the reserved range from could be high memory.
For pc we have "reserved-memory-end", which currently makes sure
that the hotpluggable memory range isn't used by firmware.

How about making an API that allows mapping additional memory
ranges before reserved-memory-end and pushes it up as mappings are
added?

Michael, Paolo, what do you think about it?


> +#define NVDIMM_ACPI_IO_BASE           0x0a18
> +#define NVDIMM_ACPI_IO_LEN            4
> +
>  #define TYPE_NVDIMM      "nvdimm"
>  #define NVDIMM(obj)      OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
>  #define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
> @@ -80,4 +89,29 @@ struct NVDIMMClass {
>  };
>  typedef struct NVDIMMClass NVDIMMClass;
>  
> +/*
> + * AcpiNVDIMMState:
> + * @is_enabled: detect if NVDIMM support is enabled.
> + *
> + * @fit: fit buffer which will be accessed via ACPI _FIT method. It is
> + *       dynamically built based on current NVDIMM devices so that it does
> + *       not require to keep consistent during live migration.
> + *
> + * @ram_mr: RAM-based memory region which is mapped into guest address
> + *          space and used to transfer data between OSPM and QEMU.
> + * @io_mr: the IO region used by OSPM to transfer control to QEMU.
> + */
> +struct AcpiNVDIMMState {
> +    bool is_enabled;
> +
> +    GArray *fit;
> +
> +    MemoryRegion ram_mr;
> +    MemoryRegion io_mr;
> +};
> +typedef struct AcpiNVDIMMState AcpiNVDIMMState;
> +
> +/* Initialize the memory and IO region needed by NVDIMM ACPI emulation.*/
> +void nvdimm_init_acpi_state(MemoryRegion *memory, MemoryRegion *io,
> +                            Object *owner, AcpiNVDIMMState *state);
>  #endif
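
The allocation scheme proposed above (map each additional range at the current
reserved-memory-end and push the end up past it) could look roughly like the
sketch below. It only illustrates the idea: ReservedArea, align_up and
reserved_area_map are hypothetical names, and no existing QEMU API is implied.

```c
#include <stdint.h>

/* Hypothetical allocator state; names are illustrative, not QEMU API. */
typedef struct ReservedArea {
    uint64_t end;   /* current value of "reserved-memory-end" */
} ReservedArea;

/* Round v up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t v, uint64_t align)
{
    return (v + align - 1) & ~(align - 1);
}

/*
 * Reserve size bytes at the current end of the reserved area and push
 * the end up past the new mapping. Returns the base address chosen.
 */
static uint64_t reserved_area_map(ReservedArea *ra, uint64_t size,
                                  uint64_t align)
{
    uint64_t base = align_up(ra->end, align);

    ra->end = base + size;
    return base;
}
```

With such a scheme no address is hard-coded: each user gets whatever base is
free at the time it asks, and firmware only needs the final end value.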


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-05  9:58     ` [Qemu-devel] " Igor Mammedov
@ 2015-11-05 10:15       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-05 10:15 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/05/2015 05:58 PM, Igor Mammedov wrote:
> On Mon,  2 Nov 2015 17:13:27 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>                                 ^^ missing one 0???
>
>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>> for detailed design
>>
>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>> that controls if nvdimm support is enabled, it is true on default and
>> it is false on 2.4 and its earlier version to keep compatibility
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> [...]
>
>> @@ -33,6 +33,15 @@
>>    */
>>   #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
>>
>> +/*
>> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>                                   ^^^ missing 0 or value in define below has an extra 0
>
>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>> + * for detailed design.
>> + */
>> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
> it still maps RAM at arbitrary place,
> that's the reason why VMGenID patches hasn't been merged for
> more than several months.
> I'm not against of using (more exactly I'm for it) direct mapping
> but we should reach consensus when and how to use it first.
>
> I'd wouldn't use addresses below 4G as it may be used firmware or
> legacy hardware and I won't bet that 0xFF000000ULL won't conflict
> with anything.
> An alternative place to allocate reserve from could be high memory.
> For pc we have "reserved-memory-end" which currently makes sure
> that hotpluggable memory range isn't used by firmware.
>
> How about making API that allows to map additional memory
> ranges before reserved-memory-end and pushes it up as mappings are
> added.

That is what I did in the v1/v2 versions. However, as you noticed, using 64-bit
addresses in ACPI in QEMU is not easy: we cannot simply set SSDT.rev = 2 to
get 64-bit addresses. I documented the reason in v3's changelog:

   3) we figure out a unused memory hole below 4G that is 0xFF00000 ~
      0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit
      ACPI SSDT/DSDT table will break windows XP.
      BTW, only make SSDT.rev = 2 can not work since the width is only depended
      on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
      in ACPI spec:
| Note: For compatibility with ACPI versions before ACPI 2.0, the bit
| width of Integer objects is dependent on the ComplianceRevision of the DSDT.
| If the ComplianceRevision is less than 2, all integers are restricted to 32
| bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
| the global integer width for all integers, including integers in SSDTs.
   4) use the lowest ACPI spec version to document AML terms.

The only way to introduce 64-bit addresses is to add XSDT support, which is
what Michael did before. However, it seems to need a great deal of effort, as
it will break OVMF. It's a long-term task. :(

The lucky thing is that the ACPI part is not ABI; we can move it to high
memory once XSDT support in QEMU is ready in future development.

Thanks!
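
The rule in the spec note quoted above can be sketched as a small check: the
DSDT revision alone decides the integer width, so an address above 4G is only
representable when the DSDT itself is rev 2 or later. The helper names below
are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Integer width in bits implied by the DSDT ComplianceRevision.
 * Per the quoted note, the SSDT revision plays no part in this.
 */
static unsigned acpi_integer_bits(uint8_t dsdt_revision)
{
    return dsdt_revision < 2 ? 32 : 64;
}

/* True if addr fits in an AML Integer under the given DSDT revision. */
static bool acpi_addr_representable(uint64_t addr, uint8_t dsdt_revision)
{
    if (acpi_integer_bits(dsdt_revision) == 64) {
        return true;
    }
    return addr <= UINT32_MAX;
}
```

Under this rule 0xFF000000 is usable with a rev 1 DSDT, while any base at or
above 0x100000000 is not.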

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-05 10:15       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-05 13:03         ` Igor Mammedov
  -1 siblings, 0 replies; 200+ messages in thread
From: Igor Mammedov @ 2015-11-05 13:03 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Thu, 5 Nov 2015 18:15:31 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> 
> 
> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
> > On Mon,  2 Nov 2015 17:13:27 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
> >                                 ^^ missing one 0???
> >
> >> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >> for detailed design
> >>
> >> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
> >> that controls if nvdimm support is enabled, it is true on default and
> >> it is false on 2.4 and its earlier version to keep compatibility
> >>
> >> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> > [...]
> >
> >> @@ -33,6 +33,15 @@
> >>    */
> >>   #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
> >>
> >> +/*
> >> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
> >                                   ^^^ missing 0 or value in define below has an extra 0
> >
> >> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >> + * for detailed design.
> >> + */
> >> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
> > it still maps RAM at arbitrary place,
> > that's the reason why VMGenID patches hasn't been merged for
> > more than several months.
> > I'm not against of using (more exactly I'm for it) direct mapping
> > but we should reach consensus when and how to use it first.
> >
> > I'd wouldn't use addresses below 4G as it may be used firmware or
> > legacy hardware and I won't bet that 0xFF000000ULL won't conflict
> > with anything.
> > An alternative place to allocate reserve from could be high memory.
> > For pc we have "reserved-memory-end" which currently makes sure
> > that hotpluggable memory range isn't used by firmware.
> >
> > How about making API that allows to map additional memory
> > ranges before reserved-memory-end and pushes it up as mappings are
> > added.
> 
> That what i did in the v1/v2 versions, however, as you noticed, using 64-bit
> address in ACPI in QEMU is not a easy work - we can not simply make
> SSDT.rev = 2 to apply 64 bit address, the reason i have documented in
> v3's changelog:
> 
>    3) we figure out a unused memory hole below 4G that is 0xFF00000 ~
>       0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit
>       ACPI SSDT/DSDT table will break windows XP.
>       BTW, only make SSDT.rev = 2 can not work since the width is only depended
>       on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>       in ACPI spec:
> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
> | If the ComplianceRevision is less than 2, all integers are restricted to 32
> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
> | the global integer width for all integers, including integers in SSDTs.
>    4) use the lowest ACPI spec version to document AML terms.
> 
> The only way introducing 64 bit address is adding XSDT support that what
> Michael did before, however, it seems it need great efforts to do it as
> it will break OVMF. It's a long term workload. :(
To enable 64-bit integers in AML it's sufficient to change the DSDT revision
to 2. I already have a patch that switches DSDT/SSDT to rev2.
Tests show it doesn't break Windows XP (which is rev1), and 64-bit integers
are used on Linux and later Windows versions.

> 
> The luck thing is, the ACPI part is not ABI, we can move it to the high
> memory if QEMU supports XSDT is ready in future development.
But the mapped control region is ABI, and we can't change it if we find out
later that it breaks something.

> 
> Thanks!


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-05 13:03         ` [Qemu-devel] " Igor Mammedov
@ 2015-11-05 13:33           ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-05 13:33 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/05/2015 09:03 PM, Igor Mammedov wrote:
> On Thu, 5 Nov 2015 18:15:31 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
>>> On Mon,  2 Nov 2015 17:13:27 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>                                  ^^ missing one 0???
>>>
>>>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>> for detailed design
>>>>
>>>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>>>> that controls if nvdimm support is enabled, it is true on default and
>>>> it is false on 2.4 and its earlier version to keep compatibility
>>>>
>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>> [...]
>>>
>>>> @@ -33,6 +33,15 @@
>>>>     */
>>>>    #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
>>>>
>>>> +/*
>>>> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>                                    ^^^ missing 0 or value in define below has an extra 0
>>>
>>>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>> + * for detailed design.
>>>> + */
>>>> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
>>> it still maps RAM at arbitrary place,
>>> that's the reason why VMGenID patches hasn't been merged for
>>> more than several months.
>>> I'm not against of using (more exactly I'm for it) direct mapping
>>> but we should reach consensus when and how to use it first.
>>>
>>> I'd wouldn't use addresses below 4G as it may be used firmware or
>>> legacy hardware and I won't bet that 0xFF000000ULL won't conflict
>>> with anything.
>>> An alternative place to allocate reserve from could be high memory.
>>> For pc we have "reserved-memory-end" which currently makes sure
>>> that hotpluggable memory range isn't used by firmware.
>>>
>>> How about making API that allows to map additional memory
>>> ranges before reserved-memory-end and pushes it up as mappings are
>>> added.
>>
>> That what i did in the v1/v2 versions, however, as you noticed, using 64-bit
>> address in ACPI in QEMU is not a easy work - we can not simply make
>> SSDT.rev = 2 to apply 64 bit address, the reason i have documented in
>> v3's changelog:
>>
>>     3) we figure out a unused memory hole below 4G that is 0xFF00000 ~
>>        0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit
>>        ACPI SSDT/DSDT table will break windows XP.
>>        BTW, only make SSDT.rev = 2 can not work since the width is only depended
>>        on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>>        in ACPI spec:
>> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
>> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
>> | If the ComplianceRevision is less than 2, all integers are restricted to 32
>> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
>> | the global integer width for all integers, including integers in SSDTs.
>>     4) use the lowest ACPI spec version to document AML terms.
>>
>> The only way introducing 64 bit address is adding XSDT support that what
>> Michael did before, however, it seems it need great efforts to do it as
>> it will break OVMF. It's a long term workload. :(
> to enable 64-bit integers in AML it's sufficient to change DSDT revision to 2,
> I already have a patch that switches DSDT/SSDT to rev2.
> Tests show it doesn't break WindowsXP (which is rev1) and uses 64-bit integers
> on linux & later Windows versions.

Great. I remember doing a similar test (directly changing the DSDT to rev2),
and it caused a blue screen on WinXP. Could you please tell me where I can
find your patch?

>
>>
>> The luck thing is, the ACPI part is not ABI, we can move it to the high
>> memory if QEMU supports XSDT is ready in future development.
> But mapped control region is ABI and we can't change it if we find out later
> that it breaks something.

But the ACPI code is built entirely by QEMU, which is transparent to the
guest, and the guest should not depend on it, no?


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-05 13:33           ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-05 14:49             ` Igor Mammedov
  -1 siblings, 0 replies; 200+ messages in thread
From: Igor Mammedov @ 2015-11-05 14:49 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Thu, 5 Nov 2015 21:33:39 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> 
> 
> On 11/05/2015 09:03 PM, Igor Mammedov wrote:
> > On Thu, 5 Nov 2015 18:15:31 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >>
> >>
> >> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
> >>> On Mon,  2 Nov 2015 17:13:27 +0800
> >>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>
> >>>> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
> >>>                                  ^^ missing one 0???
> >>>
> >>>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >>>> for detailed design
> >>>>
> >>>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
> >>>> that controls if nvdimm support is enabled, it is true on default and
> >>>> it is false on 2.4 and its earlier version to keep compatibility
> >>>>
> >>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >>> [...]
> >>>
> >>>> @@ -33,6 +33,15 @@
> >>>>     */
> >>>>    #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
> >>>>
> >>>> +/*
> >>>> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
> >>>                                    ^^^ missing 0 or value in define below has an extra 0
> >>>
> >>>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >>>> + * for detailed design.
> >>>> + */
> >>>> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
> >>> it still maps RAM at arbitrary place,
> >>> that's the reason why VMGenID patches hasn't been merged for
> >>> more than several months.
> >>> I'm not against of using (more exactly I'm for it) direct mapping
> >>> but we should reach consensus when and how to use it first.
> >>>
> >>> I'd wouldn't use addresses below 4G as it may be used firmware or
> >>> legacy hardware and I won't bet that 0xFF000000ULL won't conflict
> >>> with anything.
> >>> An alternative place to allocate reserve from could be high memory.
> >>> For pc we have "reserved-memory-end" which currently makes sure
> >>> that hotpluggable memory range isn't used by firmware.
> >>>
> >>> How about making API that allows to map additional memory
> >>> ranges before reserved-memory-end and pushes it up as mappings are
> >>> added.
> >>
> >> That what i did in the v1/v2 versions, however, as you noticed, using 64-bit
> >> address in ACPI in QEMU is not a easy work - we can not simply make
> >> SSDT.rev = 2 to apply 64 bit address, the reason i have documented in
> >> v3's changelog:
> >>
> >>     3) we figure out a unused memory hole below 4G that is 0xFF00000 ~
> >>        0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit
> >>        ACPI SSDT/DSDT table will break windows XP.
> >>        BTW, only make SSDT.rev = 2 can not work since the width is only depended
> >>        on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
> >>        in ACPI spec:
> >> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
> >> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
> >> | If the ComplianceRevision is less than 2, all integers are restricted to 32
> >> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
> >> | the global integer width for all integers, including integers in SSDTs.
> >>     4) use the lowest ACPI spec version to document AML terms.
> >>
> >> The only way introducing 64 bit address is adding XSDT support that what
> >> Michael did before, however, it seems it need great efforts to do it as
> >> it will break OVMF. It's a long term workload. :(
> > to enable 64-bit integers in AML it's sufficient to change DSDT revision to 2,
> > I already have a patch that switches DSDT/SSDT to rev2.
> > Tests show it doesn't break WindowsXP (which is rev1) and uses 64-bit integers
> > on linux & later Windows versions.
> 
> Great, i remembered i did the similar test (directly change DSDT to rev2) and it
> caused winXP blue screen. Could you please tell me where i can find your patch?
https://github.com/imammedo/qemu/commits/mhpt_table_v2
The following commit changes the revision:
 pc: acpi: bump DSDT/SSDT compliance revision to v2
and here is a user of it:
 acpi: memhp: simplify MCRS() using 64-bit math

When writing ASL, one must make sure that only XP-supported
features are in global scope, which is evaluated when the tables
are loaded, and that features of rev2 and higher are inside methods.
That way XP doesn't crash as long as it doesn't evaluate an unsupported
opcode, and one can guard those opcodes by checking the _REV object if necessary.
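
The integer-width rule from the ACPI spec note quoted in this thread can
be modelled in a few lines of C. This is only an illustrative sketch:
aml_integer_bits() and aml_truncate() are hypothetical names, not QEMU
or ACPICA functions, and the model covers nothing beyond the
"revision < 2 means 32-bit integers" rule.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: width of AML Integer objects implied by the
 * DSDT ComplianceRevision. Per the quoted spec note, the DSDT value
 * applies globally, including to integers in SSDTs.
 */
static unsigned aml_integer_bits(uint8_t dsdt_revision)
{
    return dsdt_revision < 2 ? 32 : 64;
}

/*
 * Hypothetical helper: the value an AML integer can actually hold
 * once the global integer width has been applied.
 */
static uint64_t aml_truncate(uint64_t value, uint8_t dsdt_revision)
{
    assert(aml_integer_bits(dsdt_revision) == 32 ||
           aml_integer_bits(dsdt_revision) == 64);

    if (aml_integer_bits(dsdt_revision) == 32) {
        return value & 0xFFFFFFFFULL;   /* upper 32 bits are lost */
    }
    return value;
}
```

With a rev1 DSDT an address above 4G would lose its upper bits, which is
why 64-bit math only makes sense once the DSDT itself is at rev2.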


> >>
> >> The luck thing is, the ACPI part is not ABI, we can move it to the high
> >> memory if QEMU supports XSDT is ready in future development.
> > But mapped control region is ABI and we can't change it if we find out later
> > that it breaks something.
> 
> But the ACPI code is completely built by QEMU, which is transparent to guest
> and guest should not depend on it, no?
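Unfortunately no; think about cross-version migration: a guest started on
an older QEMU keeps using the ACPI tables it booted with after it migrates
to a newer QEMU, so the address they point at has to stay valid.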


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-05 17:29     ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-05 17:29 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Mon, Nov 02, 2015 at 05:13:22PM +0800, Xiao Guangrong wrote:
[...]
>  static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
>  {
> -    return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
> +    Error *local_err = NULL;
> +    MemoryRegion *mr;
> +
> +    mr = host_memory_backend_get_memory(dimm->hostmem, &local_err);
> +
> +    /*
> +     * plug a pc-dimm device whose backend memory was not properly
> +     * initialized?
> +     */
> +    assert(!local_err && mr);

I don't know if you are going to remove the errp parameter in the next
version, but if you want to simply abort in case an error is reported by
a function, you can use &error_abort.
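
To illustrate the pattern, here is a self-contained sketch. It does not
use QEMU's real Error API; every mock_* name below is a hypothetical
stand-in, and the only behaviour it models is "passing the address of
the abort sentinel makes any reported error fatal".

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-in for an error object (hypothetical). */
typedef struct MockError {
    const char *msg;
} MockError;

/* Sentinel: passing &mock_error_abort means "abort on any error". */
static MockError *mock_error_abort;

/* Report an error through errp, honouring the abort sentinel. */
static void mock_error_set(MockError **errp, const char *msg)
{
    if (errp == &mock_error_abort) {
        fprintf(stderr, "Unexpected error: %s\n", msg);
        abort();
    }
    if (errp) {
        MockError *err = malloc(sizeof(*err));
        assert(err);
        err->msg = msg;
        *errp = err;    /* caller now owns the error object */
    }
    /* errp == NULL: the caller chose to ignore errors */
}

/* A callee that can fail, in the style of the function quoted above. */
static int mock_get_region(int backend_ok, MockError **errp)
{
    if (!backend_ok) {
        mock_error_set(errp, "backend memory not initialized");
        return -1;
    }
    return 0;
}
```

A caller that cannot handle failure would call
mock_get_region(ok, &mock_error_abort) and needs neither a local error
variable nor an assert() afterwards.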

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region
  2015-11-05 17:29     ` [Qemu-devel] " Eduardo Habkost
@ 2015-11-06  2:50       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-06  2:50 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: vsementsov, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
	imammedo, pbonzini, dan.j.williams, rth



On 11/06/2015 01:29 AM, Eduardo Habkost wrote:
> On Mon, Nov 02, 2015 at 05:13:22PM +0800, Xiao Guangrong wrote:
> [...]
>>   static MemoryRegion *pc_dimm_get_memory_region(DIMMDevice *dimm)
>>   {
>> -    return host_memory_backend_get_memory(dimm->hostmem, &error_abort);
>> +    Error *local_err = NULL;
>> +    MemoryRegion *mr;
>> +
>> +    mr = host_memory_backend_get_memory(dimm->hostmem, &local_err);
>> +
>> +    /*
>> +     * plug a pc-dimm device whose backend memory was not properly
>> +     * initialized?
>> +     */
>> +    assert(!local_err && mr);
>
> I don't know if you are going to remove the errp parameter in the next
> version, but if you want to simply abort in case an error is reported by
> a function, you can use &error_abort.
>

Thank you, Eduardo! Let's drop the unused errp parameter from
host_memory_backend_get_memory() in the next version. :)

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-05 14:49             ` [Qemu-devel] " Igor Mammedov
@ 2015-11-06  8:31               ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-06  8:31 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/05/2015 10:49 PM, Igor Mammedov wrote:
> On Thu, 5 Nov 2015 21:33:39 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 11/05/2015 09:03 PM, Igor Mammedov wrote:
>>> On Thu, 5 Nov 2015 18:15:31 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>>
>>>>
>>>> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
>>>>> On Mon,  2 Nov 2015 17:13:27 +0800
>>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>>>
>>>>>> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>>>                                   ^^ missing one 0???
>>>>>
>>>>>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>>>> for detailed design
>>>>>>
>>>>>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>>>>>> that controls if nvdimm support is enabled, it is true on default and
>>>>>> it is false on 2.4 and its earlier version to keep compatibility
>>>>>>
>>>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>>> [...]
>>>>>
>>>>>> @@ -33,6 +33,15 @@
>>>>>>      */
>>>>>>     #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
>>>>>>
>>>>>> +/*
>>>>>> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>>>                                     ^^^ missing 0 or value in define below has an extra 0
>>>>>
>>>>>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>>>> + * for detailed design.
>>>>>> + */
>>>>>> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
>>>>> it still maps RAM at arbitrary place,
>>>>> that's the reason why VMGenID patches hasn't been merged for
>>>>> more than several months.
>>>>> I'm not against of using (more exactly I'm for it) direct mapping
>>>>> but we should reach consensus when and how to use it first.
>>>>>
>>>>> I'd wouldn't use addresses below 4G as it may be used firmware or
>>>>> legacy hardware and I won't bet that 0xFF000000ULL won't conflict
>>>>> with anything.
>>>>> An alternative place to allocate reserve from could be high memory.
>>>>> For pc we have "reserved-memory-end" which currently makes sure
>>>>> that hotpluggable memory range isn't used by firmware.
>>>>>
>>>>> How about making API that allows to map additional memory
>>>>> ranges before reserved-memory-end and pushes it up as mappings are
>>>>> added.
>>>>
>>>> That what i did in the v1/v2 versions, however, as you noticed, using 64-bit
>>>> address in ACPI in QEMU is not a easy work - we can not simply make
>>>> SSDT.rev = 2 to apply 64 bit address, the reason i have documented in
>>>> v3's changelog:
>>>>
>>>>      3) we figure out a unused memory hole below 4G that is 0xFF00000 ~
>>>>         0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit
>>>>         ACPI SSDT/DSDT table will break windows XP.
>>>>         BTW, only make SSDT.rev = 2 can not work since the width is only depended
>>>>         on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>>>>         in ACPI spec:
>>>> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
>>>> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
>>>> | If the ComplianceRevision is less than 2, all integers are restricted to 32
>>>> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
>>>> | the global integer width for all integers, including integers in SSDTs.
>>>>      4) use the lowest ACPI spec version to document AML terms.
>>>>
>>>> The only way introducing 64 bit address is adding XSDT support that what
>>>> Michael did before, however, it seems it need great efforts to do it as
>>>> it will break OVMF. It's a long term workload. :(
>>> to enable 64-bit integers in AML it's sufficient to change DSDT revision to 2,
>>> I already have a patch that switches DSDT/SSDT to rev2.
>>> Tests show it doesn't break WindowsXP (which is rev1) and uses 64-bit integers
>>> on linux & later Windows versions.
>>
>> Great, i remembered i did the similar test (directly change DSDT to rev2) and it
>> caused winXP blue screen. Could you please tell me where i can find your patch?
> https://github.com/imammedo/qemu/commits/mhpt_table_v2
> following changes revision:
>   pc: acpi: bump DSDT/SSDT compliance revision to v2
> and here is user:
>   acpi: memhp: simplify MCRS() using 64-bit math
>
> when writing ASL one shall make sure that only XP supported
> features are in global scope, which is evaluated when tables
> are loaded and features of rev2 and higher are inside methods.
> That way XP doesn't crash as far as it doesn't evaluate unsupported
> opcode and one can guard those opcodes checking _REV object if neccesary.
>

This is a really good case study for me. I tried your patch and moved the
64-bit stuff into a private method, and it worked. :)

Igor, is this the API you want?

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 6bf569a..aba29df 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1291,6 +1291,38 @@ FWCfgState *xen_load_linux(PCMachineState *pcms,
      return fw_cfg;
  }

+static void pc_reserve_high_memory_init(PCMachineState *pcms,
+                                        uint64_t base, uint64_t align)
+{
+    pcms->reserve_high_memory.current_addr = ROUND_UP(base, align);
+}
+
+static uint64_t
+pc_reserve_high_memory_end(PCMachineState *pcms, int64_t align)
+{
+    return ROUND_UP(pcms->reserve_high_memory.current_addr, align);
+}
+
+uint64_t pc_reserve_high_memory(PCMachineState *pcms, uint64_t size,
+                                int64_t align, Error **errp)
+{
+    uint64_t end_addr, current_addr = pcms->reserve_high_memory.current_addr;
+
+    if (!current_addr) {
+        error_setg(errp, "reserved high memory is not initialized.");
+        return 0;
+    }
+
+    end_addr = pc_reserve_high_memory_end(pcms, align) + size;
+    if (current_addr > end_addr) {
+        error_setg(errp, "reserved high memory is not enough.");
+        return 0;
+    }
+
+    pcms->reserve_high_memory.current_addr = end_addr;
+    return end_addr;
+}
+
  FWCfgState *pc_memory_init(PCMachineState *pcms,
                             MemoryRegion *system_memory,
                             MemoryRegion *rom_memory,
@@ -1379,6 +1411,8 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
                             "hotplug-memory", hotplug_mem_size);
          memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
                                      &pcms->hotplug_memory.mr);
+        pc_reserve_high_memory_init(pcms, pcms->hotplug_memory.base +
+                                    hotplug_mem_size, 1ULL << 30);
      }

      /* Initialize PC system firmware */
@@ -1403,7 +1437,7 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
          uint64_t res_mem_end = pcms->hotplug_memory.base;

          if (!pcmc->broken_reserved_end) {
-            res_mem_end += memory_region_size(&pcms->hotplug_memory.mr);
+            res_mem_end = pc_reserve_high_memory_end(pcms, 0x1ULL << 30);
          }
          *val = cpu_to_le64(ROUND_UP(res_mem_end, 0x1ULL << 30));
          fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 47162cf..fae3fea 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -20,6 +20,11 @@

  #define HPET_INTCAP "hpet-intcap"

+struct PCReserveHighMemory {
+    uint64_t current_addr;
+};
+typedef struct PCReserveHighMemory PCReserveHighMemory;
+
  /**
   * PCMachineState:
   * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling
@@ -41,6 +46,7 @@ struct PCMachineState {
      OnOffAuto smm;
      bool enforce_aligned_dimm;
      ram_addr_t below_4g_mem_size, above_4g_mem_size;
+    PCReserveHighMemory reserve_high_memory;
  };

  #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
@@ -175,6 +181,9 @@ PcGuestInfo *pc_guest_info_init(PCMachineState *pcms);

  void pc_set_legacy_acpi_data_size(void);

+uint64_t pc_reserve_high_memory(PCMachineState *pcms, uint64_t size,
+                                int64_t align, Error **errp);
+
  #define PCI_HOST_PROP_PCI_HOLE_START   "pci-hole-start"
  #define PCI_HOST_PROP_PCI_HOLE_END     "pci-hole-end"
  #define PCI_HOST_PROP_PCI_HOLE64_START "pci-hole64-start"
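
The idea behind the draft above, handing out ranges from a cursor that
only moves up and publishing the rounded-up cursor as the end of
reserved memory, can be sketched standalone. This is an illustration
under assumptions, not the patch itself: HighMemReserve and the
reserve_*() names are hypothetical, alignments are assumed to be powers
of two, and unlike the draft this sketch returns the start of the
reserved range rather than its end.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the allocator state in the draft patch. */
typedef struct HighMemReserve {
    uint64_t current_addr;  /* first free address; 0 = uninitialized */
} HighMemReserve;

/* Round addr up to a power-of-two alignment. */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    assert(align && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* Start handing out addresses at base, rounded up to align. */
static void reserve_init(HighMemReserve *r, uint64_t base, uint64_t align)
{
    r->current_addr = align_up(base, align);
}

/*
 * Reserve size bytes at the given alignment. Returns the start of the
 * reserved range, or 0 if the allocator is uninitialized or the
 * address computation would wrap around.
 */
static uint64_t reserve_alloc(HighMemReserve *r, uint64_t size,
                              uint64_t align)
{
    uint64_t start, end;

    if (!r->current_addr) {
        return 0;
    }

    start = align_up(r->current_addr, align);
    end = start + size;
    if (start < r->current_addr || end < start) {
        return 0;   /* 64-bit overflow */
    }

    r->current_addr = end;
    return start;
}

/* Value to publish as the end of reserved memory, rounded to align. */
static uint64_t reserve_end(const HighMemReserve *r, uint64_t align)
{
    return align_up(r->current_addr, align);
}
```

Each successful reserve_alloc() pushes the value reported by
reserve_end() up, which is the "pushes it up as mappings are added"
behaviour asked for earlier in the thread.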


>
>>>>
>>>> The luck thing is, the ACPI part is not ABI, we can move it to the high
>>>> memory if QEMU supports XSDT is ready in future development.
>>> But mapped control region is ABI and we can't change it if we find out later
>>> that it breaks something.
>>
>> But the ACPI code is completely built by QEMU, which is transparent to guest
>> and guest should not depend on it, no?
> unfortunately no, think about cross-version migration.

It makes sense. Stupid me. :(



^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
@ 2015-11-06  8:31               ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-06  8:31 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, pbonzini, dan.j.williams, rth



On 11/05/2015 10:49 PM, Igor Mammedov wrote:
> On Thu, 5 Nov 2015 21:33:39 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 11/05/2015 09:03 PM, Igor Mammedov wrote:
>>> On Thu, 5 Nov 2015 18:15:31 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>>
>>>>
>>>> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
>>>>> On Mon,  2 Nov 2015 17:13:27 +0800
>>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>>>
>>>>>> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>>>                                   ^^ missing one 0???
>>>>>
>>>>>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>>>> for detailed design
>>>>>>
>>>>>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>>>>>> that controls if nvdimm support is enabled, it is true on default and
>>>>>> it is false on 2.4 and its earlier version to keep compatibility
>>>>>>
>>>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>>> [...]
>>>>>
>>>>>> @@ -33,6 +33,15 @@
>>>>>>      */
>>>>>>     #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
>>>>>>
>>>>>> +/*
>>>>>> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>>>                                     ^^^ missing 0 or value in define below has an extra 0
>>>>>
>>>>>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>>>> + * for detailed design.
>>>>>> + */
>>>>>> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
>>>>> it still maps RAM at arbitrary place,
>>>>> that's the reason why VMGenID patches hasn't been merged for
>>>>> more than several months.
>>>>> I'm not against of using (more exactly I'm for it) direct mapping
>>>>> but we should reach consensus when and how to use it first.
>>>>>
>>>>> I'd wouldn't use addresses below 4G as it may be used firmware or
>>>>> legacy hardware and I won't bet that 0xFF000000ULL won't conflict
>>>>> with anything.
>>>>> An alternative place to allocate reserve from could be high memory.
>>>>> For pc we have "reserved-memory-end" which currently makes sure
>>>>> that hotpluggable memory range isn't used by firmware.
>>>>>
>>>>> How about making API that allows to map additional memory
>>>>> ranges before reserved-memory-end and pushes it up as mappings are
>>>>> added.
>>>>
>>>> That what i did in the v1/v2 versions, however, as you noticed, using 64-bit
>>>> address in ACPI in QEMU is not a easy work - we can not simply make
>>>> SSDT.rev = 2 to apply 64 bit address, the reason i have documented in
>>>> v3's changelog:
>>>>
>>>>      3) we figure out a unused memory hole below 4G that is 0xFF00000 ~
>>>>         0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit
>>>>         ACPI SSDT/DSDT table will break windows XP.
>>>>         BTW, only make SSDT.rev = 2 can not work since the width is only depended
>>>>         on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>>>>         in ACPI spec:
>>>> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
>>>> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
>>>> | If the ComplianceRevision is less than 2, all integers are restricted to 32
>>>> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
>>>> | the global integer width for all integers, including integers in SSDTs.
>>>>      4) use the lowest ACPI spec version to document AML terms.
>>>>
>>>> The only way introducing 64 bit address is adding XSDT support that what
>>>> Michael did before, however, it seems it need great efforts to do it as
>>>> it will break OVMF. It's a long term workload. :(
>>> to enable 64-bit integers in AML it's sufficient to change DSDT revision to 2,
>>> I already have a patch that switches DSDT/SSDT to rev2.
>>> Tests show it doesn't break WindowsXP (which is rev1) and uses 64-bit integers
>>> on linux & later Windows versions.
>>
>> Great, i remembered i did the similar test (directly change DSDT to rev2) and it
>> caused winXP blue screen. Could you please tell me where i can find your patch?
> https://github.com/imammedo/qemu/commits/mhpt_table_v2
> following changes revision:
>   pc: acpi: bump DSDT/SSDT compliance revision to v2
> and here is user:
>   acpi: memhp: simplify MCRS() using 64-bit math
>
> when writing ASL one shall make sure that only XP supported
> features are in global scope, which is evaluated when tables
> are loaded and features of rev2 and higher are inside methods.
> That way XP doesn't crash as far as it doesn't evaluate unsupported
> opcode and one can guard those opcodes checking _REV object if neccesary.
>

This is a really good case study for me. I tried your patch and moved the
64-bit stuff into a private method, and it worked. :)

Igor, is this the API you want?

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 6bf569a..aba29df 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1291,6 +1291,38 @@ FWCfgState *xen_load_linux(PCMachineState *pcms,
      return fw_cfg;
  }

+static void pc_reserve_high_memory_init(PCMachineState *pcms,
+                                        uint64_t base, uint64_t align)
+{
+    pcms->reserve_high_memory.current_addr = ROUND_UP(base, align);
+}
+
+static uint64_t
+pc_reserve_high_memory_end(PCMachineState *pcms, int64_t align)
+{
+    return ROUND_UP(pcms->reserve_high_memory.current_addr, align);
+}
+
+uint64_t pc_reserve_high_memory(PCMachineState *pcms, uint64_t size,
+                                int64_t align, Error **errp)
+{
+    uint64_t end_addr, current_addr = pcms->reserve_high_memory.current_addr;
+
+    if (!current_addr) {
+        error_setg(errp, "reserved high memory is not initialized.");
+        return 0;
+    }
+
+    end_addr = pc_reserve_high_memory_end(pcms, align) + size;
+    if (current_addr > end_addr) {
+        error_setg(errp, "reserved high memory is not enough.");
+        return 0;
+    }
+
+    pcms->reserve_high_memory.current_addr = end_addr;
+    return end_addr;
+}
+
  FWCfgState *pc_memory_init(PCMachineState *pcms,
                             MemoryRegion *system_memory,
                             MemoryRegion *rom_memory,
@@ -1379,6 +1411,8 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
                             "hotplug-memory", hotplug_mem_size);
          memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
                                      &pcms->hotplug_memory.mr);
+        pc_reserve_high_memory_init(pcms, pcms->hotplug_memory.base +
+                                    hotplug_mem_size, 1ULL << 30);
      }

      /* Initialize PC system firmware */
@@ -1403,7 +1437,7 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
          uint64_t res_mem_end = pcms->hotplug_memory.base;

          if (!pcmc->broken_reserved_end) {
-            res_mem_end += memory_region_size(&pcms->hotplug_memory.mr);
+            res_mem_end = pc_reserve_high_memory_end(pcms, 0x1ULL << 30);
          }
          *val = cpu_to_le64(ROUND_UP(res_mem_end, 0x1ULL << 30));
          fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 47162cf..fae3fea 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -20,6 +20,11 @@

  #define HPET_INTCAP "hpet-intcap"

+struct PCReserveHighMemory {
+    uint64_t current_addr;
+};
+typedef struct PCReserveHighMemory PCReserveHighMemory;
+
  /**
   * PCMachineState:
   * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling
@@ -41,6 +46,7 @@ struct PCMachineState {
      OnOffAuto smm;
      bool enforce_aligned_dimm;
      ram_addr_t below_4g_mem_size, above_4g_mem_size;
+    PCReserveHighMemory reserve_high_memory;
  };

  #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
@@ -175,6 +181,9 @@ PcGuestInfo *pc_guest_info_init(PCMachineState *pcms);

  void pc_set_legacy_acpi_data_size(void);

+uint64_t pc_reserve_high_memory(PCMachineState *pcms, uint64_t size,
+                                int64_t align, Error **errp);
+
  #define PCI_HOST_PROP_PCI_HOLE_START   "pci-hole-start"
  #define PCI_HOST_PROP_PCI_HOLE_END     "pci-hole-end"
  #define PCI_HOST_PROP_PCI_HOLE64_START "pci-hole64-start"
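
To make the question concrete: the reservation scheme in the diff above boils
down to a bump pointer that starts at the aligned end of the hotplug range and
only moves up. Below is a minimal standalone sketch of that idea, not the QEMU
code. HighMemReserve, round_up_pow2(), high_mem_reserve_init() and
high_mem_reserve() are hypothetical names, and the sketch assumes power-of-two
alignments. Unlike the diff, it hands back the start of the reserved range and
checks for wrap-around before adding the size; both choices are assumptions of
the sketch, not something agreed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCReserveHighMemory state in the diff. */
typedef struct {
    uint64_t current_addr;  /* first free address; 0 means "not initialized" */
} HighMemReserve;

/* Round x up to a power-of-two alignment; returns false on overflow. */
static bool round_up_pow2(uint64_t x, uint64_t align, uint64_t *out)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    if (x > UINT64_MAX - (align - 1)) {
        return false;
    }
    *out = (x + align - 1) & ~(align - 1);
    return true;
}

/* Start handing out addresses from the aligned end of the hotplug range. */
static bool high_mem_reserve_init(HighMemReserve *r, uint64_t base,
                                  uint64_t align)
{
    return round_up_pow2(base, align, &r->current_addr);
}

/*
 * Reserve size bytes. On success the aligned start of the new range is
 * stored in *start and the bump pointer moves to the end of that range.
 */
static bool high_mem_reserve(HighMemReserve *r, uint64_t size,
                             uint64_t align, uint64_t *start)
{
    uint64_t begin;

    if (r->current_addr == 0 || size == 0) {
        return false;               /* not initialized, or nothing to do */
    }
    if (!round_up_pow2(r->current_addr, align, &begin)) {
        return false;               /* alignment wrapped around */
    }
    if (size > UINT64_MAX - begin) {
        return false;               /* begin + size would wrap around */
    }
    *start = begin;
    r->current_addr = begin + size;
    return true;
}
```

In this sketch "reserved-memory-end" would simply be current_addr rounded up
to 1G after all reservations have been made.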


>
>>>>
>>>> The lucky thing is, the ACPI part is not ABI, so we can move it to high
>>>> memory once XSDT support in QEMU is ready in future development.
>>> But mapped control region is ABI and we can't change it if we find out later
>>> that it breaks something.
>>
>> But the ACPI code is completely built by QEMU, which is transparent to guest
>> and guest should not depend on it, no?
> unfortunately no, think about cross-version migration.

It makes sense. Stupid me. :(

^ permalink raw reply related	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-06  8:31               ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-06  8:56                 ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-06  8:56 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: pbonzini, gleb, mtosatti, stefanha, mst, rth, ehabkost,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/06/2015 04:31 PM, Xiao Guangrong wrote:
>
>
> On 11/05/2015 10:49 PM, Igor Mammedov wrote:
>> On Thu, 5 Nov 2015 21:33:39 +0800
>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>
>>>
>>>
>>> On 11/05/2015 09:03 PM, Igor Mammedov wrote:
>>>> On Thu, 5 Nov 2015 18:15:31 +0800
>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
>>>>>> On Mon,  2 Nov 2015 17:13:27 +0800
>>>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>>>>
>>>>>>> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>>>>                                   ^^ missing one 0???
>>>>>>
>>>>>>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>>>>> for detailed design
>>>>>>>
>>>>>>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>>>>>>> that controls if nvdimm support is enabled, it is true on default and
>>>>>>> it is false on 2.4 and its earlier version to keep compatibility
>>>>>>>
>>>>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>>>> [...]
>>>>>>
>>>>>>> @@ -33,6 +33,15 @@
>>>>>>>      */
>>>>>>>     #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
>>>>>>>
>>>>>>> +/*
>>>>>>> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>>>>                                     ^^^ missing 0 or value in define below has an extra 0
>>>>>>
>>>>>>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>>>>> + * for detailed design.
>>>>>>> + */
>>>>>>> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
>>>>>> it still maps RAM at an arbitrary place,
>>>>>> that's the reason why the VMGenID patches haven't been merged for
>>>>>> more than a few months.
>>>>>> I'm not against using direct mapping (more exactly, I'm for it)
>>>>>> but we should reach consensus on when and how to use it first.
>>>>>>
>>>>>> I wouldn't use addresses below 4G as they may be used by firmware or
>>>>>> legacy hardware, and I won't bet that 0xFF000000ULL won't conflict
>>>>>> with anything.
>>>>>> An alternative place to allocate reserve from could be high memory.
>>>>>> For pc we have "reserved-memory-end" which currently makes sure
>>>>>> that hotpluggable memory range isn't used by firmware.
>>>>>>
>>>>>> How about making an API that allows mapping additional memory
>>>>>> ranges before reserved-memory-end and pushes it up as mappings are
>>>>>> added?
>>>>>
>>>>> That is what I did in the v1/v2 versions. However, as you noticed, using 64-bit
>>>>> addresses in ACPI in QEMU is not easy work - we can not simply set
>>>>> SSDT.rev = 2 to get 64-bit addresses, for the reason I documented in
>>>>> v3's changelog:
>>>>>
>>>>>      3) we figure out a unused memory hole below 4G that is 0xFF00000 ~
>>>>>         0xFFF00000, this range is large enough for NVDIMM ACPI as build 64-bit
>>>>>         ACPI SSDT/DSDT table will break windows XP.
>>>>>         BTW, only make SSDT.rev = 2 can not work since the width is only depended
>>>>>         on DSDT.rev based on 19.6.28 DefinitionBlock (Declare Definition Block)
>>>>>         in ACPI spec:
>>>>> | Note: For compatibility with ACPI versions before ACPI 2.0, the bit
>>>>> | width of Integer objects is dependent on the ComplianceRevision of the DSDT.
>>>>> | If the ComplianceRevision is less than 2, all integers are restricted to 32
>>>>> | bits. Otherwise, full 64-bit integers are used. The version of the DSDT sets
>>>>> | the global integer width for all integers, including integers in SSDTs.
>>>>>      4) use the lowest ACPI spec version to document AML terms.
>>>>>
>>>>> The only way to introduce 64-bit addresses is adding XSDT support, which is what
>>>>> Michael did before. However, it seems to need great effort as
>>>>> it will break OVMF. It's long-term work. :(
>>>> to enable 64-bit integers in AML it's sufficient to change DSDT revision to 2,
>>>> I already have a patch that switches DSDT/SSDT to rev2.
>>>> Tests show it doesn't break WindowsXP (which is rev1) and uses 64-bit integers
>>>> on linux & later Windows versions.
>>>
>>> Great, I remember I did a similar test (directly changing DSDT to rev2) and it
>>> caused a winXP blue screen. Could you please tell me where I can find your patch?
>> https://github.com/imammedo/qemu/commits/mhpt_table_v2
>> following changes revision:
>>   pc: acpi: bump DSDT/SSDT compliance revision to v2
>> and here is user:
>>   acpi: memhp: simplify MCRS() using 64-bit math
>>
>> when writing ASL one shall make sure that only XP supported
>> features are in global scope, which is evaluated when tables
>> are loaded and features of rev2 and higher are inside methods.
>> That way XP doesn't crash as long as it doesn't evaluate an unsupported
>> opcode, and one can guard those opcodes by checking the _REV object if necessary.
>>
>
> A really good case study for me: I tried your patch and moved the 64-bit
> stuff into a private method, and it worked. :)
>
> Igor, is this the API you want?

BTW, after moving the control region above 4G, is it acceptable to reserve enough
memory to contain the whole FIT and the DSM memory, as we did in the previous version?
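
For context on why moving the control region above 4G depends on the DSDT
revision bump: per the spec note quoted above, the ComplianceRevision of the
DSDT alone decides the AML integer width for every table, SSDTs included. A
small sketch of that rule follows. aml_integer_bits() and aml_can_hold() are
hypothetical helpers written for illustration; they are not part of QEMU or of
any ACPI library.

```c
#include <stdint.h>

/* AML integer width in bits, decided by the DSDT's ComplianceRevision. */
static unsigned aml_integer_bits(uint8_t dsdt_compliance_revision)
{
    return dsdt_compliance_revision < 2 ? 32 : 64;
}

/* Whether addr fits in an AML integer under the given DSDT revision. */
static int aml_can_hold(uint8_t dsdt_compliance_revision, uint64_t addr)
{
    return aml_integer_bits(dsdt_compliance_revision) == 64 ||
           addr <= UINT32_MAX;
}
```

With a rev1 DSDT, 0xFF000000 fits but the first address above 4G
(0x100000000) does not; with rev2 both fit.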

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 07/35] util: introduce qemu_file_get_page_size()
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-06 15:36     ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-06 15:36 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Mon, Nov 02, 2015 at 05:13:09PM +0800, Xiao Guangrong wrote:
> There are three places that use the same logic to get the page size of
> the file path or file fd
> 
> Windows did not support file hugepage, so it will return normal page
> for this case. And this interface has not been used on windows so far
>  
> This patch introduces qemu_file_get_page_size() to unify the code
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
[...]
> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> index 914cef5..51437ff 100644
> --- a/util/oslib-posix.c
> +++ b/util/oslib-posix.c
> @@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
>      siglongjmp(sigjump, 1);
>  }
>  
> -static size_t fd_getpagesize(int fd)
> +static size_t fd_getpagesize(int fd, Error **errp)
>  {
>  #ifdef CONFIG_LINUX
>      struct statfs fs;
> @@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
>              ret = fstatfs(fd, &fs);
>          } while (ret != 0 && errno == EINTR);
>  
> -        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
> +        if (ret) {
> +            error_setg_errno(errp, errno, "fstatfs is failed");
> +            return 0;
> +        }
> +
> +        if (fs.f_type == HUGETLBFS_MAGIC) {
>              return fs.f_bsize;
>          }

You are changing os_mem_prealloc() behavior when fstatfs() fails, here.
Have you ensured there are no cases where fstatfs() fails but this code
is still expected to work?

The change looks safe: gethugepagesize() seems to be always called in
the path that would make fd_getpagesize() be called from
os_mem_prealloc(), so allocation would abort much earlier if statfs()
failed. But I haven't confirmed that yet, and I wanted to be sure.
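
To spell out the behaviour change being asked about: before the patch, a
failing fstatfs() silently fell back to the normal page size; after it, the
failure is reported to the caller. A minimal standalone sketch of the
"report the failure" variant is below. fd_page_size() and
SKETCH_HUGETLBFS_MAGIC are hypothetical names, a plain int stands in for
QEMU's Error **, and the magic number is assumed to match the hugetlbfs value
in linux/magic.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/vfs.h>
#define SKETCH_HUGETLBFS_MAGIC 0x958458f6
#endif

/*
 * Return the page size backing fd. On failure return 0 and store the
 * errno value in *err; on success *err is 0.
 */
static size_t fd_page_size(int fd, int *err)
{
    assert(err != NULL);
    *err = 0;
#ifdef __linux__
    struct statfs fs;
    int ret;

    /* Retry if the call was interrupted by a signal. */
    do {
        ret = fstatfs(fd, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0) {
        *err = errno;       /* report instead of guessing a page size */
        return 0;
    }
    if (fs.f_type == SKETCH_HUGETLBFS_MAGIC) {
        return (size_t)fs.f_bsize;  /* hugetlbfs: block size is the huge page size */
    }
#else
    (void)fd;
#endif
    return (size_t)sysconf(_SC_PAGESIZE);
}
```

A caller that needs the old behaviour can treat a 0 return as "use the normal
page size"; a caller that wants the new behaviour propagates the error.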


-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device
  2015-11-02 16:21       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-06 15:44         ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-06 15:44 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Vladimir Sementsov-Ogievskiy, pbonzini, imammedo, gleb, mtosatti,
	stefanha, mst, rth, dan.j.williams, kvm, qemu-devel, eblake

On Tue, Nov 03, 2015 at 12:21:30AM +0800, Xiao Guangrong wrote:
> On 11/03/2015 12:11 AM, Vladimir Sementsov-Ogievskiy wrote:
> >On 02.11.2015 12:13, Xiao Guangrong wrote:
> >>lseek can not work for all block devices as the man page says:
> >>| Some devices are incapable of seeking and POSIX does not specify
> >>| which devices must support lseek().
> >>
> >>This patch tries to add the support on Linux by using BLKGETSIZE64
> >>ioctl
> >>
> >>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >>---
> >>  util/osdep.c | 20 ++++++++++++++++++++
> >>  1 file changed, 20 insertions(+)
> >>
> >>diff --git a/util/osdep.c b/util/osdep.c
> >>index 5a61e19..b20c793 100644
> >>--- a/util/osdep.c
> >>+++ b/util/osdep.c
> >>@@ -45,6 +45,11 @@
> >>  extern int madvise(caddr_t, size_t, int);
> >>  #endif
> >>+#ifdef CONFIG_LINUX
> >>+#include <sys/ioctl.h>
> >>+#include <linux/fs.h>
> >>+#endif
> >>+
> >>  #include "qemu-common.h"
> >>  #include "qemu/sockets.h"
> >>  #include "qemu/error-report.h"
> >>@@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
> >>  {
> >>      int64_t size;
> >>+#ifdef CONFIG_LINUX
> >>+    struct stat stat_buf;
> >>+    if (fstat(fd, &stat_buf) < 0) {
> >>+        return -errno;
> >>+    }
> >>+
> >>+    if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
> >>+        /* The size of block device is larger than max int64_t? */
> >>+        if (size < 0) {
> >>+            return -EOVERFLOW;
> >>+        }
> >>+        return size;
> >>+    }
> >>+#endif
> >>+
> >>      size = lseek(fd, 0, SEEK_END);
> >>      if (size < 0) {
> >>          return -errno;
> >
> >Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> >
> >just a question: is there any use for stat.st_size ? Is it always worse than lseek?
> 
> The man page says:
> The  st_size field gives the size of the file (if it is a regular file or a symbolic link)
> in bytes.  The size of a symbolic link is the length of the pathname it contains, without a
> terminating null byte.
> 
> So it can not work on symbolic link.

stat() and fstat() will get you information for the file pointed to by the
symlink. You will get the symlink's own information only if you use lstat().
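
A small standalone sketch of the difference, in case it helps. The helper
names (size_via_stat(), size_via_lstat(), symlink_size_demo()) and the
temporary file names are made up for this example; it creates a 5-byte file
and a symlink to it in the current directory and reports st_size both ways.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Size of whatever the path resolves to; stat() follows symlinks. */
static off_t size_via_stat(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 ? st.st_size : (off_t)-1;
}

/* Size recorded for the path itself; lstat() does not follow symlinks. */
static off_t size_via_lstat(const char *path)
{
    struct stat st;
    return lstat(path, &st) == 0 ? st.st_size : (off_t)-1;
}

/*
 * Create a 5-byte file plus a symlink to it, record both sizes for the
 * symlink, then clean up. Returns 0 on success, -1 if setup failed.
 */
static int symlink_size_demo(off_t *via_stat, off_t *via_lstat)
{
    const char *target = "sketch_target.tmp";
    const char *link_path = "sketch_link.tmp";
    FILE *f = fopen(target, "w");

    if (f == NULL) {
        return -1;
    }
    fputs("hello", f);
    fclose(f);

    unlink(link_path);              /* ignore failure: it may not exist */
    if (symlink(target, link_path) != 0) {
        unlink(target);
        return -1;
    }

    *via_stat = size_via_stat(link_path);
    *via_lstat = size_via_lstat(link_path);

    unlink(link_path);
    unlink(target);
    return 0;
}
```

On a typical Linux filesystem the stat() result is the size of the target
file, while the lstat() result is the length of the pathname stored in the
link. fstat() on an fd obtained from open() behaves like stat() here, because
open() has already followed the link.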

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-06 15:48     ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-06 15:48 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake, Kevin Wolf,
	qemu-block

As this patch affects raw_getlength(), CCing the raw block driver
maintainer and the qemu-block mailing list.

On Mon, Nov 02, 2015 at 05:13:14PM +0800, Xiao Guangrong wrote:
> lseek can not work for all block devices as the man page says:
> | Some devices are incapable of seeking and POSIX does not specify
> | which devices must support lseek().
> 
> This patch tries to add the support on Linux by using BLKGETSIZE64
> ioctl
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  util/osdep.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/util/osdep.c b/util/osdep.c
> index 5a61e19..b20c793 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -45,6 +45,11 @@
>  extern int madvise(caddr_t, size_t, int);
>  #endif
>  
> +#ifdef CONFIG_LINUX
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
> +#endif
> +
>  #include "qemu-common.h"
>  #include "qemu/sockets.h"
>  #include "qemu/error-report.h"
> @@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
>  {
>      int64_t size;
>  
> +#ifdef CONFIG_LINUX
> +    struct stat stat_buf;
> +    if (fstat(fd, &stat_buf) < 0) {
> +        return -errno;
> +    }
> +
> +    if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
> +        /* The size of block device is larger than max int64_t? */
> +        if (size < 0) {
> +            return -EOVERFLOW;
> +        }
> +        return size;
> +    }
> +#endif
> +
>      size = lseek(fd, 0, SEEK_END);
>      if (size < 0) {
>          return -errno;
> -- 
> 1.8.3.1
> 
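
For readers coming from qemu-block, the approach in the quoted patch can be
sketched standalone as follows. This is not the QEMU function: fd_length() is
a hypothetical name, and the sketch differs from the patch in one detail, in
that it reads the BLKGETSIZE64 result into a uint64_t (the type the ioctl
writes) and range-checks it before converting to int64_t.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKGETSIZE64 */
#endif

/* Length of the file or block device behind fd in bytes, or -errno. */
static int64_t fd_length(int fd)
{
    struct stat st;
    off_t end;

    if (fstat(fd, &st) < 0) {
        return -errno;
    }

#ifdef __linux__
    if (S_ISBLK(st.st_mode)) {
        uint64_t bytes;     /* BLKGETSIZE64 stores an unsigned 64-bit count */

        if (ioctl(fd, BLKGETSIZE64, &bytes) == 0) {
            if (bytes > (uint64_t)INT64_MAX) {
                return -EOVERFLOW;
            }
            return (int64_t)bytes;
        }
        /* ioctl failed: fall through and try lseek() instead */
    }
#endif

    /* Note: this moves the file offset of fd to the end of the file. */
    end = lseek(fd, 0, SEEK_END);
    if (end < 0) {
        return -errno;
    }
    return (int64_t)end;
}
```

For a descriptor that cannot seek at all, such as a pipe, the sketch returns
-ESPIPE from the lseek() fallback.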

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-06 15:50     ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-06 15:50 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake, Kevin Wolf,
	qemu-block

As this patch affects raw_getlength(), CCing the raw block driver
maintainer and the qemu-block mailing list.

On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
> It is used to get the size of the specified file, also qemu_fd_getlength()
> is introduced to unify the code with raw_getlength() in block/raw-posix.c
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  block/raw-posix.c    |  7 +------
>  include/qemu/osdep.h |  2 ++
>  util/osdep.c         | 31 +++++++++++++++++++++++++++++++

I know I was the one who suggested osdep.c, but maybe oslib-posix.c is a
more appropriate place for the new function?


>  3 files changed, 34 insertions(+), 6 deletions(-)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 918c756..734e6dd 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1592,18 +1592,13 @@ static int64_t raw_getlength(BlockDriverState *bs)
>  {
>      BDRVRawState *s = bs->opaque;
>      int ret;
> -    int64_t size;
>  
>      ret = fd_open(bs);
>      if (ret < 0) {
>          return ret;
>      }
>  
> -    size = lseek(s->fd, 0, SEEK_END);
> -    if (size < 0) {
> -        return -errno;
> -    }
> -    return size;
> +    return qemu_fd_getlength(s->fd);
>  }
>  #endif
>  
> diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
> index dbc17dc..ca4c3fa 100644
> --- a/include/qemu/osdep.h
> +++ b/include/qemu/osdep.h
> @@ -303,4 +303,6 @@ int qemu_read_password(char *buf, int buf_size);
>  pid_t qemu_fork(Error **errp);
>  
>  size_t qemu_file_get_page_size(const char *mem_path, Error **errp);
> +int64_t qemu_fd_getlength(int fd);
> +size_t qemu_file_getlength(const char *file, Error **errp);
>  #endif
> diff --git a/util/osdep.c b/util/osdep.c
> index 0092bb6..5a61e19 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -428,3 +428,34 @@ writev(int fd, const struct iovec *iov, int iov_cnt)
>      return readv_writev(fd, iov, iov_cnt, true);
>  }
>  #endif
> +
> +int64_t qemu_fd_getlength(int fd)
> +{
> +    int64_t size;
> +
> +    size = lseek(fd, 0, SEEK_END);
> +    if (size < 0) {
> +        return -errno;
> +    }
> +    return size;
> +}
> +
> +size_t qemu_file_getlength(const char *file, Error **errp)
> +{
> +    int64_t size;
> +    int fd = qemu_open(file, O_RDONLY);
> +
> +    if (fd < 0) {
> +        error_setg_file_open(errp, errno, file);
> +        return 0;
> +    }
> +
> +    size = qemu_fd_getlength(fd);
> +    if (size < 0) {
> +        error_setg_errno(errp, -size, "can't get size of file %s", file);
> +        size = 0;
> +    }
> +
> +    qemu_close(fd);
> +    return size;
> +}
> -- 
> 1.8.3.1
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-06 15:54     ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-06 15:54 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Kevin Wolf, vsementsov, qemu-block, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, imammedo, pbonzini, dan.j.williams, rth

On Mon, Nov 02, 2015 at 05:13:14PM +0800, Xiao Guangrong wrote:
> lseek can not work for all block devices as the man page says:
> | Some devices are incapable of seeking and POSIX does not specify
> | which devices must support lseek().
> 
> This patch tries to add the support on Linux by using BLKGETSIZE64
> ioctl
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>

In which cases is this patch necessary? Do you know of any examples of
Linux block devices that won't work with lseek(SEEK_END)?

> ---
>  util/osdep.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/util/osdep.c b/util/osdep.c
> index 5a61e19..b20c793 100644
> --- a/util/osdep.c
> +++ b/util/osdep.c
> @@ -45,6 +45,11 @@
>  extern int madvise(caddr_t, size_t, int);
>  #endif
>  
> +#ifdef CONFIG_LINUX
> +#include <sys/ioctl.h>
> +#include <linux/fs.h>
> +#endif
> +
>  #include "qemu-common.h"
>  #include "qemu/sockets.h"
>  #include "qemu/error-report.h"
> @@ -433,6 +438,21 @@ int64_t qemu_fd_getlength(int fd)
>  {
>      int64_t size;
>  
> +#ifdef CONFIG_LINUX
> +    struct stat stat_buf;
> +    if (fstat(fd, &stat_buf) < 0) {
> +        return -errno;
> +    }
> +
> +    if ((S_ISBLK(stat_buf.st_mode)) && !ioctl(fd, BLKGETSIZE64, &size)) {
> +        /* The size of block device is larger than max int64_t? */
> +        if (size < 0) {
> +            return -EOVERFLOW;
> +        }
> +        return size;
> +    }
> +#endif
> +
>      size = lseek(fd, 0, SEEK_END);
>      if (size < 0) {
>          return -errno;
> -- 
> 1.8.3.1
> 

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 07/35] util: introduce qemu_file_get_page_size()
  2015-11-06 15:36     ` [Qemu-devel] " Eduardo Habkost
@ 2015-11-09  4:36       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-09  4:36 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake



On 11/06/2015 11:36 PM, Eduardo Habkost wrote:
> On Mon, Nov 02, 2015 at 05:13:09PM +0800, Xiao Guangrong wrote:
>> There are three places use the some logic to get the page size on
>> the file path or file fd
>>
>> Windows did not support file hugepage, so it will return normal page
>> for this case. And this interface has not been used on windows so far
>>
>> This patch introduces qemu_file_get_page_size() to unify the code
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> [...]
>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>> index 914cef5..51437ff 100644
>> --- a/util/oslib-posix.c
>> +++ b/util/oslib-posix.c
>> @@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
>>       siglongjmp(sigjump, 1);
>>   }
>>
>> -static size_t fd_getpagesize(int fd)
>> +static size_t fd_getpagesize(int fd, Error **errp)
>>   {
>>   #ifdef CONFIG_LINUX
>>       struct statfs fs;
>> @@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
>>               ret = fstatfs(fd, &fs);
>>           } while (ret != 0 && errno == EINTR);
>>
>> -        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
>> +        if (ret) {
>> +            error_setg_errno(errp, errno, "fstatfs is failed");
>> +            return 0;
>> +        }
>> +
>> +        if (fs.f_type == HUGETLBFS_MAGIC) {
>>               return fs.f_bsize;
>>           }
>
> You are changing os_mem_prealloc() behavior when fstatfs() fails, here.
> Have you ensured there are no cases where fstatfs() fails but this code
> is still expected to work?

stat() is supported for all kinds of files, so a failed stat() means that
the file does not exist, that the kernel hit an internal error (e.g. not
enough memory), or that a security check did not pass. In any of these cases
we should not do any operation on the file. The original code did not check
for this, but I think it is worth fixing.

>
> The change looks safe: gethugepagesize() seems to be always called in
> the path that would make fd_getpagesize() be called from
> os_mem_prealloc(), so allocation would abort much earlier if statfs()
> failed. But I haven't confirmed that yet, and I wanted to be sure.
>

Yes, I entirely agree with you. :)


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-06 15:50     ` [Qemu-devel] " Eduardo Habkost
@ 2015-11-09  4:44       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-09  4:44 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake, Kevin Wolf,
	qemu-block



On 11/06/2015 11:50 PM, Eduardo Habkost wrote:
> As this patch affects raw_getlength(), CCing the raw block driver
> maintainer and the qemu-block mailing list.

Eduardo, thanks for the reminder. I will keep CCing Kevin and the qemu-block
mailing list in future versions.

>
> On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
>> It is used to get the size of the specified file, also qemu_fd_getlength()
>> is introduced to unify the code with raw_getlength() in block/raw-posix.c
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   block/raw-posix.c    |  7 +------
>>   include/qemu/osdep.h |  2 ++
>>   util/osdep.c         | 31 +++++++++++++++++++++++++++++++
>
> I know I was the one who suggested osdep.c, but maybe oslib-posix.c is a
> more appropriate place for the new function?
>

Since the function introduced here works on both Windows and POSIX, I think
osdep.c is the right place. Otherwise we would have to implement it separately
for each platform.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device
  2015-11-06 15:54     ` [Qemu-devel] " Eduardo Habkost
@ 2015-11-09  5:58       ` Xiao Guangrong
  -1 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-09  5:58 UTC (permalink / raw)
  To: Eduardo Habkost
  Cc: Kevin Wolf, vsementsov, qemu-block, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, imammedo, pbonzini, dan.j.williams, rth



On 11/06/2015 11:54 PM, Eduardo Habkost wrote:
> On Mon, Nov 02, 2015 at 05:13:14PM +0800, Xiao Guangrong wrote:
>> lseek can not work for all block devices as the man page says:
>> | Some devices are incapable of seeking and POSIX does not specify
>> | which devices must support lseek().
>>
>> This patch tries to add the support on Linux by using BLKGETSIZE64
>> ioctl
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>
> On which cases is this patch necessary? Do you know any examples of
> Linux block devices that won't work with lseek(SEEK_END)?

To be honest, I have not checked all block devices; this patch was made
based on the man page. However, I do not mind dropping this patch (and maybe
other patches) to make this patchset smaller. BLKGETSIZE64 can be added
in the future if we meet such a device.

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI
  2015-11-06  8:31               ` [Qemu-devel] " Xiao Guangrong
  (?)
  (?)
@ 2015-11-09 11:13               ` Igor Mammedov
  2015-11-11  3:01                   ` [Qemu-devel] " Xiao Guangrong
  -1 siblings, 1 reply; 200+ messages in thread
From: Igor Mammedov @ 2015-11-09 11:13 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, pbonzini, dan.j.williams, rth

On Fri, 6 Nov 2015 16:31:43 +0800
Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:

> 
> 
> On 11/05/2015 10:49 PM, Igor Mammedov wrote:
> > On Thu, 5 Nov 2015 21:33:39 +0800
> > Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >
> >>
> >>
> >> On 11/05/2015 09:03 PM, Igor Mammedov wrote:
> >>> On Thu, 5 Nov 2015 18:15:31 +0800
> >>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>
> >>>>
> >>>>
> >>>> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
> >>>>> On Mon,  2 Nov 2015 17:13:27 +0800
> >>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
> >>>>>
> >>>>>> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
> >>>>>                                   ^^ missing one 0???
> >>>>>
> >>>>>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >>>>>> for detailed design
> >>>>>>
> >>>>>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
> >>>>>> that controls if nvdimm support is enabled, it is true on default and
> >>>>>> it is false on 2.4 and its earlier version to keep compatibility
> >>>>>>
> >>>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >>>>> [...]
> >>>>>
> >>>>>> @@ -33,6 +33,15 @@
> >>>>>>      */
> >>>>>>     #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
> >>>>>>
> >>>>>> +/*
> >>>>>> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
> >>>>>                                     ^^^ missing 0 or value in define below has an extra 0
> >>>>>
> >>>>>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
> >>>>>> + * for detailed design.
> >>>>>> + */
> >>>>>> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
> >>>>> it still maps RAM at arbitrary place,
> >>>>> that's the reason why VMGenID patches hasn't been merged for
> >>>>> more than several months.
> >>>>> I'm not against of using (more exactly I'm for it) direct mapping
> >>>>> but we should reach consensus when and how to use it first.
> >>>>>
> >>>>> I'd wouldn't use addresses below 4G as it may be used firmware or
> >>>>> legacy hardware and I won't bet that 0xFF000000ULL won't conflict
> >>>>> with anything.
> >>>>> An alternative place to allocate reserve from could be high memory.
> >>>>> For pc we have "reserved-memory-end" which currently makes sure
> >>>>> that hotpluggable memory range isn't used by firmware.
> >>>>>
> >>>>> How about making API that allows to map additional memory
> >>>>> ranges before reserved-memory-end and pushes it up as mappings are
> >>>>> added.
[...]

> 
> Really a good study case to me, i tried your patch and moved the 64 bit
> staffs to the private method, it worked. :)
> 
> Igor, is this the API you want?

Let's get an ack from Michael on the idea of RAM mapping before
"reserved-memory-end" first.
If he rejects it, then there isn't any other way except switching
to MMIO instead.
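
For reference while that is being decided, the reservation scheme under
discussion (hand out aligned ranges above a base address and push the end
of the reserved area up as ranges are added) can be sketched standalone
like this. It is not the diff quoted below: the HighMemReserve type and
the high_mem_*() names are invented for the example, overflow is checked
explicitly, and the reserve call reports the start of the new range.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t current_addr;   /* first free address; 0 means uninitialized */
} HighMemReserve;

/* Round value up to a multiple of align; -1 on bad align or overflow. */
static int round_up_u64(uint64_t value, uint64_t align, uint64_t *out)
{
    uint64_t rem;

    if (align == 0) {
        return -1;
    }
    rem = value % align;
    if (rem == 0) {
        *out = value;
        return 0;
    }
    if (value > UINT64_MAX - (align - rem)) {
        return -1;
    }
    *out = value + (align - rem);
    return 0;
}

static int high_mem_init(HighMemReserve *r, uint64_t base, uint64_t align)
{
    return round_up_u64(base, align, &r->current_addr);
}

/* Reserve size bytes at the next aligned address; *start gets the range. */
static int high_mem_reserve(HighMemReserve *r, uint64_t size,
                            uint64_t align, uint64_t *start)
{
    uint64_t aligned;

    if (r->current_addr == 0) {
        return -1;                       /* not initialized */
    }
    if (round_up_u64(r->current_addr, align, &aligned) < 0 ||
        size > UINT64_MAX - aligned) {
        return -1;                       /* would wrap around */
    }
    *start = aligned;
    r->current_addr = aligned + size;
    return 0;
}

/* Aligned end of everything reserved so far (the value to advertise). */
static int high_mem_end(const HighMemReserve *r, uint64_t align,
                        uint64_t *end)
{
    return round_up_u64(r->current_addr, align, end);
}
```

With a base just above 4G and 1G alignment, the first 4K reservation
starts at 0x140000000 and the advertised end becomes 0x180000000.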

> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 6bf569a..aba29df 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1291,6 +1291,38 @@ FWCfgState *xen_load_linux(PCMachineState *pcms,
>       return fw_cfg;
>   }
> 
> +static void pc_reserve_high_memory_init(PCMachineState *pcms,
> +                                        uint64_t base, uint64_t align)
> +{
> +    pcms->reserve_high_memory.current_addr = ROUND_UP(base, align);
> +}
> +
> +static uint64_t
> +pc_reserve_high_memory_end(PCMachineState *pcms, int64_t align)
> +{
> +    return ROUND_UP(pcms->reserve_high_memory.current_addr, align);
> +}
> +
> +uint64_t pc_reserve_high_memory(PCMachineState *pcms, uint64_t size,
> +                                int64_t align, Error **errp)
> +{
> +    uint64_t end_addr, current_addr = pcms->reserve_high_memory.current_addr;
> +
> +    if (!current_addr) {
> +        error_setg(errp, "reserved high memory is not initialized.");
> +        return 0;
> +    }
> +
> +    end_addr = pc_reserve_high_memory_end(pcms, align) + size;
> +    if (current_addr > end_addr) {
> +        error_setg(errp, "reserved high memory is not enough.");
> +        return 0;
> +    }
> +
> +    pcms->reserve_high_memory.current_addr = end_addr;
> +    return end_addr;
> +}
> +
>   FWCfgState *pc_memory_init(PCMachineState *pcms,
>                              MemoryRegion *system_memory,
>                              MemoryRegion *rom_memory,
> @@ -1379,6 +1411,8 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
>                              "hotplug-memory", hotplug_mem_size);
>           memory_region_add_subregion(system_memory, pcms->hotplug_memory.base,
>                                       &pcms->hotplug_memory.mr);
> +        pc_reserve_high_memory_init(pcms, pcms->hotplug_memory.base +
> +                                    hotplug_mem_size, 1ULL << 30);
>       }
> 
>       /* Initialize PC system firmware */
> @@ -1403,7 +1437,7 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
>           uint64_t res_mem_end = pcms->hotplug_memory.base;
> 
>           if (!pcmc->broken_reserved_end) {
> -            res_mem_end += memory_region_size(&pcms->hotplug_memory.mr);
> +            res_mem_end = pc_reserve_high_memory_end(pcms, 0x1ULL << 30);
>           }
>           *val = cpu_to_le64(ROUND_UP(res_mem_end, 0x1ULL << 30));
>           fw_cfg_add_file(fw_cfg, "etc/reserved-memory-end", val, sizeof(*val));
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 47162cf..fae3fea 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -20,6 +20,11 @@
> 
>   #define HPET_INTCAP "hpet-intcap"
> 
> +struct PCReserveHighMemory {
> +    uint64_t current_addr;
> +};
> +typedef struct PCReserveHighMemory PCReserveHighMemory;
> +
>   /**
>    * PCMachineState:
>    * @acpi_dev: link to ACPI PM device that performs ACPI hotplug handling
> @@ -41,6 +46,7 @@ struct PCMachineState {
>       OnOffAuto smm;
>       bool enforce_aligned_dimm;
>       ram_addr_t below_4g_mem_size, above_4g_mem_size;
> +    PCReserveHighMemory reserve_high_memory;
>   };
> 
>   #define PC_MACHINE_ACPI_DEVICE_PROP "acpi-device"
> @@ -175,6 +181,9 @@ PcGuestInfo *pc_guest_info_init(PCMachineState *pcms);
> 
>   void pc_set_legacy_acpi_data_size(void);
> 
> +uint64_t pc_reserve_high_memory(PCMachineState *pcms, uint64_t size,
> +                                int64_t align, Error **errp);
> +
>   #define PCI_HOST_PROP_PCI_HOLE_START   "pci-hole-start"
>   #define PCI_HOST_PROP_PCI_HOLE_END     "pci-hole-end"
>   #define PCI_HOST_PROP_PCI_HOLE64_START "pci-hole64-start"
> 
> 
> >
> >>>>
> >>>> The luck thing is, the ACPI part is not ABI, we can move it to the high
> >>>> memory if QEMU supports XSDT is ready in future development.
> >>> But mapped control region is ABI and we can't change it if we find out later
> >>> that it breaks something.
> >>
> >> But the ACPI code is completely built by QEMU, which is transparent to guest
> >> and guest should not depend on it, no?
> > unfortunately no, think about cross-version migration.
> 
> It makes sense. Stupid me. :(
> 
> 
> 


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 07/35] util: introduce qemu_file_get_page_size()
  2015-11-09  4:36       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-09 18:34         ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-09 18:34 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake

On Mon, Nov 09, 2015 at 12:36:36PM +0800, Xiao Guangrong wrote:
> On 11/06/2015 11:36 PM, Eduardo Habkost wrote:
> >On Mon, Nov 02, 2015 at 05:13:09PM +0800, Xiao Guangrong wrote:
> >>There are three places use the some logic to get the page size on
> >>the file path or file fd
> >>
> >>Windows did not support file hugepage, so it will return normal page
> >>for this case. And this interface has not been used on windows so far
> >>
> >>This patch introduces qemu_file_get_page_size() to unify the code
> >>
> >>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >[...]
> >>diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> >>index 914cef5..51437ff 100644
> >>--- a/util/oslib-posix.c
> >>+++ b/util/oslib-posix.c
> >>@@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
> >>      siglongjmp(sigjump, 1);
> >>  }
> >>
> >>-static size_t fd_getpagesize(int fd)
> >>+static size_t fd_getpagesize(int fd, Error **errp)
> >>  {
> >>  #ifdef CONFIG_LINUX
> >>      struct statfs fs;
> >>@@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
> >>              ret = fstatfs(fd, &fs);
> >>          } while (ret != 0 && errno == EINTR);
> >>
> >>-        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
> >>+        if (ret) {
> >>+            error_setg_errno(errp, errno, "fstatfs is failed");
> >>+            return 0;
> >>+        }
> >>+
> >>+        if (fs.f_type == HUGETLBFS_MAGIC) {
> >>              return fs.f_bsize;
> >>          }
> >
> >You are changing os_mem_prealloc() behavior when fstatfs() fails, here.
> >Have you ensured there are no cases where fstatfs() fails but this code
> >is still expected to work?
> 
> stat() is supported for all kinds of files, so failed stat() is caused by
> file is not exist or kernel internal error (e,g memory is not enough) or
> security check is not passed. Whichever we should not do any operation on
> the file if stat() failed. The origin code did not check it but it is worth
> being fixed i think.

Note that this is fstatfs(), not stat(). It's possible to get
ENOSYS as an error from statfs() if it is not implemented by the
filesystem; I just don't know if this can really happen in
practice.

(But the answer won't matter, as we already aborted on statfs()
errors on all codepaths that call fd_getpagesize(). See below.)

> 
> >
> >The change looks safe: gethugepagesize() seems to be always called in
> >the path that would make fd_getpagesize() be called from
> >os_mem_prealloc(), so allocation would abort much earlier if statfs()
> >failed. But I haven't confirmed that yet, and I wanted to be sure.
> >
> 
> Yes, I am entirely agree with you. :)
> 

I have just confirmed that this is the case, as:

* fd_getpagesize() is only called from os_mem_prealloc(),
* os_mem_prealloc() is called from:
  * host_memory_backend_set_prealloc()
    * Using memory_region_get_fd() as the fd argument
  * host_memory_backend_memory_complete()
    * Using memory_region_get_fd() as the fd argument
  * file_ram_alloc()
    * After qemu_file_get_page_size()/gethugepagesize() was already called
      on the same fd (with errors checked)
* fd_getpagesize() checks for fd == -1
* The only code that sets the fd for a RAMBlock is file_ram_alloc(),
  which checks for qemu_file_get_page_size()/gethugepagesize()
  errors

So, it was already impossible to get os_mem_prealloc() called
with fd != -1 without having gethugepagesize() called first (and
gethugepagesize() already checked for statfs() errors).
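
For anyone reading this without the tree at hand, the behaviour being
reviewed is roughly the following (Linux only). This is a simplified
sketch and not the patch itself: fd_page_size() is a made-up name, the
error is reported through a plain int instead of Error **errp, and the
fd == -1 shortcut of the real function is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6
#endif

/*
 * Page size backing an open file. On fstatfs() failure this returns 0
 * and stores errno in *err, instead of silently falling back to the
 * normal page size.
 */
static size_t fd_page_size(int fd, int *err)
{
    struct statfs fs;
    int ret;

    *err = 0;
    do {
        ret = fstatfs(fd, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0) {
        *err = errno;
        return 0;
    }

    /* Files on hugetlbfs report the huge page size as the block size. */
    if ((uint32_t)fs.f_type == (uint32_t)HUGETLBFS_MAGIC) {
        return (size_t)fs.f_bsize;
    }
    return (size_t)sysconf(_SC_PAGESIZE);
}
```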

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 07/35] util: introduce qemu_file_get_page_size()
@ 2015-11-09 18:34         ` Eduardo Habkost
  0 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-09 18:34 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: vsementsov, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
	imammedo, pbonzini, dan.j.williams, rth

On Mon, Nov 09, 2015 at 12:36:36PM +0800, Xiao Guangrong wrote:
> On 11/06/2015 11:36 PM, Eduardo Habkost wrote:
> >On Mon, Nov 02, 2015 at 05:13:09PM +0800, Xiao Guangrong wrote:
> >>There are three places use the some logic to get the page size on
> >>the file path or file fd
> >>
> >>Windows did not support file hugepage, so it will return normal page
> >>for this case. And this interface has not been used on windows so far
> >>
> >>This patch introduces qemu_file_get_page_size() to unify the code
> >>
> >>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >[...]
> >>diff --git a/util/oslib-posix.c b/util/oslib-posix.c
> >>index 914cef5..51437ff 100644
> >>--- a/util/oslib-posix.c
> >>+++ b/util/oslib-posix.c
> >>@@ -340,7 +340,7 @@ static void sigbus_handler(int signal)
> >>      siglongjmp(sigjump, 1);
> >>  }
> >>
> >>-static size_t fd_getpagesize(int fd)
> >>+static size_t fd_getpagesize(int fd, Error **errp)
> >>  {
> >>  #ifdef CONFIG_LINUX
> >>      struct statfs fs;
> >>@@ -351,7 +351,12 @@ static size_t fd_getpagesize(int fd)
> >>              ret = fstatfs(fd, &fs);
> >>          } while (ret != 0 && errno == EINTR);
> >>
> >>-        if (ret == 0 && fs.f_type == HUGETLBFS_MAGIC) {
> >>+        if (ret) {
> >>+            error_setg_errno(errp, errno, "fstatfs is failed");
> >>+            return 0;
> >>+        }
> >>+
> >>+        if (fs.f_type == HUGETLBFS_MAGIC) {
> >>              return fs.f_bsize;
> >>          }
> >
> >You are changing os_mem_prealloc() behavior when fstatfs() fails, here.
> >Have you ensured there are no cases where fstatfs() fails but this code
> >is still expected to work?
> 
> stat() is supported for all kinds of files, so failed stat() is caused by
> file is not exist or kernel internal error (e,g memory is not enough) or
> security check is not passed. Whichever we should not do any operation on
> the file if stat() failed. The origin code did not check it but it is worth
> being fixed i think.

Note that this is fstatfs(), not stat(). It's possible go get
ENOSYS as error from statfs() if it is not implemented by the
filesystem, I just don't know if this really can happen in
practice.

(But the answer won't matter, as we already aborted on statfs()
errors on all codepaths that call fd_getpagesize(). See below.)

> 
> >
> >The change looks safe: gethugepagesize() seems to be always called in
> >the path that would make fd_getpagesize() be called from
> >os_mem_prealloc(), so allocation would abort much earlier if statfs()
> >failed. But I haven't confirmed that yet, and I wanted to be sure.
> >
> 
> Yes, I am entirely agree with you. :)
> 

I have just confirmed that this is the case, as:

* fd_getpagesize() is only called from os_mem_prealloc(),
* os_mem_prealloc() is called from:
  * host_memory_backend_set_prealloc()
    * Using memory_region_get_fd() as the fd argument
  * host_memory_backend_memory_complete()
    * Using memory_region_get_fd() as the fd argument
  * file_ram_alloc()
    * After qemu_file_get_page_size()/gethugepagesize() was already called
      in the same fd (with errors checked)
* fd_getpagesize() checks for fd == -1
* The only code that sets the fd for a RAMBlock is file_ram_alloc(),
  which checks for qemu_file_get_page_size()/gethugepagesize()
  errors

So, it was already impossible to get os_mem_prealloc() called
with fd != -1 without having gethugepagesize() called first (and
gethugepagesize() already checked for statfs() errors).
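
The error-checked lookup discussed above can be sketched in isolation as
follows. This is only a minimal illustration, assuming Linux; the name
fd_page_size() and its err out-parameter are hypothetical and are not the
functions from the patch (which reports errors through Error **errp).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/vfs.h>
#include <unistd.h>

#ifndef HUGETLBFS_MAGIC
#define HUGETLBFS_MAGIC 0x958458f6
#endif

/*
 * Hypothetical helper (not from the patch): return the page size of the
 * filesystem backing @fd. On fstatfs() failure, return 0 and store the
 * errno value in @err instead of silently falling back to the normal
 * page size.
 */
size_t fd_page_size(int fd, int *err)
{
    struct statfs fs;
    int ret;

    assert(err != NULL);
    *err = 0;

    /* Retry if the call was interrupted by a signal. */
    do {
        ret = fstatfs(fd, &fs);
    } while (ret != 0 && errno == EINTR);

    if (ret != 0) {
        *err = errno;
        return 0;
    }

    /* hugetlbfs reports the huge page size as its block size. */
    if (fs.f_type == HUGETLBFS_MAGIC) {
        return (size_t)fs.f_bsize;
    }

    return (size_t)sysconf(_SC_PAGESIZE);
}
```

A caller that gets 0 back would be expected to stop before doing anything
else with the fd, which matches the argument that allocation aborts earlier
on all paths reaching os_mem_prealloc().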

Reviewed-by: Eduardo Habkost <ehabkost@redhat.com>

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 12/35] util: let qemu_fd_getlength support block device
  2015-11-09  5:58       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-09 18:43         ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-09 18:43 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: Kevin Wolf, vsementsov, qemu-block, kvm, mst, gleb, mtosatti,
	qemu-devel, stefanha, imammedo, pbonzini, dan.j.williams, rth

On Mon, Nov 09, 2015 at 01:58:27PM +0800, Xiao Guangrong wrote:
> 
> 
> On 11/06/2015 11:54 PM, Eduardo Habkost wrote:
> >On Mon, Nov 02, 2015 at 05:13:14PM +0800, Xiao Guangrong wrote:
> >>lseek can not work for all block devices as the man page says:
> >>| Some devices are incapable of seeking and POSIX does not specify
> >>| which devices must support lseek().
> >>
> >>This patch tries to add the support on Linux by using BLKGETSIZE64
> >>ioctl
> >>
> >>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >
> >On which cases is this patch necessary? Do you know any examples of
> >Linux block devices that won't work with lseek(SEEK_END)?
> 
> To be honest, I have not checked all block devices; this patch was made
> based on the man page. However, I do not mind dropping this patch (and maybe
> other patches) to make this patchset smaller. BLKGETSIZE64 can be added
> in the future if we meet such a device.

By looking at the Linux source code implementing BLKGETSIZE64, it looks
like it should work for all block devices where SEEK_END works:

* BLKGETSIZE64 returns i_size_read(bdev->bd_inode)
  (block/ioctl.c:blkdev_ioctl())
* llseek(SEEK_END) uses i_size_read(bd_inode) as the offset
  (fs/block_dev.c:block_llseek())

That's probably why raw_getlength() never needed a Linux-specific
BLKGETSIZE call.
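
For illustration, the two size queries compared above can be sketched as
one helper. This is a minimal sketch assuming Linux and POSIX; fd_length()
is a hypothetical name, not the qemu_fd_getlength() from the patch. If both
paths read the same inode size, as the kernel sources above suggest, the
ioctl fallback would rarely, if ever, be reached for block devices.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#ifdef __linux__
#include <linux/fs.h>   /* BLKGETSIZE64 */
#endif

/*
 * Hypothetical helper: return the length in bytes of the file or block
 * device behind @fd, or a negative errno value on failure.
 * Note that lseek(SEEK_END) moves the file offset of @fd.
 */
int64_t fd_length(int fd)
{
    int saved_errno;
    off_t size = lseek(fd, 0, SEEK_END);

    if (size >= 0) {
        return (int64_t)size;
    }
    saved_errno = errno;

#ifdef __linux__
    {
        /* Fallback for block devices that cannot seek. */
        uint64_t bytes;

        if (ioctl(fd, BLKGETSIZE64, &bytes) == 0) {
            return (int64_t)bytes;
        }
    }
#endif

    /* Report the original lseek() error, not the ioctl one. */
    return -saved_errno;
}
```

On a pipe, for example, lseek() fails with ESPIPE and the ioctl is
rejected, so the helper returns -ESPIPE.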

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 11/35] util: introduce qemu_file_getlength()
  2015-11-09  4:44       ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-09 19:21         ` Eduardo Habkost
  -1 siblings, 0 replies; 200+ messages in thread
From: Eduardo Habkost @ 2015-11-09 19:21 UTC (permalink / raw)
  To: Xiao Guangrong
  Cc: pbonzini, imammedo, gleb, mtosatti, stefanha, mst, rth,
	dan.j.williams, kvm, qemu-devel, vsementsov, eblake, Kevin Wolf,
	qemu-block

On Mon, Nov 09, 2015 at 12:44:55PM +0800, Xiao Guangrong wrote:
> On 11/06/2015 11:50 PM, Eduardo Habkost wrote:
> >As this patch affects raw_getlength(), CCing the raw block driver
> >maintainer and the qemu-block mailing list.
> 
> Eduardo, thanks for your reminder. I will keep CCing Kevin and the qemu-block
> mailing list in future versions.
> 
> >
> >On Mon, Nov 02, 2015 at 05:13:13PM +0800, Xiao Guangrong wrote:
> >>It is used to get the size of the specified file, also qemu_fd_getlength()
> >>is introduced to unify the code with raw_getlength() in block/raw-posix.c
> >>
> >>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> >>---
> >>  block/raw-posix.c    |  7 +------
> >>  include/qemu/osdep.h |  2 ++
> >>  util/osdep.c         | 31 +++++++++++++++++++++++++++++++
> >
> >I know I was the one who suggested osdep.c, but maybe oslib-posix.c is a
> >more appropriate place for the new function?
> >
> 
> Since the function we introduced here can work on both Windows and POSIX,
> I think osdep.c is the right place. Otherwise we would have to implement it
> for multiple platforms.

I didn't notice it was going to be used by a platform-independent
qemu_file_getlength() function in addition to the posix-specific
raw_getlength(). Have you tested it on Windows, though?

If you didn't test it on Windows, what about keeping
qemu_file_getlength() available only on posix, for now? The only
users are raw-posix.c and hostmem-file.c, currently. If in the
future somebody needs it on Windows, they can decide between
moving the SEEK_END code to osdep.c (after testing it), or moving
the existing raw-win32.c:raw_getlength() code to an
oslib-win32.c:qemu_file_getlength() function.

-- 
Eduardo

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Ask for ACK (was Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI)
  2015-11-09 11:13               ` Igor Mammedov
@ 2015-11-11  3:01                   ` Xiao Guangrong
  0 siblings, 0 replies; 200+ messages in thread
From: Xiao Guangrong @ 2015-11-11  3:01 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: vsementsov, ehabkost, kvm, mst, gleb, mtosatti, qemu-devel,
	stefanha, pbonzini, dan.j.williams, rth



On 11/09/2015 07:13 PM, Igor Mammedov wrote:
> On Fri, 6 Nov 2015 16:31:43 +0800
> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>
>>
>>
>> On 11/05/2015 10:49 PM, Igor Mammedov wrote:
>>> On Thu, 5 Nov 2015 21:33:39 +0800
>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>
>>>>
>>>>
>>>> On 11/05/2015 09:03 PM, Igor Mammedov wrote:
>>>>> On Thu, 5 Nov 2015 18:15:31 +0800
>>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 11/05/2015 05:58 PM, Igor Mammedov wrote:
>>>>>>> On Mon,  2 Nov 2015 17:13:27 +0800
>>>>>>> Xiao Guangrong <guangrong.xiao@linux.intel.com> wrote:
>>>>>>>
>>>>>>>> A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>>>>>                                    ^^ missing one 0???
>>>>>>>
>>>>>>>> reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>>>>>> for detailed design
>>>>>>>>
>>>>>>>> A parameter, 'nvdimm-support', is introduced for PIIX4_PM and ICH9-LPC
>>>>>>>> that controls if nvdimm support is enabled, it is true on default and
>>>>>>>> it is false on 2.4 and its earlier version to keep compatibility
>>>>>>>>
>>>>>>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>>>>>> [...]
>>>>>>>
>>>>>>>> @@ -33,6 +33,15 @@
>>>>>>>>       */
>>>>>>>>      #define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + * A page staring from 0xFF00000 and IO port 0x0a18 - 0xa1b in guest are
>>>>>>>                                      ^^^ missing 0 or value in define below has an extra 0
>>>>>>>
>>>>>>>> + * reserved for NVDIMM ACPI emulation, refer to docs/specs/acpi_nvdimm.txt
>>>>>>>> + * for detailed design.
>>>>>>>> + */
>>>>>>>> +#define NVDIMM_ACPI_MEM_BASE          0xFF000000ULL
>>>>>>> it still maps RAM at an arbitrary place,
>>>>>>> that's the reason why the VMGenID patches haven't been merged for
>>>>>>> several months.
>>>>>>> I'm not against using direct mapping (more exactly, I'm for it),
>>>>>>> but we should reach consensus on when and how to use it first.
>>>>>>>
>>>>>>> I wouldn't use addresses below 4G as they may be used by firmware or
>>>>>>> legacy hardware, and I won't bet that 0xFF000000ULL won't conflict
>>>>>>> with anything.
>>>>>>> An alternative place to allocate the reserved range from could be
>>>>>>> high memory.
>>>>>>> For pc we have "reserved-memory-end", which currently makes sure
>>>>>>> that the hotpluggable memory range isn't used by firmware.
>>>>>>>
>>>>>>> How about making an API that allows mapping additional memory
>>>>>>> ranges before reserved-memory-end and pushes it up as mappings are
>>>>>>> added?
> [...]
>
>>
>> Really a good case study for me; I tried your patch and moved the 64-bit
>> stuff to the private method, and it worked. :)
>>
>> Igor, is this the API you want?
>
> Let's get an ack from Michael on the idea of RAM mapping before
> "reserved-memory-end" first.
> If he rejects it then there isn't any other way except switching
> to MMIO instead.

Michael, what do you think?

BTW, in case you missed the thread, this is the reason why we prefer to
reserve memory space:
       http://marc.info/?l=qemu-devel&m=144530844718146&w=2
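
To make the proposal quoted above concrete, here is a minimal sketch of an
allocator that hands out ranges at the current end of the reserved area and
pushes that end up. All names (ReservedArea, reserved_area_alloc) are
hypothetical and are not taken from Igor's patch or from QEMU; the real API
would also have to map the MemoryRegion and expose the final value as
"reserved-memory-end".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping: first guest-physical address not yet reserved. */
typedef struct ReservedArea {
    uint64_t end;
} ReservedArea;

/*
 * Reserve @size bytes aligned to @align (a power of two) at the current
 * end of @area and push the end up past the new range. Returns false if
 * the request is empty or would overflow the address space.
 */
bool reserved_area_alloc(ReservedArea *area, uint64_t size, uint64_t align,
                         uint64_t *base)
{
    uint64_t start;

    assert(align != 0 && (align & (align - 1)) == 0);

    if (size == 0 || area->end > UINT64_MAX - (align - 1)) {
        return false;
    }

    /* Round the current end up to the requested alignment. */
    start = (area->end + align - 1) & ~(align - 1);
    if (size > UINT64_MAX - start) {
        return false;
    }

    *base = start;
    area->end = start + size;
    return true;
}
```

Starting with end = 4G, reserving one 4K page yields base 0x100000000 and
moves the end to 0x100001000, so later users (for example the hotpluggable
memory range) start above everything already reserved.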

^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [PATCH v7 23/35] nvdimm: implement NVDIMM device abstract
  2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
@ 2015-11-13 16:53     ` Vladimir Sementsov-Ogievskiy
  -1 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-13 16:53 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: gleb, mtosatti, stefanha, mst, rth, ehabkost, dan.j.williams,
	kvm, qemu-devel, eblake

On 02.11.2015 12:13, Xiao Guangrong wrote:
> Introduce the "nvdimm" device, which is based on the dimm device type
>
> A 128K memory region, the minimum namespace label size required by
> the NVDIMM Namespace Spec, is reserved for label data; it is located
> at the end of the backend memory device
>
> We can use "-m 1G,maxmem=100G,slots=10 -object memory-backend-file,
> id=mem1,size=1G,mem-path=/dev/pmem0 -device nvdimm,memdev=mem1" to
> create NVDIMM device for guest
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   default-configs/i386-softmmu.mak   |   1 +
>   default-configs/x86_64-softmmu.mak |   1 +
>   hw/acpi/memory_hotplug.c           |   6 ++
>   hw/mem/Makefile.objs               |   1 +
>   hw/mem/nvdimm.c                    | 116 +++++++++++++++++++++++++++++++++++++
>   include/hw/mem/nvdimm.h            |  83 ++++++++++++++++++++++++++
>   6 files changed, 208 insertions(+)
>   create mode 100644 hw/mem/nvdimm.c
>   create mode 100644 include/hw/mem/nvdimm.h
>
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index 3ece8bb..4e84a1c 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -47,6 +47,7 @@ CONFIG_APIC=y
>   CONFIG_IOAPIC=y
>   CONFIG_PVPANIC=y
>   CONFIG_MEM_HOTPLUG=y
> +CONFIG_NVDIMM=y
>   CONFIG_XIO3130=y
>   CONFIG_IOH3420=y
>   CONFIG_I82801B11=y
> diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> index 92ea7c1..e877a86 100644
> --- a/default-configs/x86_64-softmmu.mak
> +++ b/default-configs/x86_64-softmmu.mak
> @@ -47,6 +47,7 @@ CONFIG_APIC=y
>   CONFIG_IOAPIC=y
>   CONFIG_PVPANIC=y
>   CONFIG_MEM_HOTPLUG=y
> +CONFIG_NVDIMM=y
>   CONFIG_XIO3130=y
>   CONFIG_IOH3420=y
>   CONFIG_I82801B11=y
> diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
> index 20d3093..bb5a29f 100644
> --- a/hw/acpi/memory_hotplug.c
> +++ b/hw/acpi/memory_hotplug.c
> @@ -1,6 +1,7 @@
>   #include "hw/acpi/memory_hotplug.h"
>   #include "hw/acpi/pc-hotplug.h"
>   #include "hw/mem/dimm.h"
> +#include "hw/mem/nvdimm.h"
>   #include "hw/boards.h"
>   #include "hw/qdev-core.h"
>   #include "trace.h"
> @@ -231,6 +232,11 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
>   {
>       MemStatus *mdev;
>   
> +    /* Currently, NVDIMM hotplug has not been supported yet. */
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
> +        return;
> +    }
> +

Hmm, shouldn't a check of DIMM_GET_CLASS(dev)->hotpluggable be used here
instead, to cover the more general case?

>       mdev = acpi_memory_slot_status(mem_st, dev, errp);
>       if (!mdev) {
>           return;
> diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
> index cebb4b1..12d9b72 100644
> --- a/hw/mem/Makefile.objs
> +++ b/hw/mem/Makefile.objs
> @@ -1,2 +1,3 @@
>   common-obj-$(CONFIG_DIMM) += dimm.o
>   common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
> +common-obj-$(CONFIG_NVDIMM) += nvdimm.o
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> new file mode 100644
> index 0000000..c310887
> --- /dev/null
> +++ b/hw/mem/nvdimm.c
> @@ -0,0 +1,116 @@
> +/*
> + * Non-Volatile Dual In-line Memory Module Virtualization Implementation
> + *
> + * Copyright(C) 2015 Intel Corporation.
> + *
> + * Author:
> + *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
> + *
> + * Currently, it only supports PMEM Virtualization.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include "qapi/visitor.h"
> +#include "hw/mem/nvdimm.h"
> +
> +static MemoryRegion *nvdimm_get_memory_region(DIMMDevice *dimm)
> +{
> +    NVDIMMDevice *nvdimm = NVDIMM(dimm);
> +
> +    /* plug a NVDIMM device which is not properly realized? */
> +    assert(memory_region_size(&nvdimm->nvdimm_mr));
> +
> +    return &nvdimm->nvdimm_mr;
> +}
> +
> +static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
> +{
> +    MemoryRegion *mr;
> +    NVDIMMDevice *nvdimm = NVDIMM(dimm);
> +    uint64_t size;
> +
> +    nvdimm->label_size = MIN_NAMESPACE_LABEL_SIZE;
> +
> +    mr = host_memory_backend_get_memory(dimm->hostmem, errp);
> +    size = memory_region_size(mr);
> +
> +    if (size <= nvdimm->label_size) {
> +        char *path = object_get_canonical_path_component(OBJECT(dimm->hostmem));
> +        error_setg(errp, "the size of memdev %s (0x%" PRIx64 ") is too small"
> +                   " to contain nvdimm namespace label (0x%" PRIx64 ")", path,
> +                   memory_region_size(mr), nvdimm->label_size);
> +        return;
> +    }
> +
> +    memory_region_init_alias(&nvdimm->nvdimm_mr, OBJECT(dimm), "nvdimm-memory",
> +                             mr, 0, size - nvdimm->label_size);
> +    nvdimm->label_data = memory_region_get_ram_ptr(mr) +
> +                         memory_region_size(&nvdimm->nvdimm_mr);
> +}
> +
> +static void nvdimm_read_label_data(NVDIMMDevice *nvdimm, void *buf,
> +                                   uint64_t size, uint64_t offset)
> +{
> +    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
> +
> +    memcpy(buf, nvdimm->label_data + offset, size);
> +}
> +
> +static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
> +                                    uint64_t size, uint64_t offset)
> +{
> +    MemoryRegion *mr;
> +    DIMMDevice *dimm = DIMM(nvdimm);
> +    uint64_t backend_offset;
> +
> +    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
> +
> +    memcpy(nvdimm->label_data + offset, buf, size);
> +
> +    mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
> +    backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
> +    memory_region_set_dirty(mr, backend_offset, size);
> +}
> +
> +static void nvdimm_class_init(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +    DIMMDeviceClass *ddc = DIMM_CLASS(oc);
> +    NVDIMMClass *nvc = NVDIMM_CLASS(oc);
> +
> +    /* nvdimm hotplug has not been supported yet. */
> +    dc->hotpluggable = false;
> +
> +    ddc->realize = nvdimm_realize;
> +    ddc->get_memory_region = nvdimm_get_memory_region;
> +
> +    nvc->read_label_data = nvdimm_read_label_data;
> +    nvc->write_label_data = nvdimm_write_label_data;
> +}
> +
> +static TypeInfo nvdimm_info = {
> +    .name          = TYPE_NVDIMM,
> +    .parent        = TYPE_DIMM,
> +    .instance_size = sizeof(NVDIMMDevice),
> +    .class_init    = nvdimm_class_init,
> +    .class_size    = sizeof(NVDIMMClass),
> +};
> +
> +static void nvdimm_register_types(void)
> +{
> +    type_register_static(&nvdimm_info);
> +}
> +
> +type_init(nvdimm_register_types)
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> new file mode 100644
> index 0000000..cd90957
> --- /dev/null
> +++ b/include/hw/mem/nvdimm.h
> @@ -0,0 +1,83 @@
> +/*
> + * Non-Volatile Dual In-line Memory Module Virtualization Implementation
> + *
> + * Copyright(C) 2015 Intel Corporation.
> + *
> + * Author:
> + *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
> + *
> + * NVDIMM specifications and some documents can be found at:
> + * NVDIMM ACPI device and NFIT are introduced in ACPI 6:
> + *      http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> + * NVDIMM Namespace specification:
> + *      http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> + * DSM Interface Example:
> + *      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> + * Driver Writer's Guide:
> + *      http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_NVDIMM_H
> +#define QEMU_NVDIMM_H
> +
> +#include "hw/mem/dimm.h"
> +
> +/*
> + * The minimum label data size is required by NVDIMM Namespace
> + * specification, please refer to chapter 2 Namespaces:
> + *   "NVDIMMs following the NVDIMM Block Mode Specification use an area
> + *    at least 128KB in size, which holds around 1000 labels."
> + */
> +#define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
> +
> +#define TYPE_NVDIMM      "nvdimm"
> +#define NVDIMM(obj)      OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
> +#define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
> +#define NVDIMM_GET_CLASS(obj) OBJECT_GET_CLASS(NVDIMMClass, (obj), \
> +                                               TYPE_NVDIMM)
> +
> +struct NVDIMMDevice {
> +    /* private */
> +    DIMMDevice parent_obj;
> +
> +    /* public */
> +
> +    /*
> +     * the size of label data in NVDIMM device which is presented to
> +     * guest via __DSM "Get Namespace Label Size" command.
> +     */
> +    uint64_t label_size;
> +
> +    /*
> +     * the address of label data which is read by __DSM "Get Namespace
> +     * Label Data" command and written by __DSM "Set Namespace Label
> +     * Data" command.
> +     */
> +    void *label_data;
> +
> +    /*
> +     * it's the PMEM region in NVDIMM device, which is presented to
> +     * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
> +     */
> +    MemoryRegion nvdimm_mr;
> +};
> +typedef struct NVDIMMDevice NVDIMMDevice;
> +
> +struct NVDIMMClass {
> +    /* private */
> +    DIMMDeviceClass parent_class;
> +
> +    /* public */
> +    /* read @size bytes from NVDIMM label data at @offset into @buf. */
> +    void (*read_label_data)(NVDIMMDevice *nvdimm, void *buf,
> +                            uint64_t size, uint64_t offset);
> +    /* write @size bytes from @buf to NVDIMM label data at @offset. */
> +    void (*write_label_data)(NVDIMMDevice *nvdimm, const void *buf,
> +                             uint64_t size, uint64_t offset);
> +};
> +typedef struct NVDIMMClass NVDIMMClass;
> +
> +#endif
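
The layout arithmetic and the label bounds check in the patch quoted above
can be sketched in isolation as follows. This is only an illustration
under stated assumptions: LABEL_SIZE mirrors MIN_NAMESPACE_LABEL_SIZE, and
pmem_region_size() and label_range_ok() are hypothetical names. Unlike the
assert() in the patch, this form avoids computing offset + size at all and
accepts zero-length accesses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 128K label area at the end of the backend, as in the quoted patch. */
#define LABEL_SIZE (128ULL << 10)

/*
 * Size of the guest-visible PMEM region for a backend of @backend_size
 * bytes, or 0 if the backend cannot hold the label area plus any data.
 */
uint64_t pmem_region_size(uint64_t backend_size)
{
    return backend_size > LABEL_SIZE ? backend_size - LABEL_SIZE : 0;
}

/*
 * True if [offset, offset + size) lies inside the label area. The
 * subtraction cannot wrap because size <= LABEL_SIZE is checked first.
 */
bool label_range_ok(uint64_t offset, uint64_t size)
{
    return size <= LABEL_SIZE && offset <= LABEL_SIZE - size;
}
```

For a 1G backend (0x40000000 bytes), the PMEM region is 0x3FFE0000 bytes
and the label data starts at that offset within the backend.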


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


^ permalink raw reply	[flat|nested] 200+ messages in thread

* Re: [Qemu-devel] [PATCH v7 23/35] nvdimm: implement NVDIMM device abstract
@ 2015-11-13 16:53     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 200+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2015-11-13 16:53 UTC (permalink / raw)
  To: Xiao Guangrong, pbonzini, imammedo
  Cc: ehabkost, kvm, mst, gleb, mtosatti, qemu-devel, stefanha,
	dan.j.williams, rth

On 02.11.2015 12:13, Xiao Guangrong wrote:
> Introduce the "nvdimm" device, which is based on the dimm device type
>
> A 128K memory region, the minimum namespace label size required by
> the NVDIMM Namespace Spec, is reserved for label data; it is located
> at the end of the backend memory device
>
> We can use "-m 1G,maxmem=100G,slots=10 -object memory-backend-file,
> id=mem1,size=1G,mem-path=/dev/pmem0 -device nvdimm,memdev=mem1" to
> create NVDIMM device for guest
>
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>   default-configs/i386-softmmu.mak   |   1 +
>   default-configs/x86_64-softmmu.mak |   1 +
>   hw/acpi/memory_hotplug.c           |   6 ++
>   hw/mem/Makefile.objs               |   1 +
>   hw/mem/nvdimm.c                    | 116 +++++++++++++++++++++++++++++++++++++
>   include/hw/mem/nvdimm.h            |  83 ++++++++++++++++++++++++++
>   6 files changed, 208 insertions(+)
>   create mode 100644 hw/mem/nvdimm.c
>   create mode 100644 include/hw/mem/nvdimm.h
>
> diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
> index 3ece8bb..4e84a1c 100644
> --- a/default-configs/i386-softmmu.mak
> +++ b/default-configs/i386-softmmu.mak
> @@ -47,6 +47,7 @@ CONFIG_APIC=y
>   CONFIG_IOAPIC=y
>   CONFIG_PVPANIC=y
>   CONFIG_MEM_HOTPLUG=y
> +CONFIG_NVDIMM=y
>   CONFIG_XIO3130=y
>   CONFIG_IOH3420=y
>   CONFIG_I82801B11=y
> diff --git a/default-configs/x86_64-softmmu.mak b/default-configs/x86_64-softmmu.mak
> index 92ea7c1..e877a86 100644
> --- a/default-configs/x86_64-softmmu.mak
> +++ b/default-configs/x86_64-softmmu.mak
> @@ -47,6 +47,7 @@ CONFIG_APIC=y
>   CONFIG_IOAPIC=y
>   CONFIG_PVPANIC=y
>   CONFIG_MEM_HOTPLUG=y
> +CONFIG_NVDIMM=y
>   CONFIG_XIO3130=y
>   CONFIG_IOH3420=y
>   CONFIG_I82801B11=y
> diff --git a/hw/acpi/memory_hotplug.c b/hw/acpi/memory_hotplug.c
> index 20d3093..bb5a29f 100644
> --- a/hw/acpi/memory_hotplug.c
> +++ b/hw/acpi/memory_hotplug.c
> @@ -1,6 +1,7 @@
>   #include "hw/acpi/memory_hotplug.h"
>   #include "hw/acpi/pc-hotplug.h"
>   #include "hw/mem/dimm.h"
> +#include "hw/mem/nvdimm.h"
>   #include "hw/boards.h"
>   #include "hw/qdev-core.h"
>   #include "trace.h"
> @@ -231,6 +232,11 @@ void acpi_memory_plug_cb(ACPIREGS *ar, qemu_irq irq, MemHotplugState *mem_st,
>   {
>       MemStatus *mdev;
>   
> +    /* Currently, NVDIMM hotplug has not been supported yet. */
> +    if (object_dynamic_cast(OBJECT(dev), TYPE_NVDIMM)) {
> +        return;
> +    }
> +

Hmm, shouldn't a check of DIMM_GET_CLASS(dev)->hotpluggable be used here
instead, to cover the more general case?

>       mdev = acpi_memory_slot_status(mem_st, dev, errp);
>       if (!mdev) {
>           return;
> diff --git a/hw/mem/Makefile.objs b/hw/mem/Makefile.objs
> index cebb4b1..12d9b72 100644
> --- a/hw/mem/Makefile.objs
> +++ b/hw/mem/Makefile.objs
> @@ -1,2 +1,3 @@
>   common-obj-$(CONFIG_DIMM) += dimm.o
>   common-obj-$(CONFIG_MEM_HOTPLUG) += pc-dimm.o
> +common-obj-$(CONFIG_NVDIMM) += nvdimm.o
> diff --git a/hw/mem/nvdimm.c b/hw/mem/nvdimm.c
> new file mode 100644
> index 0000000..c310887
> --- /dev/null
> +++ b/hw/mem/nvdimm.c
> @@ -0,0 +1,116 @@
> +/*
> + * Non-Volatile Dual In-line Memory Module Virtualization Implementation
> + *
> + * Copyright(C) 2015 Intel Corporation.
> + *
> + * Author:
> + *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
> + *
> + * Currently, it only supports PMEM Virtualization.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
> + */
> +
> +#include "qapi/visitor.h"
> +#include "hw/mem/nvdimm.h"
> +
> +static MemoryRegion *nvdimm_get_memory_region(DIMMDevice *dimm)
> +{
> +    NVDIMMDevice *nvdimm = NVDIMM(dimm);
> +
> +    /* plug a NVDIMM device which is not properly realized? */
> +    assert(memory_region_size(&nvdimm->nvdimm_mr));
> +
> +    return &nvdimm->nvdimm_mr;
> +}
> +
> +static void nvdimm_realize(DIMMDevice *dimm, Error **errp)
> +{
> +    MemoryRegion *mr;
> +    NVDIMMDevice *nvdimm = NVDIMM(dimm);
> +    uint64_t size;
> +
> +    nvdimm->label_size = MIN_NAMESPACE_LABEL_SIZE;
> +
> +    mr = host_memory_backend_get_memory(dimm->hostmem, errp);
> +    size = memory_region_size(mr);
> +
> +    if (size <= nvdimm->label_size) {
> +        char *path = object_get_canonical_path_component(OBJECT(dimm->hostmem));
> +        error_setg(errp, "the size of memdev %s (0x%" PRIx64 ") is too small"
> +                   " to contain nvdimm namespace label (0x%" PRIx64 ")", path,
> +                   memory_region_size(mr), nvdimm->label_size);
> +        return;
> +    }
> +
> +    memory_region_init_alias(&nvdimm->nvdimm_mr, OBJECT(dimm), "nvdimm-memory",
> +                             mr, 0, size - nvdimm->label_size);
> +    nvdimm->label_data = memory_region_get_ram_ptr(mr) +
> +                         memory_region_size(&nvdimm->nvdimm_mr);
> +}
> +
> +static void nvdimm_read_label_data(NVDIMMDevice *nvdimm, void *buf,
> +                                   uint64_t size, uint64_t offset)
> +{
> +    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
> +
> +    memcpy(buf, nvdimm->label_data + offset, size);
> +}
> +
> +static void nvdimm_write_label_data(NVDIMMDevice *nvdimm, const void *buf,
> +                                    uint64_t size, uint64_t offset)
> +{
> +    MemoryRegion *mr;
> +    DIMMDevice *dimm = DIMM(nvdimm);
> +    uint64_t backend_offset;
> +
> +    assert((nvdimm->label_size >= size + offset) && (offset + size > offset));
> +
> +    memcpy(nvdimm->label_data + offset, buf, size);
> +
> +    mr = host_memory_backend_get_memory(dimm->hostmem, &error_abort);
> +    backend_offset = memory_region_size(mr) - nvdimm->label_size + offset;
> +    memory_region_set_dirty(mr, backend_offset, size);
> +}
> +
> +static void nvdimm_class_init(ObjectClass *oc, void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(oc);
> +    DIMMDeviceClass *ddc = DIMM_CLASS(oc);
> +    NVDIMMClass *nvc = NVDIMM_CLASS(oc);
> +
> +    /* nvdimm hotplug is not supported yet. */
> +    dc->hotpluggable = false;
> +
> +    ddc->realize = nvdimm_realize;
> +    ddc->get_memory_region = nvdimm_get_memory_region;
> +
> +    nvc->read_label_data = nvdimm_read_label_data;
> +    nvc->write_label_data = nvdimm_write_label_data;
> +}
> +
> +static TypeInfo nvdimm_info = {
> +    .name          = TYPE_NVDIMM,
> +    .parent        = TYPE_DIMM,
> +    .instance_size = sizeof(NVDIMMDevice),
> +    .class_init    = nvdimm_class_init,
> +    .class_size    = sizeof(NVDIMMClass),
> +};
> +
> +static void nvdimm_register_types(void)
> +{
> +    type_register_static(&nvdimm_info);
> +}
> +
> +type_init(nvdimm_register_types)
> diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
> new file mode 100644
> index 0000000..cd90957
> --- /dev/null
> +++ b/include/hw/mem/nvdimm.h
> @@ -0,0 +1,83 @@
> +/*
> + * Non-Volatile Dual In-line Memory Module Virtualization Implementation
> + *
> + * Copyright(C) 2015 Intel Corporation.
> + *
> + * Author:
> + *  Xiao Guangrong <guangrong.xiao@linux.intel.com>
> + *
> + * NVDIMM specifications and some documents can be found at:
> + * NVDIMM ACPI device and NFIT are introduced in ACPI 6:
> + *      http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
> + * NVDIMM Namespace specification:
> + *      http://pmem.io/documents/NVDIMM_Namespace_Spec.pdf
> + * DSM Interface Example:
> + *      http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
> + * Driver Writer's Guide:
> + *      http://pmem.io/documents/NVDIMM_Driver_Writers_Guide.pdf
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_NVDIMM_H
> +#define QEMU_NVDIMM_H
> +
> +#include "hw/mem/dimm.h"
> +
> +/*
> + * The minimum label data size is required by the NVDIMM Namespace
> + * specification; see chapter 2, Namespaces:
> + *   "NVDIMMs following the NVDIMM Block Mode Specification use an area
> + *    at least 128KB in size, which holds around 1000 labels."
> + */
> +#define MIN_NAMESPACE_LABEL_SIZE      (128UL << 10)
> +
> +#define TYPE_NVDIMM      "nvdimm"
> +#define NVDIMM(obj)      OBJECT_CHECK(NVDIMMDevice, (obj), TYPE_NVDIMM)
> +#define NVDIMM_CLASS(oc) OBJECT_CLASS_CHECK(NVDIMMClass, (oc), TYPE_NVDIMM)
> +#define NVDIMM_GET_CLASS(obj) OBJECT_GET_CLASS(NVDIMMClass, (obj), \
> +                                               TYPE_NVDIMM)
> +
> +struct NVDIMMDevice {
> +    /* private */
> +    DIMMDevice parent_obj;
> +
> +    /* public */
> +
> +    /*
> +     * the size of label data in NVDIMM device which is presented to
> +     * guest via _DSM "Get Namespace Label Size" command.
> +     */
> +    uint64_t label_size;
> +
> +    /*
> +     * the address of label data which is read by _DSM "Get Namespace
> +     * Label Data" command and written by _DSM "Set Namespace Label
> +     * Data" command.
> +     */
> +    void *label_data;
> +
> +    /*
> +     * it's the PMEM region in NVDIMM device, which is presented to
> +     * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
> +     */
> +    MemoryRegion nvdimm_mr;
> +};
> +typedef struct NVDIMMDevice NVDIMMDevice;
> +
> +struct NVDIMMClass {
> +    /* private */
> +    DIMMDeviceClass parent_class;
> +
> +    /* public */
> +    /* read @size bytes from NVDIMM label data at @offset into @buf. */
> +    void (*read_label_data)(NVDIMMDevice *nvdimm, void *buf,
> +                            uint64_t size, uint64_t offset);
> +    /* write @size bytes from @buf to NVDIMM label data at @offset. */
> +    void (*write_label_data)(NVDIMMDevice *nvdimm, const void *buf,
> +                             uint64_t size, uint64_t offset);
> +};
> +typedef struct NVDIMMClass NVDIMMClass;
> +
> +#endif
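
The layout that nvdimm_realize() sets up, and the bounds check used by the
label accessors, can be sketched in standalone C. This is only an
illustration under stated assumptions: LABEL_SIZE mirrors
MIN_NAMESPACE_LABEL_SIZE from the patch, while Layout, nvdimm_layout(),
label_range_ok() and label_backend_offset() are hypothetical names that do
not exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors MIN_NAMESPACE_LABEL_SIZE in the patch (128KB). */
#define LABEL_SIZE (128ULL << 10)

/* Hypothetical description of how one backend is split. */
typedef struct {
    uint64_t pmem_size;    /* bytes aliased to the guest as PMEM */
    uint64_t label_offset; /* start of the label area inside the backend */
} Layout;

/*
 * The backend must be strictly larger than the label area; the PMEM
 * region covers [0, backend_size - LABEL_SIZE) and the label area
 * takes the remaining LABEL_SIZE bytes at the end.
 */
static bool nvdimm_layout(uint64_t backend_size, Layout *out)
{
    if (backend_size <= LABEL_SIZE) {
        return false;
    }
    out->pmem_size = backend_size - LABEL_SIZE;
    out->label_offset = out->pmem_size;
    return true;
}

/*
 * Same condition as the assert() in the label read/write helpers:
 * the range must fit in the label area and offset + size must not
 * wrap around.
 */
static bool label_range_ok(uint64_t label_size, uint64_t offset,
                           uint64_t size)
{
    return label_size >= size + offset && offset + size > offset;
}

/* Offset inside the whole backend of a byte at @offset in the label area. */
static uint64_t label_backend_offset(uint64_t backend_size, uint64_t offset)
{
    assert(backend_size > LABEL_SIZE && offset < LABEL_SIZE);
    return backend_size - LABEL_SIZE + offset;
}
```

For a 1GB backend, nvdimm_layout() gives a PMEM region of 1GB minus 128KB,
with the label area in the last 128KB. Note that the range condition, as
written in the patch, also rejects a zero-length access, because
offset + size > offset is false when size is 0.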


-- 
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.


end of thread, other threads:[~2015-11-13 16:54 UTC | newest]

Thread overview: 200+ messages
2015-11-02  9:13 [PATCH v7 00/35] implement vNVDIMM Xiao Guangrong
2015-11-02  9:13 ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 01/35] acpi: add aml_derefof Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 02/35] acpi: add aml_sizeof Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 03/35] acpi: add aml_create_field Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-03  6:14   ` Shannon Zhao
2015-11-03  6:14     ` [Qemu-devel] " Shannon Zhao
2015-11-03 14:52     ` Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 04/35] acpi: add aml_concatenate Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 05/35] acpi: add aml_object_type Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 06/35] acpi: add aml_method_serialized Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-03 12:30   ` Igor Mammedov
2015-11-03 12:30     ` Igor Mammedov
2015-11-03 13:27     ` Xiao Guangrong
2015-11-03 13:27       ` Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 07/35] util: introduce qemu_file_get_page_size() Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02 13:56   ` Vladimir Sementsov-Ogievskiy
2015-11-02 13:56     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-06 15:36   ` Eduardo Habkost
2015-11-06 15:36     ` [Qemu-devel] " Eduardo Habkost
2015-11-09  4:36     ` Xiao Guangrong
2015-11-09  4:36       ` [Qemu-devel] " Xiao Guangrong
2015-11-09 18:34       ` Eduardo Habkost
2015-11-09 18:34         ` [Qemu-devel] " Eduardo Habkost
2015-11-02  9:13 ` [PATCH v7 08/35] exec: allow memory to be allocated from any kind of path Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02 14:51   ` Vladimir Sementsov-Ogievskiy
2015-11-02 14:51     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-02 15:22     ` Xiao Guangrong
2015-11-02 15:22       ` [Qemu-devel] " Xiao Guangrong
2015-11-02 15:52       ` Vladimir Sementsov-Ogievskiy
2015-11-02 15:52         ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-03 23:00   ` Eduardo Habkost
2015-11-03 23:00     ` [Qemu-devel] " Eduardo Habkost
2015-11-04  3:12     ` Xiao Guangrong
2015-11-04  3:12       ` [Qemu-devel] " Xiao Guangrong
2015-11-04 12:40       ` Eduardo Habkost
2015-11-04 12:40         ` [Qemu-devel] " Eduardo Habkost
2015-11-04 14:22         ` Xiao Guangrong
2015-11-04 14:22           ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 09/35] exec: allow file_ram_alloc to work on file Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02 15:12   ` Vladimir Sementsov-Ogievskiy
2015-11-02 15:12     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-02 15:25     ` Xiao Guangrong
2015-11-02 15:25       ` [Qemu-devel] " Xiao Guangrong
2015-11-02 15:58       ` Vladimir Sementsov-Ogievskiy
2015-11-02 15:58         ` Vladimir Sementsov-Ogievskiy
2015-11-02 21:12   ` Paolo Bonzini
2015-11-02 21:12     ` [Qemu-devel] " Paolo Bonzini
2015-11-03  3:56     ` Xiao Guangrong
2015-11-03  3:56       ` [Qemu-devel] " Xiao Guangrong
2015-11-03 13:55       ` Paolo Bonzini
2015-11-03 13:55         ` [Qemu-devel] " Paolo Bonzini
2015-11-03 14:26         ` Xiao Guangrong
2015-11-03 14:26           ` [Qemu-devel] " Xiao Guangrong
2015-11-03 12:34   ` Igor Mammedov
2015-11-03 12:34     ` [Qemu-devel] " Igor Mammedov
2015-11-03 13:32     ` Xiao Guangrong
2015-11-03 13:32       ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 10/35] hostmem-file: clean up memory allocation Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 11/35] util: introduce qemu_file_getlength() Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02 15:26   ` Vladimir Sementsov-Ogievskiy
2015-11-02 15:26     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-03 23:21   ` Eduardo Habkost
2015-11-03 23:21     ` [Qemu-devel] " Eduardo Habkost
2015-11-04  3:17     ` Xiao Guangrong
2015-11-04  3:17       ` [Qemu-devel] " Xiao Guangrong
2015-11-04 14:44       ` Eduardo Habkost
2015-11-04 14:44         ` [Qemu-devel] " Eduardo Habkost
2015-11-04 14:44         ` Xiao Guangrong
2015-11-04 14:44           ` [Qemu-devel] " Xiao Guangrong
2015-11-06 15:50   ` Eduardo Habkost
2015-11-06 15:50     ` [Qemu-devel] " Eduardo Habkost
2015-11-09  4:44     ` Xiao Guangrong
2015-11-09  4:44       ` [Qemu-devel] " Xiao Guangrong
2015-11-09 19:21       ` Eduardo Habkost
2015-11-09 19:21         ` [Qemu-devel] " Eduardo Habkost
2015-11-02  9:13 ` [PATCH v7 12/35] util: let qemu_fd_getlength support block device Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02 16:11   ` Vladimir Sementsov-Ogievskiy
2015-11-02 16:11     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-02 16:21     ` Xiao Guangrong
2015-11-02 16:21       ` [Qemu-devel] " Xiao Guangrong
2015-11-06 15:44       ` Eduardo Habkost
2015-11-06 15:44         ` [Qemu-devel] " Eduardo Habkost
2015-11-06 15:48   ` Eduardo Habkost
2015-11-06 15:48     ` [Qemu-devel] " Eduardo Habkost
2015-11-06 15:54   ` Eduardo Habkost
2015-11-06 15:54     ` [Qemu-devel] " Eduardo Habkost
2015-11-09  5:58     ` Xiao Guangrong
2015-11-09  5:58       ` [Qemu-devel] " Xiao Guangrong
2015-11-09 18:43       ` Eduardo Habkost
2015-11-09 18:43         ` [Qemu-devel] " Eduardo Habkost
2015-11-02  9:13 ` [PATCH v7 13/35] hostmem-file: use whole file size if possible Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02 17:09   ` Vladimir Sementsov-Ogievskiy
2015-11-02 17:09     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-03 14:51     ` Xiao Guangrong
2015-11-03 14:51       ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 14/35] pc-dimm: remove DEFAULT_PC_DIMMSIZE Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 15/35] pc-dimm: make pc_existing_dimms_capacity static and rename it Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 16/35] pc-dimm: drop the prefix of pc-dimm Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 17/35] stubs: rename qmp_pc_dimm_device_list.c Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 18/35] pc-dimm: rename pc-dimm.c and pc-dimm.h Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 19/35] dimm: abstract dimm device from pc-dimm Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 20/35] dimm: get mapped memory region from DIMMDeviceClass->get_memory_region Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02 12:19   ` Vladimir Sementsov-Ogievskiy
2015-11-02 12:19     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-02 13:08     ` Xiao Guangrong
2015-11-02 13:08       ` [Qemu-devel] " Xiao Guangrong
2015-11-02 14:26       ` Vladimir Sementsov-Ogievskiy
2015-11-02 14:26         ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-02 15:06         ` Xiao Guangrong
2015-11-02 15:06           ` [Qemu-devel] " Xiao Guangrong
2015-11-02 16:16           ` Vladimir Sementsov-Ogievskiy
2015-11-02 16:16             ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-03 14:47             ` Xiao Guangrong
2015-11-03 14:47               ` [Qemu-devel] " Xiao Guangrong
2015-11-05  8:53               ` Vladimir Sementsov-Ogievskiy
2015-11-05  8:53                 ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-05 17:29   ` Eduardo Habkost
2015-11-05 17:29     ` [Qemu-devel] " Eduardo Habkost
2015-11-06  2:50     ` Xiao Guangrong
2015-11-06  2:50       ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 21/35] dimm: keep the state of the whole backend memory Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 22/35] dimm: introduce realize callback Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 23/35] nvdimm: implement NVDIMM device abstract Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-13 16:53   ` Vladimir Sementsov-Ogievskiy
2015-11-13 16:53     ` [Qemu-devel] " Vladimir Sementsov-Ogievskiy
2015-11-02  9:13 ` [PATCH v7 24/35] docs: add NVDIMM ACPI documentation Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-05  9:58   ` Igor Mammedov
2015-11-05  9:58     ` [Qemu-devel] " Igor Mammedov
2015-11-05 10:15     ` Xiao Guangrong
2015-11-05 10:15       ` [Qemu-devel] " Xiao Guangrong
2015-11-05 13:03       ` Igor Mammedov
2015-11-05 13:03         ` [Qemu-devel] " Igor Mammedov
2015-11-05 13:33         ` Xiao Guangrong
2015-11-05 13:33           ` [Qemu-devel] " Xiao Guangrong
2015-11-05 14:49           ` Igor Mammedov
2015-11-05 14:49             ` [Qemu-devel] " Igor Mammedov
2015-11-06  8:31             ` Xiao Guangrong
2015-11-06  8:31               ` [Qemu-devel] " Xiao Guangrong
2015-11-06  8:56               ` Xiao Guangrong
2015-11-06  8:56                 ` [Qemu-devel] " Xiao Guangrong
2015-11-09 11:13               ` Igor Mammedov
2015-11-11  3:01                 ` Ask for ACK (was Re: [PATCH v7 25/35] nvdimm acpi: init the resource used by NVDIMM ACPI) Xiao Guangrong
2015-11-11  3:01                   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 26/35] nvdimm acpi: build ACPI NFIT table Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 27/35] nvdimm acpi: build ACPI nvdimm devices Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-03 13:13   ` Igor Mammedov
2015-11-03 13:13     ` [Qemu-devel] " Igor Mammedov
2015-11-03 14:22     ` Xiao Guangrong
2015-11-03 14:22       ` [Qemu-devel] " Xiao Guangrong
2015-11-04  8:56       ` Igor Mammedov
2015-11-04  8:56         ` [Qemu-devel] " Igor Mammedov
2015-11-04 14:11         ` Xiao Guangrong
2015-11-04 14:11           ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 28/35] nvdimm acpi: save arg3 for NVDIMM device _DSM method Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 29/35] nvdimm acpi: support function 0 Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 30/35] nvdimm acpi: support Get Namespace Label Size function Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 31/35] nvdimm acpi: support Get Namespace Label Data function Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 32/35] nvdimm acpi: support Set " Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 33/35] nvdimm: allow using whole backend memory as pmem Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 34/35] nvdimm acpi: support _FIT method Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02  9:13 ` [PATCH v7 35/35] nvdimm: add maintain info Xiao Guangrong
2015-11-02  9:13   ` [Qemu-devel] " Xiao Guangrong
2015-11-02 11:51 ` [PATCH v7 00/35] implement vNVDIMM Stefan Hajnoczi
2015-11-02 11:51   ` [Qemu-devel] " Stefan Hajnoczi
