On Wed, Mar 18, 2020 at 01:03:03PM -0700, Andrzej Jakowski wrote:
> This patch introduces support for PMR that has been defined as part of NVMe 1.4
> spec. User can now specify a pmrdev option that should point to HostMemoryBackend.
> pmrdev memory region will subsequently be exposed as PCI BAR 2 in emulated NVMe
> device. Guest OS can perform mmio read and writes to the PMR region that will stay
> persistent across system reboot.
>
> Signed-off-by: Andrzej Jakowski
> ---
> v2:
>  - reworked PMR to use HostMemoryBackend instead of directly mapping PMR
>    backend file into qemu [1] (Stefan)
>
> v1:
>  - provided support for Bit 1 from PMRWBM register instead of Bit 0 to ensure
>    improved performance in virtualized environment [2] (Stefan)
>
>  - added check if pmr size is power of two in size [3] (David)
>
>  - addressed cross compilation build problems reported by CI environment
>
> [1]: https://lore.kernel.org/qemu-devel/20200306223853.37958-1-andrzej.jakowski@linux.intel.com/
> [2]: https://nvmexpress.org/wp-content/uploads/NVM-Express-1_4-2019.06.10-Ratified.pdf
> [3]: https://lore.kernel.org/qemu-devel/20200218224811.30050-1-andrzej.jakowski@linux.intel.com/
> ---
> Persistent Memory Region (PMR) is a new optional feature provided in NVMe 1.4
> specification. This patch implements initial support for it in NVMe driver.
> ---
>  hw/block/nvme.c       | 117 +++++++++++++++++++++++++++-
>  hw/block/nvme.h       |   2 +
>  hw/block/trace-events |   5 ++
>  include/block/nvme.h  | 172 ++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 294 insertions(+), 2 deletions(-)
>
> diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> index d28335cbf3..70fd09d293 100644
> --- a/hw/block/nvme.c
> +++ b/hw/block/nvme.c
> @@ -19,10 +19,18 @@
>   *      -drive file=,if=none,id=
>   *      -device nvme,drive=,serial=,id=, \
>   *              cmb_size_mb=, \
> + *              [pmrdev=,] \
>   *              num_queues=
>   *
>   * Note cmb_size_mb denotes size of CMB in MB.
>   * CMB is assumed to be at
>   * offset 0 in BAR2 and supports only WDS, RDS and SQS for now.
> + *
> + * Either cmb or pmr - due to limitation in available BAR indexes.

This isn't 100% clear.  Does this mean "cmb_size_mb= and pmrdev= are
mutually exclusive due to limited availability of BARs"?  Please
rephrase.

> + * pmr_file file needs to be power of two in size.
> + * Enabling pmr emulation can be achieved by pointing to memory-backend-file.
> + * For example:
> + * -object memory-backend-file,id=,share=on,mem-path=, \
> + * size= .... -device nvme,...,pmrdev=
>   */
>
>  #include "qemu/osdep.h"
> @@ -35,7 +43,9 @@
>  #include "sysemu/sysemu.h"
>  #include "qapi/error.h"
>  #include "qapi/visitor.h"
> +#include "sysemu/hostmem.h"
>  #include "sysemu/block-backend.h"
> +#include "exec/ramblock.h"
>
>  #include "qemu/log.h"
>  #include "qemu/module.h"
> @@ -1141,6 +1151,26 @@ static void nvme_write_bar(NvmeCtrl *n, hwaddr offset, uint64_t data,
>          NVME_GUEST_ERR(nvme_ub_mmiowr_cmbsz_readonly,
>                         "invalid write to read only CMBSZ, ignored");
>          return;
> +    case 0xE00: /* PMRCAP */
> +        NVME_GUEST_ERR(nvme_ub_mmiowr_pmrcap_readonly,
> +                       "invalid write to PMRCAP register, ignored");
> +        return;
> +    case 0xE04: /* TODO PMRCTL */
> +        break;
> +    case 0xE08: /* PMRSTS */
> +        NVME_GUEST_ERR(nvme_ub_mmiowr_pmrsts_readonly,
> +                       "invalid write to PMRSTS register, ignored");
> +        return;
> +    case 0xE0C: /* PMREBS */
> +        NVME_GUEST_ERR(nvme_ub_mmiowr_pmrebs_readonly,
> +                       "invalid write to PMREBS register, ignored");
> +        return;
> +    case 0xE10: /* PMRSWTP */
> +        NVME_GUEST_ERR(nvme_ub_mmiowr_pmrswtp_readonly,
> +                       "invalid write to PMRSWTP register, ignored");
> +        return;
> +    case 0xE14: /* TODO PMRMSC */
> +        break;
>      default:
>          NVME_GUEST_ERR(nvme_ub_mmiowr_invalid,
>                         "invalid MMIO write,"
> @@ -1169,6 +1199,23 @@ static uint64_t nvme_mmio_read(void *opaque, hwaddr addr, unsigned size)
>      }
>
>      if (addr < sizeof(n->bar)) {
> +        /*
> +         * When PMRWBM bit 1 is set then read from
> +         * from PMRSTS should
> +         * ensure prior writes
> +         * made it to persistent media
> +         */
> +        if (addr == 0xE08 &&
> +            (NVME_PMRCAP_PMRWBM(n->bar.pmrcap) & 0x02) >> 1) {
> +            int status;
> +
> +            status = qemu_msync((void *)n->pmrdev->mr.ram_block->host,
> +                                n->pmrdev->size,
> +                                n->pmrdev->mr.ram_block->fd);

Please use qemu_ram_writeback() so that pmem_persist() and qemu_msync()
are used as appropriate.

Stefan
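
P.S. To illustrate the qemu_ram_writeback() suggestion, here is a rough,
self-contained sketch rather than a tested patch. RAMBlock, ram_addr_t and
qemu_ram_writeback() are stubbed so the snippet compiles outside the QEMU
tree; the stub's (block, start, length) signature reflects my understanding
of the helper and should be checked against the tree. pmrsts_read_flush(),
PMRSTS_OFFSET and PMRWBM_ON_PMRSTS_READ are hypothetical names used only
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for QEMU types (assumption: not the real definitions). */
typedef uint64_t ram_addr_t;

typedef struct RAMBlock {
    void *host;
    ram_addr_t used_length;
    int fd;
} RAMBlock;

/* Bookkeeping so the stub's behaviour can be observed in this sketch. */
static int writeback_calls;
static ram_addr_t last_writeback_length;

/*
 * Stub standing in for QEMU's qemu_ram_writeback(). The real helper is
 * expected to pick pmem_persist() or qemu_msync() as appropriate; this
 * stub only records that it was called.
 */
static void qemu_ram_writeback(RAMBlock *block, ram_addr_t start,
                               ram_addr_t length)
{
    (void)block;
    (void)start;
    writeback_calls++;
    last_writeback_length = length;
}

#define PMRSTS_OFFSET          0xE08  /* register offset used in the patch */
#define PMRWBM_ON_PMRSTS_READ  0x02   /* PMRWBM bit 1, hypothetical macro name */

/* Hypothetical helper: flush the PMR backing memory on a PMRSTS read. */
static void pmrsts_read_flush(uint64_t addr, uint32_t pmrwbm,
                              RAMBlock *block, ram_addr_t size)
{
    assert(block != NULL);

    if (addr == PMRSTS_OFFSET && (pmrwbm & PMRWBM_ON_PMRSTS_READ)) {
        qemu_ram_writeback(block, 0, size);
    }
}
```

In nvme_mmio_read() the call would presumably look something like
qemu_ram_writeback(n->pmrdev->mr.ram_block, 0, n->pmrdev->size), which
leaves the choice between pmem_persist() and qemu_msync() to the helper.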