Re: [Qemu-devel] [RFC PATCH 0/4] nvdimm: enable flush hint address structure

From: Xiao Guangrong <guangrong.xiao@gmail.com>
To: Haozhong Zhang <haozhong.zhang@intel.com>, qemu-devel@nongnu.org
Cc: dan.j.williams@intel.com, "Michael S. Tsirkin" <mst@redhat.com>,
	Igor Mammedov <imammedo@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Richard Henderson <rth@twiddle.net>,
	Eduardo Habkost <ehabkost@redhat.com>
Subject: Re: [Qemu-devel] [RFC PATCH 0/4] nvdimm: enable flush hint address structure
Date: Thu, 6 Apr 2017 17:39:37 +0800
Message-ID: <2f298f0a-cf07-d131-90c2-bef89537d981@gmail.com>
In-Reply-To: <20170331084147.32716-1-haozhong.zhang@intel.com>

On 31/03/2017 4:41 PM, Haozhong Zhang wrote:
> This patch series constructs the flush hint address structures for
> nvdimm devices in QEMU.
>
> It's of course not for 2.9. I'm sending it out early in order to get
> comments on one point I'm uncertain about (see the detailed explanation
> below). Thanks in advance for any comments!
>
>
> Background
> ---------------
> Flush hint address structure is a substructure of NFIT and specifies
> one or more addresses, namely Flush Hint Addresses. Software can write
> to any one of these flush hint addresses to cause any preceding writes
> to the NVDIMM region to be flushed out of the intervening platform
> buffers to the targeted NVDIMM. More details can be found in ACPI Spec
> 6.1, Section 5.2.25.8 "Flush Hint Address Structure".
>
>
> Why is it RFC?
> ---------------
> It is marked RFC because I'm not sure whether the way this patch series
> allocates the guest flush hint addresses is right.
>
> QEMU needs to trap guest accesses (at least for writes) to the flush
> hint addresses in order to perform the necessary flush on the host
> back store. Therefore, QEMU needs to create IO memory regions that
> cover those flush hint addresses. In order to create those IO memory
> regions, QEMU needs to know the flush hint addresses or their offsets
> to other known memory regions in advance. So far looks good.
>
> Flush hint addresses are in the guest address space. Looking at how
> the current NVDIMM ACPI in QEMU allocates the DSM buffer, it's natural
> to take the same approach for flush hint addresses, i.e. let the guest
> firmware allocate from free addresses and patch them into the flush hint
> address structure. (*Please correct me if my following understanding is wrong*)
> However, the current allocation and pointer patching are transparent
> to QEMU, so QEMU will be unaware of the flush hint addresses, and
> consequently have no way to create corresponding IO memory regions in
> order to trap guest accesses.

Er, that is awkward, and the flush hint table is static, so it may not
be easy to patch.
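
To make the trapping requirement above concrete, here is a minimal
standalone sketch of what a trapped flush hint window has to do. It is
only a model, not QEMU code: FlushHintRegion, flush_hint_write() and
count_flush() are hypothetical names, and in QEMU the same role would
presumably be played by the write callback of an IO memory region. The
point is that the window's base and size must be known to QEMU before
the guest runs, which firmware-side allocation does not give us.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one trapped flush hint window (not a QEMU type). */
typedef struct {
    uint64_t base;                /* first guest address of the window */
    uint64_t size;                /* window length in bytes */
    void (*flush)(void *opaque);  /* host-side flush of the back store */
    void *opaque;                 /* argument passed to flush() */
} FlushHintRegion;

/*
 * Model of the trap: a guest write to any address inside the window
 * triggers a flush of the host back store. The written value is ignored.
 * Returns true if the address hit the window, false otherwise.
 */
static inline bool flush_hint_write(const FlushHintRegion *r, uint64_t addr)
{
    assert(r != NULL && r->flush != NULL);
    if (addr < r->base || addr - r->base >= r->size) {
        return false;             /* not a flush hint address */
    }
    r->flush(r->opaque);
    return true;
}

/* Stand-in flush callback that only counts how often it was called. */
static inline void count_flush(void *opaque)
{
    ++*(unsigned *)opaque;
}
```

A real flush callback would sync the host backing file instead of
counting, but the lookup is the same: it only works if base and size
are fixed by QEMU itself.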

>
> Alternatively, this patch series moves the allocation of flush hint
> addresses to QEMU:
>
> 1. (Patch 1) We reserve an address range after the end address of each
>    nvdimm device. Its size is specified by the user via a new pc-dimm
>    option 'reserved-size'.
>

Should we make it work only for nvdimm?

>    For the following example,
>         -object memory-backend-file,id=mem0,size=4G,...
>         -device nvdimm,id=dimm0,memdev=mem0,reserved-size=4K,...
>         -device pc-dimm,id=dimm1,...
>    if dimm0 is allocated to address N ~ N+4G, the address of dimm1
>    will start from N+4G+4K or higher. N+4G ~ N+4G+4K is reserved for
>    dimm0.
>
> 2. (Patch 4) When NVDIMM ACPI code builds the flush hint address
>    structure for each nvdimm device, it will allocate them from the
>    above reserved area, e.g. the flush hint addresses of dimm0 above
>    are allocated in N+4G ~ N+4G+4K. The addresses are known to QEMU in
>    this way, so QEMU can easily create IO memory regions for them.
>
>    If the reserved area is not present or too small, QEMU will report
>    errors.
>

Should 'reserved-size' always be page-aligned, and should it be
transparent to the user, i.e., automatically reserve 4K if 'flush-hint'
is specified?
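
A rough sketch of the arithmetic I have in mind, assuming a 4K page
size. All names here (DimmLayout, dimm_layout(), reserved_start(),
next_free_addr(), FLUSH_HINT_PAGE_SIZE) are made up for illustration
and are not from the patches; hint_bytes is whatever the flush hint
addresses of one device would need.

```c
#include <assert.h>
#include <stdint.h>

#define FLUSH_HINT_PAGE_SIZE 4096ULL  /* assumption: 4K guest pages */

/* Round size up to a multiple of align; align must be a power of two. */
static inline uint64_t round_up_pow2(uint64_t size, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical description of where one dimm and its reserved area sit. */
typedef struct {
    uint64_t base;      /* guest address N where the dimm starts */
    uint64_t size;      /* size of the memory backend */
    uint64_t reserved;  /* bytes reserved right after the dimm */
} DimmLayout;

/*
 * Reserve a page-aligned area only when flush hints are requested, so
 * the user never has to pass a size by hand. hint_bytes == 0 still
 * reserves one page when flush hints are enabled.
 */
static inline DimmLayout dimm_layout(uint64_t base, uint64_t size,
                                     int flush_hint, uint64_t hint_bytes)
{
    DimmLayout d = { base, size, 0 };
    if (flush_hint) {
        if (hint_bytes == 0) {
            hint_bytes = 1;
        }
        d.reserved = round_up_pow2(hint_bytes, FLUSH_HINT_PAGE_SIZE);
    }
    return d;
}

/* Start of the reserved area: N + size. */
static inline uint64_t reserved_start(const DimmLayout *d)
{
    return d->base + d->size;
}

/* Lowest address the next dimm may use: N + size + reserved. */
static inline uint64_t next_free_addr(const DimmLayout *d)
{
    return d->base + d->size + d->reserved;
}
```

With N = 4G and a 4G backend, as in your example, the reserved area
starts at 8G and the next dimm can start at 8G+4K; without flush hints
nothing is reserved and the next dimm can start at 8G.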