From: Xiao Guangrong
To: Igor Mammedov
Cc: ehabkost@redhat.com, kvm@vger.kernel.org, "Michael S. Tsirkin",
 gleb@kernel.org, mtosatti@redhat.com, qemu-devel@nongnu.org,
 stefanha@redhat.com, pbonzini@redhat.com, dan.j.williams@intel.com,
 rth@twiddle.net
Subject: Re: [Qemu-devel] [PATCH v2 06/11] nvdimm acpi: initialize the resource used by NVDIMM ACPI
Date: Wed, 17 Feb 2016 10:04:18 +0800
Message-ID: <56C3D522.6090401@linux.intel.com>
In-Reply-To: <20160216120047.5a50eccf@nial.brq.redhat.com>

On 02/16/2016 07:00 PM, Igor Mammedov wrote:
> On Tue, 16 Feb 2016 02:35:41 +0800
> Xiao Guangrong wrote:
>
>> On 02/16/2016 01:24 AM, Igor Mammedov wrote:
>>> On Mon, 15 Feb 2016 23:53:13 +0800
>>> Xiao Guangrong wrote:
>>>
>>>> On 02/15/2016 09:32 PM, Igor Mammedov wrote:
>>>>> On Mon, 15 Feb 2016 13:45:59 +0200
>>>>> "Michael S. Tsirkin" wrote:
>>>>>
>>>>>> On Mon, Feb 15, 2016 at 11:47:42AM +0100, Igor Mammedov wrote:
>>>>>>> On Mon, 15 Feb 2016 18:13:38 +0800
>>>>>>> Xiao Guangrong wrote:
>>>>>>>
>>>>>>>> On 02/15/2016 05:18 PM, Michael S. Tsirkin wrote:
>>>>>>>>> On Mon, Feb 15, 2016 at 10:11:05AM +0100, Igor Mammedov wrote:
>>>>>>>>>> On Sun, 14 Feb 2016 13:57:27 +0800
>>>>>>>>>> Xiao Guangrong wrote:
>>>>>>>>>>
>>>>>>>>>>> On 02/08/2016 07:03 PM, Igor Mammedov wrote:
>>>>>>>>>>>> On Wed, 13 Jan 2016 02:50:05 +0800
>>>>>>>>>>>> Xiao Guangrong wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> A 32-bit IO port starting at 0x0a18 in the guest is reserved for
>>>>>>>>>>>>> NVDIMM ACPI emulation. The table, NVDIMM_DSM_MEM_FILE, will be
>>>>>>>>>>>>> patched into the NVDIMM ACPI binary code.
>>>>>>>>>>>>>
>>>>>>>>>>>>> OSPM uses this port to tell QEMU the final address of the DSM
>>>>>>>>>>>>> memory and to notify QEMU to emulate the DSM method.
>>>>>>>>>>>> Would you need to pass control to QEMU if each NVDIMM had its whole
>>>>>>>>>>>> label area MemoryRegion mapped right after its storage MemoryRegion?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> No, label data is not mapped into the guest's address space; it can
>>>>>>>>>>> only be accessed indirectly via the _DSM method.
>>>>>>>>>> Yep, per spec label data should be accessed via _DSM, but the
>>>>>>>>>> question wasn't about that.
>>>>>>>>
>>>>>>>> Ah, sorry, I missed your question.
>>>>>>>>
>>>>>>>>>> Why would one map only a 4Kb window and serialize label data
>>>>>>>>>> via it if it could be mapped as a whole? That way the _DSM method
>>>>>>>>>> would be much less complicated and there would be no need to
>>>>>>>>>> add/support a protocol for its serialization.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Is it ever accessed on the data path? If not I prefer the current
>>>>>>>>> approach:
>>>>>>>>
>>>>>>>> The label data is only accessed via two DSM commands - Get Namespace
>>>>>>>> Label Data and Set Namespace Label Data - so no other place needs to
>>>>>>>> be emulated.
>>>>>>>>
>>>>>>>>> limit the window used, the serialization protocol seems rather simple.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes.
>>>>>>>>
>>>>>>>> Label data is at least 128k, which is quite big for the BIOS, as it
>>>>>>>> allocates memory in the 0 ~ 4G range, which is a tight region. It
>>>>>>>> also needs the guest OS to support a larger max-xfer (the maximum
>>>>>>>> size that can be transferred at one time); the size in the current
>>>>>>>> Linux NVDIMM driver is 4k.
>>>>>>>>
>>>>>>>> However, using a larger DSM buffer can help us simplify NVDIMM
>>>>>>>> hotplug for the case where so many NVDIMM devices are present in the
>>>>>>>> system that their FIT info cannot fit into one page. Each PMEM-only
>>>>>>>> device needs 0xb8 bytes and we can append 256 memory devices at most,
>>>>>>>> so 12 pages are needed to contain this info. The prototype we
>>>>>>>> implemented uses a self-defined protocol to read pieces of the _FIT
>>>>>>>> and concatenate them before returning to the guest, please refer to:
>>>>>>>> https://github.com/xiaogr/qemu/commit/c46ce01c8433ac0870670304360b3c4aa414143a
>>>>>>>>
>>>>>>>> As 12 pages is not a small region for the BIOS, and the _FIT size may
>>>>>>>> be extended in future development (e.g. if PBLK is introduced), I am
>>>>>>>> not sure we need this. Of course, another approach to simplify it is
>>>>>>>> to limit the number of NVDIMM devices so that their _FIT stays < 4k.
>>>>>>> My suggestion is not to have a single label area for every NVDIMM but
>>>>>>> rather to map each label area right after each NVDIMM's data memory.
>>>>>>> That way _DSM can be made non-serialized and the guest could handle
>>>>>>> label data in parallel.
>>>>>>
>>>>>> I think that alignment considerations would mean we are burning up
>>>>>> 1G of phys address space for this. For PAE we only have 64G
>>>>>> of this address space, so this would be a problem.
>>>>> It's true that it will burn away address space, however that
>>>>> just means that PAE guests would not be able to handle as many
>>>>> NVDIMMs as 64-bit guests. The same applies to DIMMs as well, with
>>>>> alignment enforced. If one needs more DIMMs he/she can switch
>>>>> to a 64-bit guest to use them.
>>>>>
>>>>> It's a trade-off of inefficient GPA consumption vs. efficient NVDIMM
>>>>> access. Also, with a fully mapped label area for each NVDIMM we don't
>>>>> have to introduce and maintain any guest-visible serialization protocol
>>>>> (a protocol for serializing _DSM via a 4K window), which becomes ABI.
>>>>
>>>> That's true for label access, but not for the long term, as we will
>>>> need to support other _DSM commands such as vendor-specific commands
>>>> and PBLK DSM commands; NVDIMM MCE related commands will also be
>>>> introduced in the future, so we will come back here at that time. :(
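[ For reference, the 12-page figure quoted above follows directly from the
  numbers given there:

      0xb8 bytes per PMEM-only device * 256 devices = 47104 bytes
      47104 bytes / 4096 bytes per page = 11.5, rounded up to 12 pages ]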
>>> I believe block-mode NVDIMM would also need a per-NVDIMM mapping
>>> for performance reasons (parallel access).
>>> As for the rest, could those commands go via the MMIO that we usually
>>> use for the control path?
>>
>> So both input data and output data would go through a single MMIO
>> region; we would need to introduce a protocol to pass this data, and
>> that is complex, no?
>>
>> And is there any MMIO we can reuse (even more complex?), or should we
>> allocate this MMIO page (the old question - where to allocate it)?
> Maybe you could reuse/extend the memhotplug IO interface,
> or alternatively, as Michael suggested, add a vendor-specific PCI_Config.
> I'd suggest the PM device for that (hw/acpi/[piix4.c|ich9.c]),
> which I like even better since you won't need to care about which ports
> to allocate at all.

Well, if Michael does not object, I will do it in the next version. :)
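For anyone following along, the IO-window approach discussed above (the one
in the current patch) boils down to registering a small IO region at the
reserved port and letting OSPM write the DSM buffer's guest address into it.
Below is a rough sketch against QEMU's memory API; the names (NVDIMMState,
nvdimm_dsm_ops, nvdimm_register_io) and the includes are illustrative, not
the exact ones from the patch:

/* Illustrative sketch only -- not the exact code from the patch. */
#include "qemu/osdep.h"
#include "exec/memory.h"

#define NVDIMM_ACPI_IO_BASE 0x0a18   /* port reserved in the patch */
#define NVDIMM_ACPI_IO_LEN  4        /* the "32-bit IO port" */

typedef struct NVDIMMState {
    MemoryRegion io_mr;      /* the 4-byte IO window */
    hwaddr dsm_mem_addr;     /* GPA of the DSM buffer, told to us by OSPM */
} NVDIMMState;

static uint64_t nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
{
    return 0;   /* nothing meaningful to read back in this sketch */
}

static void nvdimm_dsm_write(void *opaque, hwaddr addr, uint64_t val,
                             unsigned size)
{
    NVDIMMState *state = opaque;

    /* OSPM writes the final address of the DSM memory here; the write
     * itself is the "emulate the DSM method now" notification. */
    state->dsm_mem_addr = val;
    /* ... fetch the request from guest memory at dsm_mem_addr, handle
     *     the DSM command and fill in the response buffer ... */
}

static const MemoryRegionOps nvdimm_dsm_ops = {
    .read = nvdimm_dsm_read,
    .write = nvdimm_dsm_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
    .valid = {
        .min_access_size = 4,
        .max_access_size = 4,
    },
};

static void nvdimm_register_io(NVDIMMState *state, Object *owner,
                               MemoryRegion *io_space)
{
    memory_region_init_io(&state->io_mr, owner, &nvdimm_dsm_ops, state,
                          "nvdimm-acpi-io", NVDIMM_ACPI_IO_LEN);
    memory_region_add_subregion(io_space, NVDIMM_ACPI_IO_BASE,
                                &state->io_mr);
}

With the vendor-specific PCI_Config variant suggested above, roughly the same
write handler would instead hang off the PM device's config space, so no
fixed port would need to be reserved at all.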