Date: Tue, 16 Jul 2013 14:00:06 +0200
From: Igor Mammedov
To: Paolo Bonzini
Cc: Eduardo Habkost, Hu Tao, qemu-devel@nongnu.org, Vasilis Liaskovitis,
 Bandan Das, gaowanlong@cn.fujitsu.com
Subject: Re: [Qemu-devel] [PATCH v5 05/14] vl: handle "-device dimm"
Message-ID: <20130716140006.25b5175c@nial.usersys.redhat.com>
In-Reply-To: <51E52112.3010803@redhat.com>
References: <6b5ff346b23fba9a8707507fda7f9b71719a55be.1372234719.git.hutao@cn.fujitsu.com>
 <51CAB866.2080507@redhat.com> <51CBC8B3.8070708@cn.fujitsu.com>
 <51CBE1DD.9020301@redhat.com>
 <20130715170551.GC11958@dhcp-192-168-178-175.profitbricks.localdomain>
 <51E42D06.9080004@redhat.com>
 <20130716012744.GE10917@G08FNSTD100614.fnst.cn.fujitsu.com>
 <51E4E604.6060608@redhat.com>
 <20130716121901.157d7e85@nial.usersys.redhat.com>
 <51E52112.3010803@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit

On Tue, 16 Jul 2013 12:31:46 +0200
Paolo Bonzini wrote:

> On 16/07/2013 12:19, Igor Mammedov wrote:
> > On Tue, 16 Jul 2013 08:19:48 +0200
> > Paolo Bonzini wrote:
> >
> >> On 16/07/2013 03:27, Hu Tao wrote:
> >>>> I think it's the same.  One "-numa mem" option = one "-device dimm"
> >>>> option; both define one range.
> >>>> Unused memory ranges may remain if you
> >>>> stumble upon an unusable range such as the PCI window.  For example two
> >>>> "-numa mem,size=2G" options would allocate memory from 0 to 2G and from
> >>>> 4 to 6G.
> >>>
> >>> So we can drop -dimm if we agree on -numa mem?
> >>
> >> Yes, the point of the "-numa mem" proposal was to avoid the concept of a
> >> "partially initialized device" that you had for DIMMs.
> > I thought -numa mem was for mapping initial memory to numa nodes.
> > It seems wrong to use it for representing a dimm device and also for
> > limiting possible hotplugged regions to ranges specified at startup.
>
> It's not for DIMM devices, it is for reserving areas of the address
> space for hot-plugged RAM.  DIMM hotplug is done with "device_add dimm"
> (and you can also use "-numa mem,populated=no,... -device dimm,..." to
> start a VM with hot-unpluggable memory).

There isn't a real need to reserve ranges from the ACPI point of view:
a memory device in ACPI can provide a _PXM() method that returns its
mapping to a NUMA node. In my testing, Linux and Windows guests both
use it; even if there is an (unnecessary) mapping in the SRAT table,
they override the SRAT mapping with the dynamic one.

It would be better not to use the "populated" concept at all. If there
is a -device dimm on the command line, then it is populated, and for a
hot-plugged dimm all the necessary information can be generated
dynamically.

> > we can leave -numa for initial memory mapping and manage the mapping
> > of hotpluggable regions with -device dimm,node=X,size=Y.
> >
> > In that case command line -device dimm will provide a fully initialized
> > dimm device usable at startup (but hot-unpluggable) and
> >   (monitor) device_add dimm,,node=X,size=Y
> > would serve the hot-plug case.
> >
> > That way arbitrarily sized dimms could be hot-plugged without specifying
> > them at startup, like it's done on bare-metal.
>
> But the memory ranges need to be specified at startup in the ACPI
> tables, and that's what "-numa mem" is for.
Not really. There is a caveat with Windows: it needs a hotpluggable
SRAT entry that tells it the maximum possible limit (otherwise Windows
sees the new dimm device but refuses to use it, saying "server is not
configured for hotplug" or something like this). But as long as such an
entry exists, Windows happily uses dynamic _CRS() and _PXM() if they
are below that limit (even if a new range is not in any range defined
by SRAT).

And the ACPI spec doesn't say that SRAT MUST be populated with hotplug
ranges.

It's kind of simpler for bare-metal, where they might do it because the
supported DIMM capacity is limited: they reserve static entries with
the max supported range per DIMM and know the DIMM count for the
platform in advance. But the actual _CRS() is dynamic anyway, since a
plugged-in DIMM could have a smaller capacity than the supported max
for the slot.

To summarize ACPI + Windows limitations:
 - ACPI needs to pre-allocate memory devices, i.e. the number of
   possible increments OSPM could utilize. It might be possible to
   overcome this limitation by using Load() or LoadTable() at runtime,
   but I haven't tried it.
 - Windows needs to know the max supported limit; a fake entry in SRAT
   from RamSize to max_mem works nicely there (tested with ws2008r2DC
   and ws2012DC).

That's why I was proposing to extend the "-m" option with a "slots"
number (i.e. nr of memory devices) and 'max_mem', to make Windows happy
and to keep mgmt tools from going over the initially configured limit.

Then -device dimm could be used for hotpluggable mem available at
startup, and device_add for adding more dimms with user-defined sizes
to desired nodes at runtime.

It works nicely without any need for 'populated=xxx' and predefined
ranges.

PS:
I'll be able to post a more or less usable RFC that does it on top of
mst's ACPI tables in QEMU by the end of this week.

> >
> > In addition command line -device would be used in migration case to
> > describe already hot-plugged dimms on target.
>
> Yep.
>
> Paolo
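
To make the "slots" / 'max_mem' bookkeeping discussed in this message
concrete, here is a rough sketch in Python. It is not QEMU code: the
class and method names are hypothetical, and placing DIMMs back to back
directly above initial RAM is a simplifying assumption (a real
implementation would have to deal with the PCI window and alignment).
It only shows the two checks implied by the proposal: a plugged dimm
needs a free pre-allocated memory device, and its range has to stay
below max_mem, i.e. inside the single fake SRAT entry.

```python
from dataclasses import dataclass, field

GiB = 1 << 30


@dataclass
class HotplugMemory:
    """Toy model of '-m' extended with slots and max_mem (hypothetical names)."""
    ram_size: int   # initial RAM in bytes
    max_mem: int    # upper limit advertised to the guest
    slots: int      # number of pre-allocated ACPI memory devices
    dimms: list = field(default_factory=list)  # (start, size, node) tuples

    def srat_hotplug_entry(self):
        # The single fake hotpluggable SRAT range: RamSize .. max_mem.
        return (self.ram_size, self.max_mem)

    def plug(self, size, node):
        # Each dimm consumes one pre-allocated memory device.
        if len(self.dimms) >= self.slots:
            raise ValueError("no free memory device slot")
        # Assumption: dimms are packed back to back above initial RAM.
        start = self.ram_size + sum(s for _, s, _ in self.dimms)
        if start + size > self.max_mem:
            raise ValueError("dimm would exceed max_mem")
        self.dimms.append((start, size, node))
        return start


mem = HotplugMemory(ram_size=4 * GiB, max_mem=8 * GiB, slots=2)
mem.plug(1 * GiB, node=0)  # roughly "-device dimm,node=0,size=1G"
mem.plug(2 * GiB, node=1)  # roughly "device_add dimm,node=1,size=2G"
```

Note that no range other than RamSize..max_mem has to be known at
startup; dimm sizes and nodes are chosen only when a dimm is plugged.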