From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43284) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1V5cKQ-0001Fb-3A for qemu-devel@nongnu.org; Sat, 03 Aug 2013 09:57:13 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1V5cKI-0001EN-OE for qemu-devel@nongnu.org; Sat, 03 Aug 2013 09:57:06 -0400 Received: from cantor2.suse.de ([195.135.220.15]:51401 helo=mx2.suse.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1V5cKI-0001E0-Ev for qemu-devel@nongnu.org; Sat, 03 Aug 2013 09:56:58 -0400 Message-ID: <51FD0C22.1070005@suse.de> Date: Sat, 03 Aug 2013 15:56:50 +0200 From: =?ISO-8859-1?Q?Andreas_F=E4rber?= MIME-Version: 1.0 References: <1374596592-7027-1-git-send-email-imammedo@redhat.com> <1374596592-7027-11-git-send-email-imammedo@redhat.com> <51EEB8C6.1030601@redhat.com> <20130724103622.73cf0869@nial.usersys.redhat.com> <51EFA130.5050400@redhat.com> <20130724133420.5c0c5653@nial.usersys.redhat.com> <51EFCB80.50308@redhat.com> <20130726093840.41bd7106@nial.usersys.redhat.com> <51F240B8.3070303@redhat.com> <20130726145115.73a1b9a8@nial.usersys.redhat.com> <51F2898C.20405@redhat.com> In-Reply-To: <51F2898C.20405@redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 10/16] dimm: add busy slot check and slot auto-allocation List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini , Igor Mammedov Cc: vasilis.liaskovitis@profitbricks.com, hutao@cn.fujitsu.com, qemu-devel@nongnu.org, Anthony Liguori Am 26.07.2013 16:37, schrieb Paolo Bonzini: > Il 26/07/2013 14:51, Igor Mammedov ha scritto: >> On Fri, 26 Jul 2013 11:26:16 +0200 >> Paolo Bonzini wrote: >>> Il 26/07/2013 09:38, Igor Mammedov ha scritto: >>>>>> I agree that specifying the policy on every hotplug complicates >>>>>> management and may be overkill. But then, most guests are not NUM= A at >>>>>> all and you would hardly perceive the difference, you would just h= ave to >>>>>> separate >>>>>> >>>>>> set-mem-policy 0 size=3D2G >>>>>> device_add dimm,slot=3D0 >>>> I'm confused, size is inherent property of generic DimmDevice and po= licies >>>> are NUMA specific of node. >>> >>> No, the size is not a property of the DimmDevice, it is a property of >>> the host object that backs the DimmDevice. This is like the size is = not >>> a property of a disk, but rather of the image that backs it. >>> >>> Now, there may be a good reason to remove the host/guest distinction = in >>> the case of memory, but I'm still not sure this is the case. >> I don't like split model in this case because it's complicates interfa= ce >> and confuses user. On bare-metal when user adds DIMM, it definitely ha= s >> size property. So when user adds DIMM device like: >> >> set-mem-policy 0 size=3D2G,somehostprop=3Dy >> device_add dimm,slot=3D0 >> >> it's not very clear/intuitive to what 'size' relates to. On contrary: >> >> set-mem-policy 0 somehostprop=3Dy >> device_add dimm,slot=3D0,size=3D2G >> >> clearly says that we are adding 2Gb DIMM and allocator of memory that >> bakes up DIMM, takes care about host specific details isolating them >> from DIMM device model (which resembles baremetal one as much as possi= ble). >=20 > How is this different from >=20 > drive-add id=3Dfoo,aio=3Dnative > device-add virtio-block,drive=3Dfoo,file=3D/vm/foo.img >=20 > We clearly do not do that, we put the file in the drive-add. The difference is that a user can understand drive-add, whereas set-mem-policy in this use case is really hard to grasp as a prereq. If it were memory-add id=3Dfoo,size=3D2G device-add dimm,slot=3D0,mem=3Dfoo that may be different, but that is unhandy implementation-wise because we assume initial RAM to be in one chunk and want to partition it into guest NUMA nodes IIUC. Or has that changed? I don't care if we name the device DIMM or MemoryBar, point is it should represent the physical thing one plugs into a physical board, to match what admins do with physical servers. And that physical bar has a size, as Igor points out. It should not just represent some virtual hot-plugged policy thing. The drive is separate from the device because we can exchange the CD media while leaving the device connected to ATA/AHCI/SCSI (and it allows us to keep file, cache, etc. properties in a central place). Admins buy servers/boards and memory bars, they won't solder chips onto it nor change the NUMA configuration of the board while running, so we should consider it immutable once created. If we unplug physical DIMMs, the data can be considered lost (ignoring deep cooling tricks etc.), so no point to transfer memory data from one DIMM to another. Would we seriously want to exchange the memory a DIMM is backed by while leaving the DIMM plugged? Rather the entity that owns the DIMM slots (as opposed to DIMMs) has the guest NUMA policy configured for the unpopulated slots. The slot has a maximum size and the DIMM has a size less or equal to slot's maximum size. I just wonder whether we need a DimmBus at all? Because if we get the slot specified as in your examples then we could just set some dimm[n] link on realize (question is where). We had a discussion once about a missing callback when a link property is set to a new value, not sure if that has been resolved meantime? Can't think of a situation where we would have multiple DimmBus'es, only some cases where it's on a SoC or SoM and not directly on the mainboard, i.e. not /machine/dimm[n]. Code seems to be copying ICC bus, but ICC bus was added because APIC needed a hot-pluggable bus to replace SysBus. Hadn't Anthony and Ping Fang factored out a memory controller on the i440fx? Patch 9 seems to add the bus directly to the i440fx PCI device. Patch 13 seems to use a Notifier for PIIX4 ACPI rather than having the bus handle hotplug itself and bus / memory controller communicating with ACPI as necessary (layering violation). For the CPU we initially had no bus at all to handle such event (we still don't outside x86). What I am not finding in this series is a translation from -m to initial non-hotplugged DIMMs? Some random other remarks on the series: * s/"dimmbus"/"dimm-bus"/g * s/klass/oc/g -- there were loud protests against k* (class reserved) * gtk-doc markup for parent fields is missing in header * s/qdev/parent_obj/, s/qbus/parent_obj/ -- don't invite to use them * s/dc/dbc/g -- dc is for DeviceClass * DimmBusClass::register_memory() is never overwritten? In that case it could well be just a static function called from realizefn - only disctinction I can think of is how to allocate MemoryRegion. A caller-settable DimmBus::slot_plugged() for i440fx to notify piix4 would seem to solve the issue of the Notifier elegantly. PCIBus has such hooks. Regards, Andreas --=20 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 N=FCrnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imend=F6rffer; HRB 16746 AG N=FCrnbe= rg