Date: Wed, 30 Oct 2013 16:51:29 -0200
From: Marcelo Tosatti
To: Igor Mammedov
Cc: aarcange@redhat.com, peter.maydell@linaro.org, gleb@redhat.com,
 quintela@redhat.com, jan.kiszka@siemens.com, qemu-devel@nongnu.org,
 aliguori@amazon.com, pbonzini@redhat.com, afaerber@suse.de, rth@twiddle.net
Subject: Re: [Qemu-devel] [RFC PATCH] pc: align gpa<->hpa on 1GB boundary by splitting RAM on several regions
Message-ID: <20131030185129.GB18378@amt.cnet>
In-Reply-To: <20131030174949.2fb0d2c2@nial.usersys.redhat.com>
References: <20131028140406.GA18025@amt.cnet> <1383070729-19427-1-git-send-email-imammedo@redhat.com> <20131029213844.GB32615@amt.cnet> <20131030174949.2fb0d2c2@nial.usersys.redhat.com>

On Wed, Oct 30, 2013 at 05:49:49PM +0100, Igor Mammedov wrote:
> On Tue, 29 Oct 2013 19:38:44 -0200
> Marcelo Tosatti wrote:
> 
> > On Tue, Oct 29, 2013 at 07:18:49PM +0100, Igor Mammedov wrote:
> > > Otherwise 1GB TLBs cannot be cached for the range.
> > 
> > This fails to back non-1GB-aligned but 2MB-aligned gpas with 2MB
> > large pages.
> With the current command line only one hugetlbfs mount point is possible,
> so RAM is backed with whatever page size the specified hugetlbfs mount
> point has. Anything that does not fit into a hugepage-aligned region goes
> to the tail, allocated with the non-hugepage-backed
> phys_mem_set_alloc()=qemu_anon_ram_alloc() allocator.

The patch you propose allocates the non-1GB-aligned tail of RAM with 4k
pages. As mentioned, this is not acceptable (2MB pages should be used
whenever 1GB alignment is not possible).

I believe it is easier for the user to allocate enough 1GB pages to back
all of guest RAM, since allocation is static, than to allocate mixed
1GB/2MB pages in hugetlbfs.
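For illustration only (the page count, mount path, and qemu invocation
below are examples, not taken from the patch), backing all of guest RAM
with statically allocated 1GB pages means something like:

# kernel command line: default_hugepagesz=1G hugepagesz=1G hugepages=4
mount -t hugetlbfs -o pagesize=1G none /hugetlbfs/1gb
qemu-system-x86_64 -m 4096 -mem-path /hugetlbfs/1gb ...

Note that 1GB pages generally have to be reserved at boot, which is what
makes the allocation static.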
> > Since hugetlbfs allocation is static, it requires the user to inform
> > different 1GB and 2MB sized hugetlbfs mount points (with the proper
> > number of corresponding hugetlbfs pages allocated). This is incompatible
> > with the current command line, and I'd like to see this problem handled
> > in a way that is command-line backwards compatible.
> The patch doesn't change that: it uses the provided hugetlbfs and falls
> back (hunk 2) to phys_mem_alloc if the requested memory region is not
> hugepage-size aligned. So there is no CLI change, only a memory leak fix.
> 
> > Also, if the argument for one-to-one mapping between dimms and linear
> > host virtual address sections holds, it means virtual DIMMs must be
> > partitioned into whatever hugepage alignment is necessary (and in that
> > case, why can't they be partitioned similarly with the memory region
> > aliases?).
> Because during hotplug a new memory region of the desired size is
> allocated and it can be mapped directly without any aliasing. And if some
> day we convert the ad hoc initial memory allocation to dimm devices,
> there is no reason to allocate one huge block and then invent means to
> alias the hole somewhere else; we could just reuse memdev/dimm and
> allocate several memory regions with the desired properties, each
> represented by a memdev/dimm pair.
> 
> One-to-one mapping simplifies the design and the interface with the ACPI
> part during memory hotplug.
> 
> For the hotplug case the flow could look like:
> 
> memdev_add id=x1,size=1Gb,mem-path=/hugetlbfs/1gb,other-host-related-stuff-options
> #memdev could enforce size to be backend aligned
> device_add dimm,id=y1,backend=x1,addr=xxxxxx
> #dimm could get alignment from associated memdev or fail if addr
> #doesn't meet alignment of memdev backend
> 
> memdev_add id=x2,size=2mb,mem-path=/hugetlbfs/2mb
> device_add dimm,id=y2,backend=x2,addr=yyyyyyy
> 
> memdev_add id=x3,size=1mb
> device_add dimm,id=y3,backend=x3,addr=xxxxxxx
> 
> A linear memory block is allocated at runtime (the user has to make sure
> that enough hugepages are available) by each memdev_add command, and that
> RAM memory region is mapped into GPA space by the virtual DIMM as is;
> there wouldn't be any need for aliasing.
> 
> Now back to initial memory and the bright future we are looking forward
> to (i.e. the ability to create a machine from a configuration file
> without ad hoc coding like pc_memory_init()):
> 
> The legacy command line "-m 4512 -mem-path /hugetlbfs/1gb" could be
> automatically translated into:
> 
> -memdev id=x1,size=3g,mem-path=/hugetlbfs/1gb -device dimm,backend=x1,addr=0
> -memdev id=x2,size=1g,mem-path=/hugetlbfs/1gb -device dimm,backend=x2,addr=4g
> -memdev id=x3,size=512m -device dimm,backend=x3,addr=5g
> 
> Or the user could drop the legacy CLI and assume fine-grained control
> over the memory configuration:
> 
> -memdev id=x1,size=3g,mem-path=/hugetlbfs/1gb -device dimm,backend=x1,addr=0
> -memdev id=x2,size=1g,mem-path=/hugetlbfs/1gb -device dimm,backend=x2,addr=4g
> -memdev id=x3,size=512m,mem-path=/hugetlbfs/2mb -device dimm,backend=x3,addr=5g
> 
> So if we are going to break migration compatibility for the new machine
> type, let's do it in a way that can painlessly be changed to
> memdev/device in the future.

OK, then please improve your proposal to allow for multiple hugetlbfs
mount points.

> > > PS:
> > > as a side effect we are not wasting ~1GB of memory if
> > > 1GB hugepages are used and -m "hpagesize(in MB)*n + 1"
> > 
> > This is how hugetlbfs works. You waste a 1GB hugepage if an extra
> > byte is requested.
> It looks more like a bug than a feature;
> why do it if the leak can be avoided as shown below?

Because IMO it is confusing for the user, since hugetlbfs allocation is
static.

But if you have a necessity for the one-to-one relationship, feel free to
support mixed hugetlbfs page sizes.
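For illustration, the mixed-page-size host setup such a proposal implies
would look something like this (page counts and mount paths are examples
chosen to match the memdev commands above):

# kernel command line: hugepagesz=1G hugepages=4 hugepagesz=2M hugepages=256
mkdir -p /hugetlbfs/1gb /hugetlbfs/2mb
mount -t hugetlbfs -o pagesize=1G none /hugetlbfs/1gb
mount -t hugetlbfs -o pagesize=2M none /hugetlbfs/2mb
# 2MB pages (unlike 1GB ones) can also be adjusted at runtime:
echo 256 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

Each mount point would then back one memdev in the scheme above.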