Date: Tue, 29 Oct 2013 19:00:54 +0100
From: Igor Mammedov
Message-ID: <20131029190054.0c9faec5@nial.usersys.redhat.com>
In-Reply-To: <20131028140406.GA18025@amt.cnet>
References: <20131024211158.064049176@amt.cnet>
 <20131024211249.723543071@amt.cnet>
 <5269B378.6040409@redhat.com>
 <20131025045805.GA18280@amt.cnet>
 <20131025115718.15b6e788@redhat.com>
 <20131025133421.GA27529@amt.cnet>
 <20131027162044.19769397@redhat.com>
 <20131028140406.GA18025@amt.cnet>
Subject: Re: [Qemu-devel] [patch 2/2] i386: pc: align gpa<->hpa on 1GB boundary
To: Marcelo Tosatti
Cc: aarcange@redhat.com, Paolo Bonzini, qemu-devel@nongnu.org, gleb@redhat.com

On Mon, 28 Oct 2013 12:04:06 -0200
Marcelo Tosatti wrote:

> On Sun, Oct 27, 2013 at 04:20:44PM +0100, igor Mammedov wrote:
> > > Yes, thought of that, unfortunately its cumbersome to add an interface
> > > for the user to supply both 2MB and 1GB hugetlbfs pages.
> > Could 2Mb tails be automated, meaning if host uses 1Gb hugepages and
> > there is/are tail/s, QEMU should be able to figure out alignment
> > issues and allocate with appropriate pages.
>
> Yes that would be ideal but the problem with hugetlbfs is that pages are
> preallocated.
>
> So in the end you'd have to expose the split of guest RAM in 2MB/1GB types
> to the user (it would be necessary for the user to calculate the size of
> the hole, etc).
Exposing it to the user might not be necessary. QEMU could allocate
5GB+3MB of RAM without user intervention:

 3GB low.ram.aligned.region     // using huge pages
 1MB low.ram.unaligned.region   // i.e. below_4g_ram_size - 3GB; uses
                                // fallback allocation so as not to waste
                                // precious low RAM
 // hypothetically the hole starts at 3GB+1MB
 2GB high.ram.aligned.region    // using huge pages
 2MB high.ram.unaligned.region  // so as not to waste 1GB of memory by
                                // using a huge page

>
> > Goal is separate host part allocation aspect from guest related one,
> > aliasing 32-bit hole size at the end doesn't help it at all, it's quite
> > opposite, it's making current code more complicated and harder to fix
> > in the future.
>
> You can simply back the 1GB areas which the hole reside with 2MB pages.
I don't understand what you mean here.

> Can't see why having the tail of RAM map to the hole is problematic.
The problem I see is that with the proposed aliasing there is no
one-to-one mapping to a future "memdev", where each DIMM device
(guest/model-visible memory block) has a corresponding memdev backend
(host memory block).

Moreover, with the current hugepage handling in QEMU, including this
patch, and with 1GB hugepages in use, QEMU might lose ~1GB if -m is
"hpagesize*n+1", which by itself is a good reason to use several
allocations with different allocator backends.

> Understand your concern, but the complication is necessary: the host
> virtual/physical address and guest physical addresses must be aligned on
> largepage boundaries.
I don't argue against it, only about the best way to achieve it.

If we assume a possible future conversion from the ad hoc way of
allocating initial RAM to DIMM devices, then changing the region layout
several times in incompatible ways doesn't seem to be the best approach.
If we are going to change it, let's at least minimize compatibility
issues and do it right in the first place.

I'll post an RFC patch as a reply to this thread.

>
> Do you foresee any problem with memory hotplug?
I don't see any problem with memory hotplug so far, but as noted above
there will be problems with converting initial RAM to DIMM devices.

>
> Could add a warning to memory API: if memory region is larger than 1GB
> and RAM is 1GB backed, and not properly aligned, warn.
Perhaps it would be better to abort and ask the user to fix the
configuration, and on hugepage allocation failure not to fall back to
malloc but to abort and tell the user the number of hugepages needed to
run the guest with a hugepage backend.
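
To make the arithmetic behind the 5GB+3MB layout and the "hpagesize*n+1"
case concrete, here is a minimal sketch in Python (not QEMU's C). All
function names are hypothetical and none of them is a QEMU API; the
3GB+1MB hole start is the same hypothetical value used in the layout
above, and the "fallback" label stands for whatever non-hugepage
allocator would be used.

```python
GB = 1 << 30
MB = 1 << 20


def split_region(size, hugepage_size):
    """Split size into (aligned part, unaligned tail).

    The aligned part is the largest multiple of hugepage_size that fits.
    """
    tail = size % hugepage_size
    return size - tail, tail


def plan_layout(ram_size, hole_start, hugepage_size):
    """Return (name, size, backend) tuples for low and high RAM.

    hole_start is where the 32-bit hole begins (hypothetical input);
    RAM above that amount is placed in the high region.
    """
    low = min(ram_size, hole_start)
    high = ram_size - low
    low_aligned, low_tail = split_region(low, hugepage_size)
    high_aligned, high_tail = split_region(high, hugepage_size)
    return [
        ("low.ram.aligned.region", low_aligned, "hugepages"),
        ("low.ram.unaligned.region", low_tail, "fallback"),
        ("high.ram.aligned.region", high_aligned, "hugepages"),
        ("high.ram.unaligned.region", high_tail, "fallback"),
    ]


def hugepages_needed(size, hugepage_size):
    """Number of hugepages needed to back size entirely (rounded up)."""
    return -(-size // hugepage_size)


def wasted_by_single_allocation(size, hugepage_size):
    """Bytes lost when one allocation is rounded up to whole hugepages."""
    return hugepages_needed(size, hugepage_size) * hugepage_size - size


if __name__ == "__main__":
    for name, size, backend in plan_layout(5 * GB + 3 * MB, 3 * GB + MB, GB):
        print(f"{size // MB:5d} MB  {name}  ({backend})")
    lost = wasted_by_single_allocation(2 * GB + MB, GB)
    print(f"single allocation of 2GB+1MB loses {lost // MB} MB")
```

With separate allocations only the aligned regions consume 1GB pages,
while a single rounded-up allocation of n*1GB + 1MB needs one extra 1GB
page for the last 1MB. The same hugepages_needed() count is what an
abort message could report to the user on allocation failure.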