From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:54918)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1VghE4-0003IU-9V for qemu-devel@nongnu.org;
 Wed, 13 Nov 2013 15:39:54 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1VghDy-0006Dw-7p for qemu-devel@nongnu.org;
 Wed, 13 Nov 2013 15:39:48 -0500
Received: from mx1.redhat.com ([209.132.183.28]:4242)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1VghDx-0006Dp-Vv for qemu-devel@nongnu.org;
 Wed, 13 Nov 2013 15:39:42 -0500
Received: from int-mx02.intmail.prod.int.phx2.redhat.com
 (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])
 by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id rADKde7v006926
 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
 for ; Wed, 13 Nov 2013 15:39:41 -0500
Date: Wed, 13 Nov 2013 18:39:27 -0200
From: Marcelo Tosatti
Message-ID: <20131113203926.GA30546@amt.cnet>
References: <20131024211249.723543071@amt.cnet>
 <20131106014930.GA20468@amt.cnet>
 <20131106015543.GA20766@amt.cnet>
 <20131106213119.GA15543@amt.cnet>
 <20131107162459.6bdc39d7@nial.usersys.redhat.com>
 <20131107215304.GA10866@amt.cnet>
 <20131110204753.GA11389@amt.cnet>
 <20131112211637.GA11395@amt.cnet>
 <5283B32B.6020602@redhat.com>
 <20131113195832.GA29433@amt.cnet>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20131113195832.GA29433@amt.cnet>
Subject: Re: [Qemu-devel] i386: pc: align gpa<->hpa on 1GB boundary (v6)
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Paolo Bonzini
Cc: aarcange@redhat.com, gleb@redhat.com, "Michael S.
 Tsirkin" , qemu-devel@nongnu.org, Gerd Hoffmann , Igor Mammedov

On Wed, Nov 13, 2013 at 05:58:32PM -0200, Marcelo Tosatti wrote:
> On Wed, Nov 13, 2013 at 06:13:15PM +0100, Paolo Bonzini wrote:
> > >     assert(piecetwosize <= holesize);
> > >
> > >     piecetwosize = MIN(above_4g_mem_size, piecetwosize);
> > >     if ((above_4g_mem_size - piecetwosize) > 0) {
> > >         memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g",
> > >                                  ram, 0x100000000ULL,
> > >                                  above_4g_mem_size - piecetwosize);
> > >         memory_region_add_subregion(system_memory, 0x100000000ULL,
> > >                                     ram_above_4g);
> > >     } else {
> > >         g_free(ram_above_4g);
> > >     }
> > >     memory_region_init_alias(ram_above_4g_piecetwo, NULL,
> > >                              "ram-above-4g-piecetwo", ram,
> > >                              0x100000000ULL - holesize, piecetwosize);
> > >     memory_region_add_subregion(system_memory,
> > >                                 0x100000000ULL +
> > >                                 above_4g_mem_size - piecetwosize,
> > >                                 ram_above_4g_piecetwo);
> >
> > There is still a small problem in that the 2MB rounding must not be
> > done for old machine types.
> >
> > I did a really careful review of the code and everything else looks okay
> > to me. However, it grew by accretion from v1 and now it took me really a
> > long time to figure it out... I adjusted it a bit and the result seems
> > easier to understand to me.
> >
> > Here's the hw/i386/pc.c part of the patch (the patch from v6 is unreadable):
> >
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 12c436e..f2fd138 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1156,8 +1156,10 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
> >  {
> >      int linux_boot, i;
> >      MemoryRegion *ram, *option_rom_mr;
> > -    MemoryRegion *ram_below_4g, *ram_above_4g;
> > +    MemoryRegion *ram_below_4g, *ram_above_4g_pieceone, *ram_above_4g_piecetwo;
> >      FWCfgState *fw_cfg;
> > +    uint64_t holesize, pieceonesize, piecetwosize;
> > +    uint64_t memsize, align_offset;
> >
> >      linux_boot = (kernel_filename != NULL);
> >
> > @@ -1165,26 +1167,74 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory,
> >       * aliases to address portions of it, mostly for backwards compatibility
> >       * with older qemus that used qemu_ram_alloc().
> >       */
> > +    memsize = below_4g_mem_size + above_4g_mem_size;
> > +    holesize = 0x100000000ULL - below_4g_mem_size;
> > +
> > +    /* If 1GB hugepages are used to back guest RAM, we want the
> > +     * physical address 4GB to map to 4GB in the RAM, so that
> > +     * memory beyond 4GB is aligned on a 1GB boundary, at the
> > +     * host physical address space. Thus, the ram block range
> > +     * [holestart, 4GB] is mapped to the last holesize bytes of RAM:
> > +     *
> > +     *                      0       h    4G     memsize-holesize
> > +     *
> > +     * contiguous-ram-block [xxxxxx][yyy][zzzzz]
> > +     *                               '-----------.
> > +     * guest-addr-space     [xxxxxx]     [zzzzz][yyy]
> > +     *
> > +     * This is only done in new-enough machine types, and of course
> > +     * it is only necessary if the [zzzzz] block exists at all.
> > +     */
> > +    if (guest_info->gb_align && above_4g_mem_size > holesize) {
> > +        /* Round the allocation up to 2 MB to use more hugepages.

To align to 2MB boundary, the number of hugepages is the same.

> > +         * Remove the slack from the [yyy] piece so that pieceonesize
> > +         * (and thus the start of piecetwo) remains aligned.
> > +         */
> > +        align_offset = ROUND_UP(memsize, 1UL << 21) - memsize;
> > +        piecetwosize = holesize - align_offset;
> > +    } else {
> > +        /* There's no "piece one", all memory above 4G starts

Piece two.

> > +         * at below_4g_mem_size in the RAM block. Also no need
> > +         * to align anything.
> > +         */
> > +        align_offset = 0;
> > +        piecetwosize = above_4g_mem_size;
> > +    }
> > +
> >      ram = g_malloc(sizeof(*ram));
> > -    memory_region_init_ram(ram, NULL, "pc.ram",
> > -                           below_4g_mem_size + above_4g_mem_size);
> > +    memory_region_init_ram(ram, NULL, "pc.ram", memsize + align_offset);
> >      vmstate_register_ram_global(ram);
> >      *ram_memory = ram;
> > +
> >      ram_below_4g = g_malloc(sizeof(*ram_below_4g));
> >      memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram,
> >                               0, below_4g_mem_size);
> >      memory_region_add_subregion(system_memory, 0, ram_below_4g);
> > +
> > +    pieceonesize = above_4g_mem_size - piecetwosize;
> > +    if (pieceonesize) {
> > +        ram_above_4g_pieceone = g_malloc(sizeof(*ram_above_4g_pieceone));
> > +        memory_region_init_alias(ram_above_4g_pieceone, NULL,
> > +                                 "ram-above-4g-pieceone", ram,
> > +                                 0x100000000ULL, pieceonesize);
> > +        memory_region_add_subregion(system_memory, 0x100000000ULL,
> > +                                    ram_above_4g_pieceone);
> > +    }
>
> Can you change the name of aliases and subregions without breaking
> migration?
>
> Its much simpler, i'm fine with it. Test with Q35?
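
The size arithmetic in the quoted patch can be checked in isolation. What follows is a minimal standalone sketch, not QEMU code: struct ram_layout, compute_layout() and round_up() are hypothetical names introduced only for illustration. The guest address of piece two is an assumption taken from the v6 snippet quoted at the top (4GB + above_4g_mem_size - piecetwosize), since the reworked patch is cut off before that alias is created.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FOUR_GB 0x100000000ULL
#define TWO_MB  (1ULL << 21)

/* Hypothetical container for the computed layout; not a QEMU type. */
struct ram_layout {
    uint64_t ram_size;      /* size of the RAM block, including any slack */
    uint64_t pieceonesize;  /* alias at guest 4GB, RAM block offset 4GB */
    uint64_t piecetwosize;  /* alias at RAM block offset below_4g */
    uint64_t piecetwo_gpa;  /* guest address of piece two (assumed from v6) */
};

/* Round n up to a multiple of d (d must be non-zero). */
static uint64_t round_up(uint64_t n, uint64_t d)
{
    return (n + d - 1) / d * d;
}

/* Mirrors the size computation in the quoted pc_memory_init() hunk. */
static struct ram_layout compute_layout(uint64_t below_4g, uint64_t above_4g,
                                        bool gb_align)
{
    struct ram_layout l;
    uint64_t memsize = below_4g + above_4g;
    uint64_t holesize, align_offset;

    assert(below_4g <= FOUR_GB);
    holesize = FOUR_GB - below_4g;

    if (gb_align && above_4g > holesize) {
        /* Slack needed to make the RAM block a multiple of 2MB. */
        align_offset = round_up(memsize, TWO_MB) - memsize;
        assert(align_offset <= holesize);
        l.piecetwosize = holesize - align_offset;
    } else {
        /* No piece one: everything above 4GB is piece two. */
        align_offset = 0;
        l.piecetwosize = above_4g;
    }

    l.ram_size = memsize + align_offset;
    l.pieceonesize = above_4g - l.piecetwosize;
    l.piecetwo_gpa = FOUR_GB + l.pieceonesize;
    return l;
}
```

For example, with 3.5GB below 4GB and 4.5GB above it, the hole is 512MB, so piece two is 512MB and piece one is exactly 4GB. In the aligned case, piece one always ends at the end of the RAM block (4GB + pieceonesize == ram_size).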