From mboxrd@z Thu Jan 1 00:00:00 1970 From: George Dunlap Subject: Re: [v8][PATCH 07/16] hvmloader/e820: construct guest e820 table Date: Thu, 16 Jul 2015 16:33:15 +0100 Message-ID: <55A7CEBB.8090701@citrix.com> References: <1437029582-19564-1-git-send-email-tiejun.chen@intel.com> <1437029582-19564-8-git-send-email-tiejun.chen@intel.com> <55A7ADAC.1010004@intel.com> <55A7C804.2040101@intel.com> <55A7CABE.4040005@citrix.com> <55A7CDBF.1030104@intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <55A7CDBF.1030104@intel.com> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: "Chen, Tiejun" , George Dunlap Cc: Keir Fraser , Ian Campbell , Stefano Stabellini , Andrew Cooper , Ian Jackson , "xen-devel@lists.xen.org" , Jan Beulich , Wei Liu List-Id: xen-devel@lists.xenproject.org On 07/16/2015 04:29 PM, Chen, Tiejun wrote: > > > On 2015/7/16 23:16, George Dunlap wrote: >> On 07/16/2015 04:04 PM, Chen, Tiejun wrote: >>>> Yes, sorry, add_high_mem will be the size of memory *relocated*, not >>>> the actual end of it (unless, as you say, the original highmem region >>>> didn't exist). >>>> >>>> What I really meant was that either way, after adjusting the highmem >>>> region in the e820, the end of that region should correspond to >>>> hvm_info->high_mem_pgend. >>>> >>>> What about something like this? >>>> --- >>>> /* >>>> * And then we also need to adjust highmem. >>>> */ >>>> if ( add_high_mem ) >>>> { >>>> /* >>>> * Modify the existing highmem region if it exists >>>> */ >>>> for ( i = 0; i < nr; i++ ) >>>> { >>>> if ( e820[i].type == E820_RAM && >>>> e820[i].addr == (1ull << 32)) >>>> { >>>> e820[i].size += add_high_mem; >>>> break; >>>> } >>>> } >>>> >>>> /* >>>> * If we didn't find a highmem region, make one >>>> */ >>>> if ( i == nr ) >>>> { >>>> e820[nr].addr = ((uint64_t)1 << 32); >>>> e820[nr].size = e820[nr].addr + add_high_mem; >>>> e820[nr].type = E820_RAM; >>>> nr++; >>>> } >>>> >>>> /* >>>> * Either way, at this point i points to the entry containing >>>> * highmem. Compare it to what's in hvm_info as a sanity >>>> * check. >>>> */ >>>> BUG_ON(e820[i].addr+e820[i].size != >>>> ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT)); >>>> } >>>> >>> >>> Looks really better. >>> >>> I just introduce a little change based on yours, and I post this as a >>> whole, >>> >>> diff --git a/tools/firmware/hvmloader/e820.c >>> b/tools/firmware/hvmloader/e820.c >>> index 7a414ab..8c9b01f 100644 >>> --- a/tools/firmware/hvmloader/e820.c >>> +++ b/tools/firmware/hvmloader/e820.c >>> @@ -105,7 +105,10 @@ int build_e820_table(struct e820entry *e820, >>> unsigned int lowmem_reserved_base, >>> unsigned int bios_image_base) >>> { >>> - unsigned int nr = 0; >>> + unsigned int nr = 0, i, j; >>> + uint32_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT; >>> + uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << >>> PAGE_SHIFT; >>> + uint64_t add_high_mem = 0; >>> >>> if ( !lowmem_reserved_base ) >>> lowmem_reserved_base = 0xA0000; >>> @@ -149,13 +152,6 @@ int build_e820_table(struct e820entry *e820, >>> e820[nr].type = E820_RESERVED; >>> nr++; >>> >>> - /* Low RAM goes here. Reserve space for special pages. */ >>> - BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20)); >>> - e820[nr].addr = 0x100000; >>> - e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - >>> e820[nr].addr; >>> - e820[nr].type = E820_RAM; >>> - nr++; >>> - >>> /* >>> * Explicitly reserve space for special pages. >>> * This space starts at RESERVED_MEMBASE an extends to cover >>> various >>> @@ -191,16 +187,91 @@ int build_e820_table(struct e820entry *e820, >>> nr++; >>> } >>> >>> - >>> - if ( hvm_info->high_mem_pgend ) >>> + /* >>> + * Construct E820 table according to recorded memory map. >>> + * >>> + * The memory map created by toolstack may include, >>> + * >>> + * #1. Low memory region >>> + * >>> + * Low RAM starts at least from 1M to make sure all standard >>> regions >>> + * of the PC memory map, like BIOS, VGA memory-mapped I/O and >>> vgabios, >>> + * have enough space. >>> + * >>> + * #2. Reserved regions if they exist >>> + * >>> + * #3. High memory region if it exists >>> + */ >>> + for ( i = 0; i < memory_map.nr_map; i++ ) >>> { >>> - e820[nr].addr = ((uint64_t)1 << 32); >>> - e820[nr].size = >>> - ((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - >>> e820[nr].addr; >>> - e820[nr].type = E820_RAM; >>> + e820[nr] = memory_map.map[i]; >>> nr++; >>> } >>> >>> + /* Low RAM goes here. Reserve space for special pages. */ >>> + BUG_ON(low_mem_end < (2u << 20)); >>> + >>> + /* >>> + * Its possible to relocate RAM to allocate sufficient MMIO >>> previously >>> + * so low_mem_pgend would be changed over there. And here >>> memory_map[] >>> + * records the original low/high memory, so if low_mem_end is less >>> than >>> + * the original we need to revise low/high memory range in e820. >>> + */ >>> + for ( i = 0; i < nr; i++ ) >>> + { >>> + uint64_t end = e820[i].addr + e820[i].size; >>> + if ( e820[i].type == E820_RAM && >>> + low_mem_end > e820[i].addr && low_mem_end < end ) >>> + { >>> + add_high_mem = end - low_mem_end; >>> + e820[i].size = low_mem_end - e820[i].addr; >>> + } >>> + } >>> + >>> + /* >>> + * And then we also need to adjust highmem. >>> + */ >>> + if ( add_high_mem ) >>> + { >>> + /* Modify the existing highmem region if it exists. */ >>> + for ( i = 0; i < nr; i++ ) >>> + { >>> + if ( e820[i].type == E820_RAM && >>> + e820[i].addr == ((uint64_t)1 << 32)) >>> + { >>> + e820[i].size += add_high_mem; >>> + break; >>> + } >>> + } >>> + >>> + /* If there was no highmem region, just create one. */ >>> + if ( i == nr ) >>> + { >>> + e820[nr].addr = ((uint64_t)1 << 32); >>> + e820[nr].size = high_mem_end - e820[nr].addr; >>> + e820[nr].type = E820_RAM; >>> + nr++; >>> + } >>> + >>> + /* A sanity check if high memory is broken. */ >>> + BUG_ON( high_mem_end != e820[i].addr + e820[i].size); >> >> The reason I wrote it the way I did was so that we would cross-check our >> lowmem adjustments (via add_high_mem) with the value in hvm_info in >> *both cases*. >> >> In the code above, you'll get the sanity check if we modify an existing >> e820 entry; but if we create a new entry, then we don't check to make >> sure that the amount we removed from the lowmem entry equals the amount >> we added to the highmem entry. > > Are you saying the following two cases are not same? > > uint64_t high_mem_end = (uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT; > BUG_ON( high_mem_end != e820[i].addr + e820[i].size); > vs. > BUG_ON(e820[i].addr+e820[i].size != ((uint64_t)hvm_info->high_mem_pgend > << PAGE_SHIFT)); > > Why? Note hvm_info->high_mem_pgend don't change while build e820 table. > > Honestly I didn't try to change that point but maybe I'm missing something? Yes, you are missing something. :-) I told you exactly what I wanted changed and what I said could remain the same: >> By all means, calculate high_mem_end so it's easier to read. But then, >> when creating a new region, set e820[nr].size = add_high_mem, so that >> the BUG_ON() that follows actually checks something useful. Just to be clear, I want the second if() statement to look like this: >>> + if ( i == nr ) >>> + { >>> + e820[nr].addr = ((uint64_t)1 << 32); >>> + e820[nr].size = add_high_mem; >>> + e820[nr].type = E820_RAM; >>> + nr++; >>> + } Think about why and maybe that will help you understand what I'm talking about. -George