From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751883AbcGUIOs (ORCPT ); Thu, 21 Jul 2016 04:14:48 -0400 Received: from mail-wm0-f65.google.com ([74.125.82.65]:34243 "EHLO mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751810AbcGUIOn (ORCPT ); Thu, 21 Jul 2016 04:14:43 -0400 Date: Thu, 21 Jul 2016 10:14:38 +0200 From: Ingo Molnar To: Andy Lutomirski Cc: "H. Peter Anvin" , x86@kernel.org, Mario Limonciello , Matthew Garrett , linux-kernel@vger.kernel.org, Kees Cook , Linus Torvalds , Thomas Gleixner , Andrew Morton , Peter Zijlstra Subject: [PATCH] x86/boot: Reorganize and clean up the BIOS area reservation code Message-ID: <20160721081438.GA26531@gmail.com> References: <7c190e9b3376002014fb99234ed5e5c64df86322.1469064662.git.luto@kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <7c190e9b3376002014fb99234ed5e5c64df86322.1469064662.git.luto@kernel.org> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org * Andy Lutomirski wrote: > Under some conditions, my Dell XPS 13 9350 puts the EBDA at 0x2c000 > but reports the lowmem cutoff as 0. The old code reserves > everything above 0x2c000 and I can't boot [1]. > [1] This only breaks boot in practice when some other firmware or > GRUB oddity that I don't fully understand kicks in causing the > memory below 0x2c000 to be unusable. Exactly why can't Linux boot if *more* memory is reserved? Is it perhaps the SMP trampoline that cannot be allocated? Or does this shift some firmware memory corruption pattern that makes it more lethal than it already is? Without more information we don't really know the root cause here and I'm hesitant to touch such really old quirks without first understanding what's going on on your machine. Is the boot failure deterministic - if yes, could you try to dig a bit more into this? My guess it's the SMP trampoline, and I think we should robustify that in a different way: lets put it aside very early as a reservation (possibly in this very function), to guarantee that we have a below 1MB buffer for the SMP trampoline. This would be a lot more robust ... > --- a/arch/x86/kernel/ebda.c > +++ b/arch/x86/kernel/ebda.c > @@ -62,10 +62,12 @@ void __init reserve_ebda_region(void) > if (lowmem < INSANE_CUTOFF) > lowmem = LOWMEM_CAP; > > - /* Use the lower of the lowmem and EBDA markers as the cutoff */ > - lowmem = min(lowmem, ebda_addr); > lowmem = min(lowmem, LOWMEM_CAP); /* Absolute cap */ > > /* reserve all memory between lowmem and the 1MB mark */ > memblock_reserve(lowmem, 0x100000 - lowmem); > + > + /* if the EBDA is in lowmem, reserve it separately. */ > + if (ebda_addr < lowmem) > + memblock_reserve(ebda_addr, 4096); Ugh, this code is a mess, and your fix does not help make it better: - Firstly, could we please organize this code so that we first calculate ebda_addr and then lowmem - not this interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks... - Also, could we please rename things to make it more obvious what's going on, and then apply changes on top? Here's a list of problems: - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it super confusing to read. - It does not help at all that we have various memory range markers, half of which are 'start of range', half of which are 'end of range' - but this crucial property is not obvious in the naming at all ... gave me a headache trying to understand all this. - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is obvious, all values here are addresses!), while it does not highlight that it's the start of the EBDA region ... - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value that is a pointer to a value, not a memory range address. - The function name itself is a misnomer: it says 'reserve_ebda_region()' while its main purpose is to reserve all the firmware ROM typically between 640K and 1MB, while the 'EBDA' part is only a small part of that ... It should be renamed to something like reserve_bios_regions(). (The plural is intentional, to signal that this is potentially about multiple memory regions.) - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this too should be about whether to reserve firmware areas in the paravirt case. Something like ::reserve_bios_regions would work for me. - In fact thinking about this as 'end of RAM' is confusing: what this function *really* wants to reserve is firmware data and code areas! Once the thinking is inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure 'reserved area' notion everything becomes a lot clearer. So something like this, names ordered in (probable) address order: BIOS_RAM_SIZE_KB_PTR // was: BIOS_LOWMEM_KILOBYTES BIOS_RAM_END_MIN // was: INSANE_CUTOFF ebda_start // was: ebda_addr bios_ram_end // was: lowmem BIOS_RAM_END_MAX // was: LOWMEM_CAP would work for me. Also the comments could need some fixing, refreshing and general harmonization. Blah - the patch below does this all (untested). Just look how much more readable it all became! In fact I believe we should try another patch on top of this and massively simplify the whole EBDA mess: - Find a suitable address for the SMP trampoline and remember it for the SMP boot code. - Reserve *ALL* of the first 2MB of RAM. - (are there any other pieces of kernel code that require memory below 1MB/2MB?) The thing is, x86 CPUs are already treating the first 1MB of RAM in a special way: for example 2MB TLBs get split up into 4K TLBs regardless of what the pagetable or the MTRRs say. So it's a substandard piece of RAM already that we want to skip for performance reasons. Furthermore we had various problems in the past where firmware corrupted the first 1MB of RAM. So could we please just get rid of this class of problems and reserve it all? The only thing we need from there is a suitable place for the SMP trampoline. We should try to allocate that somewhere in the middle of the (suspected...) available RAM range based on BIOS_RAM_SIZE_KB_PTR (no EBDA quirking at this point), where the probability of firmware interference is the lowest. If the EBDA pointer happens to point to the exact same location we should bump the trampoline address down so that it's below the EBDA. (and should generate a low-key KERN_INFO printk that we've done so.) This strategy should solve your boot problem too I believe. Thanks, Ingo