From: Baoquan He <bhe@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: mingo@kernel.org, keescook@chromium.org, kirill@shutemov.name,
    yamada.masahiro@socionext.com, tglx@linutronix.de, bp@alien8.de,
    hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
    peterz@infradead.org, x86@kernel.org, thgarnie@google.com,
    Baoquan He <bhe@redhat.com>
Subject: [PATCH v4 6/6] x86/mm/KASLR: Do not adapt the size of the direct mapping region for SGI UV system
Date: Thu, 14 Mar 2019 17:46:45 +0800
Message-Id: <20190314094645.4883-7-bhe@redhat.com>
In-Reply-To: 
 <20190314094645.4883-1-bhe@redhat.com>
References: <20190314094645.4883-1-bhe@redhat.com>

On SGI UV systems, the kernel often hangs when KASLR is enabled, while
disabling KASLR makes the kernel work well. The backtrace is:

  kernel BUG at arch/x86/mm/init_64.c:311!
  invalid opcode: 0000 [#1] SMP
  [...]
  RIP: 0010:__init_extra_mapping+0x188/0x196
  [...]
  Call Trace:
   init_extra_mapping_uc+0x13/0x15
   map_high+0x67/0x75
   map_mmioh_high_uv3+0x20a/0x219
   uv_system_init_hub+0x12d9/0x1496
   uv_system_init+0x27/0x29
   native_smp_prepare_cpus+0x28d/0x2d8
   kernel_init_freeable+0xdd/0x253
   ? rest_init+0x80/0x80
   kernel_init+0xe/0x110
   ret_from_fork+0x2c/0x40

This happens because the SGI UV system needs to map its MMIOH region
into the direct mapping section, and that mapping is done in
rest_init(), which runs much later than the call to
kernel_randomize_memory() that performs the mm KASLR. So mm KASLR
cannot account for the size of the MMIOH region when calculating the
needed size of the address space for the direct mapping section.

When KASLR is disabled, there is 64 TB of address space for both the
system RAM and the MMIOH regions to share. When KASLR is enabled, the
current mm KASLR code only reserves the actual size of system RAM plus
an extra 10 TB for the direct mapping. Thus the later MMIOH mapping can
go beyond the upper bound of the direct mapping and step into the
VMALLOC or VMEMMAP area, triggering the BUG_ON() in
__init_extra_mapping().

E.g. on the SGI UV3 machine where this bug was reported, there are two
MMIOH regions:

  [ 1.519001] UV: Map MMIOH0_HI 0xffc00000000 - 0x100000000000
  [ 1.523001] UV: Map MMIOH1_HI 0x100000000000 - 0x200000000000

They are [16TB-16G, 16TB) and [16TB, 32TB).
On this machine, the 512 GB of RAM are spread out across several 1 TB
regions, so the above two SGI MMIOH regions will also be mapped into
the direct mapping section.

To fix it, check whether this is an SGI UV system by calling
is_early_uv_system() in kernel_randomize_memory(). If it is, do not
adapt the size of the direct mapping section; just keep it as 64 TB.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 arch/x86/mm/kaslr.c | 60 +++++++++++++++++++++++++++++++++------------
 1 file changed, 45 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index 615a79f6b701..7584124fca82 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -29,6 +29,7 @@
 #include <asm/pgtable.h>
 #include <asm/setup.h>
 #include <asm/kaslr.h>
+#include <asm/uv/uv.h>
 
 #include "mm_internal.h"
 
@@ -104,6 +105,46 @@ static inline bool kaslr_memory_enabled(void)
 	return kaslr_enabled() && !IS_ENABLED(CONFIG_KASAN);
 }
 
+/*
+ * calc_direct_mapping_size - calculate the needed size of the direct
+ * mapping area.
+ *
+ * Even though a huge virtual address space is reserved for the direct
+ * mapping of physical memory, e.g. in 4-level paging mode it's 64 TB,
+ * rarely can a system own enough physical memory to use it up; most
+ * have even less than 1 TB. So with KASLR enabled, adapt the size of
+ * the direct mapping area to the size of actual physical memory plus
+ * the configured padding CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING.
+ * The left-over part is then taken out to join memory randomization.
+ *
+ * Note that the UV system is an exception: its MMIOH region needs to
+ * be mapped into the direct mapping area too, but its size is not
+ * known until rest_init() is called. Hence for the UV system, do not
+ * adapt the size of the direct mapping area.
+ */
+static inline unsigned long calc_direct_mapping_size(void)
+{
+	unsigned long size_tb, memory_tb;
+
+	/*
+	 * Update Physical memory mapping to available and
+	 * add padding if needed (especially for memory hotplug support).
+	 */
+	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
+		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
+
+	size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
+
+	/*
+	 * Adapt physical memory region size based on available memory if
+	 * it's not a UV system.
+	 */
+	if (memory_tb < size_tb && !is_early_uv_system())
+		size_tb = memory_tb;
+
+	return size_tb;
+}
+
 /*
  * kernel_randomize_memory - initialize base and padding for each
  * memory region randomized with KASLR.
@@ -113,12 +154,11 @@ static inline bool kaslr_memory_enabled(void)
  */
 void __init kernel_randomize_memory(void)
 {
-	size_t i;
-	unsigned long vaddr_start, vaddr;
-	unsigned long rand, memory_tb;
-	struct rnd_state rand_state;
+	unsigned long vaddr_start, vaddr, rand;
 	unsigned long remain_entropy;
 	unsigned long vmemmap_size;
+	struct rnd_state rand_state;
+	size_t i;
 
 	vaddr_start = pgtable_l5_enabled() ? __PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4;
 	vaddr = vaddr_start;
@@ -135,20 +175,10 @@ void __init kernel_randomize_memory(void)
 	if (!kaslr_memory_enabled())
 		return;
 
-	kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT);
+	kaslr_regions[0].size_tb = calc_direct_mapping_size();
 	kaslr_regions[1].size_tb = VMALLOC_SIZE_TB;
 
-	/*
-	 * Update Physical memory mapping to available and
-	 * add padding if needed (especially for memory hotplug support).
-	 */
 	BUG_ON(kaslr_regions[0].base != &page_offset_base);
-	memory_tb = DIV_ROUND_UP(max_pfn << PAGE_SHIFT, 1UL << TB_SHIFT) +
-		CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING;
-
-	/* Adapt phyiscal memory region size based on available memory */
-	if (memory_tb < kaslr_regions[0].size_tb)
-		kaslr_regions[0].size_tb = memory_tb;
 
 	/*
 	 * Calculate how many TB vmemmap region needs, and align to 1 TB
-- 
2.17.2