From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932692AbdEAPzy (ORCPT ); Mon, 1 May 2017 11:55:54 -0400
Received: from mx1.redhat.com ([209.132.183.28]:55632 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1755503AbdEAPzs (ORCPT ); Mon, 1 May 2017 11:55:48 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com A66E8C00AFD1
Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com
Authentication-Results: ext-mx08.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=bhe@redhat.com
DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com A66E8C00AFD1
From: Baoquan He
To: linux-kernel@vger.kernel.org
Cc: Baoquan He, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
        x86@kernel.org, Kees Cook, Thomas Garnier, Andrew Morton,
        Yasuaki Ishimatsu, Jinbum Park, Dave Hansen, "Kirill A. Shutemov",
        Yinghai Lu, Dan Williams, Dave Young
Subject: [PATCH v2] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds
Date: Mon, 1 May 2017 23:55:35 +0800
Message-Id: <1493654135-16645-1-git-send-email-bhe@redhat.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.32]); Mon, 01 May 2017 15:55:48 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Jeff Moyer reported that on his system with two memory regions, 0~64G and
1T~1T+192G, and the kernel option "memmap=192G!1024G" added, enabling kaslr
makes the system hang intermittently during boot, while adding 'nokaslr'
does not.

This is because the for loop count calculation in sync_global_pgds() is not
correct. When a mapping area crosses pgd entries, we should calculate the
starting address of the region which the next pgd covers and assign it to
the next for loop count, rather than adding PGDIR_SIZE directly.
The old code works right only if the mapping area is a multiple of
PGDIR_SIZE; otherwise the end region could be skipped, so that it can't be
synchronized to all other processes from the kernel pgd init_mm.pgd.

In Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T aligned, which makes
this area be mapped inside one pgd entry. With kaslr enabled, this area
could cross two pgd entries, and then the next pgd entry won't be synced to
all other processes. That is why we saw an empty PGD.

Fix it in this patch.

The back trace is pasted below:

[ 9.988867] IP: memcpy_erms+0x6/0x10
[ 9.988868] PGD 0
[ 9.988868]
[ 9.988870] Oops: 0000 [#1] SMP
[ 9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E) i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[ 9.988886] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: G E 4.11.0-rc5+ #43
[ 9.988887] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS SE5C600.86B.02.01.SP06.050920141054 05/09/2014
[ 9.988888] task: ffff9267dc2f8000 task.stack: ffffba92c783c000
[ 9.988890] RIP: 0010:memcpy_erms+0x6/0x10
[ 9.988891] RSP: 0018:ffffba92c783f9b8 EFLAGS: 00010286
[ 9.988892] RAX: ffff925f19e27000 RBX: 0000000000000000 RCX: 0000000000001000
[ 9.988893] RDX: 0000000000001000 RSI: ffff9387bfff0000 RDI: ffff925f19e27000
[ 9.988893] RBP: ffffba92c783fa38 R08: 0000000000000000 R09: 0000000017ffff80
[ 9.988894] R10: 0000000000000000 R11: ffff9387bfff0000 R12: ffff925fde811ed8
[ 9.988895] R13: 0000002fffff0000 R14: 0000000000001000 R15: ffff925f19e27000
[ 9.988896] FS: 00007f1ee18e68c0(0000) GS:ffff925fdec00000(0000) knlGS:0000000000000000
[ 9.988896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.988897] CR2: ffff9387bfff0000 CR3:
000000081ba28000 CR4: 00000000001406f0
[ 9.988897] Call Trace:
[ 9.988902] ? pmem_do_bvec+0x93/0x290 [nd_pmem]
[ 9.988904] ? radix_tree_node_alloc.constprop.20+0x85/0xc0
[ 9.988905] ? radix_tree_node_alloc.constprop.20+0x85/0xc0
[ 9.988907] pmem_rw_page+0x3a/0x60 [nd_pmem]
[ 9.988909] bdev_read_page+0x81/0xb0
[ 9.988911] do_mpage_readpage+0x56f/0x770
[ 9.988912] ? I_BDEV+0x20/0x20
[ 9.988915] ? lru_cache_add+0xe/0x10
[ 9.988917] mpage_readpages+0x148/0x1e0
[ 9.988917] ? I_BDEV+0x20/0x20
[ 9.988918] ? I_BDEV+0x20/0x20
[ 9.988921] ? alloc_pages_current+0x88/0x120
[ 9.988923] blkdev_readpages+0x1d/0x20
[ 9.988924] __do_page_cache_readahead+0x1ce/0x2c0
[ 9.988926] force_page_cache_readahead+0xa2/0x100
[ 9.988927] page_cache_sync_readahead+0x3f/0x50
[ 9.988930] generic_file_read_iter+0x60d/0x8c0
[ 9.988931] blkdev_read_iter+0x37/0x40
[ 9.988933] __vfs_read+0xe0/0x150
[ 9.988934] vfs_read+0x8c/0x130
[ 9.988936] SyS_read+0x55/0xc0
[ 9.988939] entry_SYSCALL_64_fastpath+0x1a/0xa9
[ 9.988940] RIP: 0033:0x7f1ee0822480
[ 9.988941] RSP: 002b:00007ffcf9e741f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[ 9.988942] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1ee0822480
[ 9.988943] RDX: 0000000000000040 RSI: 0000561b7e1aabc8 RDI: 0000000000000008
[ 9.988943] RBP: 0000561b7e1a86a0 R08: 0000000000000005 R09: 0000000000000068
[ 9.988944] R10: 00007ffcf9e73f80 R11: 0000000000000246 R12: 0000000000000000
[ 9.988945] R13: 0000000000000001 R14: 0000561b7e1a61b0 R15: 0000561b7e1a55e0
[ 9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[ 9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ffffba92c783f9b8
[ 9.988962] CR2: ffff9387bfff0000
[ 9.989022] ---[ end trace fe34c0fc0fe685ab ]---
[ 9.998690] Kernel panic - not syncing: Fatal exception
[ 10.004708] Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)

Reported-by: Jeff Moyer
Signed-off-by: Baoquan He
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: x86@kernel.org
Cc: Kees Cook
Cc: Thomas Garnier
Cc: Andrew Morton
Cc: Yasuaki Ishimatsu
Cc: Jinbum Park
Cc: Dave Hansen
Cc: "Kirill A. Shutemov"
Cc: Yinghai Lu
Cc: Dan Williams
Cc: Dave Young
---
v1->v2:
    Optimized the code format as suggested by Dan.

 arch/x86/mm/init_64.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 15173d3..dfa9edb 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -96,7 +96,9 @@ void sync_global_pgds(unsigned long start, unsigned long end)
 {
 	unsigned long address;
 
-	for (address = start; address <= end; address += PGDIR_SIZE) {
+	for (address = start; address <= end;
+	     address = ALIGN(address + 1, PGDIR_SIZE)) {
+
 		const pgd_t *pgd_ref = pgd_offset_k(address);
 		struct page *page;
 
-- 
2.5.5
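
For illustration only, the stepping problem described in the commit message can be reproduced in user space with a toy model. This is a sketch, not kernel code. It assumes 4-level paging, where one pgd entry covers 512G (a PGDIR_SHIFT of 39), and a toy address space that starts at 0. The names GIB, ALIGN_UP, pgd_slot, visit_old and visit_new are hypothetical helpers invented for this example. ALIGN_UP is meant to behave like the kernel's ALIGN() for power-of-two alignments.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4-level paging, so one pgd entry maps 2^39 bytes (512G). */
#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (UINT64_C(1) << PGDIR_SHIFT)
#define GIB         (UINT64_C(1) << 30)

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uint64_t)(a) - 1))

/* Index of the pgd entry that covers addr in this toy address space. */
static uint64_t pgd_slot(uint64_t addr)
{
	return addr >> PGDIR_SHIFT;
}

/*
 * Old stepping: advance by PGDIR_SIZE from an arbitrary start.
 * Returns how many pgd entries were visited; *last_slot receives the
 * index of the last one.
 */
static unsigned int visit_old(uint64_t start, uint64_t end, uint64_t *last_slot)
{
	unsigned int visited = 0;
	uint64_t address;

	assert(start <= end);
	for (address = start; address <= end; address += PGDIR_SIZE) {
		*last_slot = pgd_slot(address);
		visited++;
	}
	return visited;
}

/*
 * New stepping: jump to the start of the region covered by the next pgd
 * entry, so an entry that is only partially covered is still visited.
 */
static unsigned int visit_new(uint64_t start, uint64_t end, uint64_t *last_slot)
{
	unsigned int visited = 0;
	uint64_t address;

	assert(start <= end);
	for (address = start; address <= end;
	     address = ALIGN_UP(address + 1, PGDIR_SIZE)) {
		*last_slot = pgd_slot(address);
		visited++;
	}
	return visited;
}
```

With a 192G range placed at [448G, 640G), which straddles the 512G boundary, visit_old() stops after the first pgd entry because 448G + 512G is already past the end, while visit_new() also reaches the second entry. For a range that stays inside one entry, both loops visit exactly one entry.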