From mboxrd@z Thu Jan 1 00:00:00 1970
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1165806AbdEAOQA (ORCPT ); Mon, 1 May 2017 10:16:00 -0400
Received: from mail-it0-f43.google.com ([209.85.214.43]:36288 "EHLO
	mail-it0-f43.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1165217AbdEAOPx (ORCPT );
	Mon, 1 May 2017 10:15:53 -0400
MIME-Version: 1.0
In-Reply-To: <1493638874-4014-1-git-send-email-bhe@redhat.com>
References: <1493638874-4014-1-git-send-email-bhe@redhat.com>
From: Thomas Garnier
Date: Mon, 1 May 2017 07:15:51 -0700
Subject: Re: [PATCH] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds
To: Baoquan He
Cc: LKML, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
	"the arch/x86 maintainers", Kees Cook, Andrew Morton,
	Yasuaki Ishimatsu, Jinbum Park, Dave Hansen, "Kirill A. Shutemov",
	Yinghai Lu, Dan Williams, Dave Young
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by mail.home.local id v41EGA5a014193

On Mon, May 1, 2017 at 4:41 AM, Baoquan He wrote:
>
> Jeff Moyer reported that on his system with two memory regions, 0~64G and
> 1T~1T+192G, and the kernel option "memmap=192G!1024G" added, enabling kaslr
> makes the system hang intermittently during boot, while adding 'nokaslr'
> does not.
>
> This is because the for loop count calculation in sync_global_pgds is
> not correct. When a mapping area crosses pgd entries, we should
> calculate the starting address of the region which the next pgd covers and
> assign it to the next for loop count, rather than adding PGDIR_SIZE
> directly. The old code works right only if the mapping area is a multiple
> of PGDIR_SIZE; otherwise the end region could be skipped, so that it can't
> be synchronized to all other processes from the kernel pgd init_mm.pgd.
>
> In Jeff's system, the emulated pmem area [1024G, 1216G) is smaller than
> PGDIR_SIZE. 'nokaslr' works because PAGE_OFFSET is 1T aligned, which
> makes this area be mapped inside one pgd entry. With kaslr enabled,
> this area could cross two pgd entries, and then the next pgd entry won't
> be synced to all other processes. That is why we saw an empty PGD.

Makes a lot of sense. Thanks a lot for investigating this issue!

Acked-by: Thomas Garnier

>
> Fix it in this patch.
>
> The back trace is pasted below:
>
> [ 9.988867] IP: memcpy_erms+0x6/0x10
> [ 9.988868] PGD 0
> [ 9.988868]
> [ 9.988870] Oops: 0000 [#1] SMP
> [ 9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E) i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> [ 9.988886] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: G E 4.11.0-rc5+ #43
> [ 9.988887] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS SE5C600.86B.02.01.SP06.050920141054 05/09/2014
> [ 9.988888] task: ffff9267dc2f8000 task.stack: ffffba92c783c000
> [ 9.988890] RIP: 0010:memcpy_erms+0x6/0x10
> [ 9.988891] RSP: 0018:ffffba92c783f9b8 EFLAGS: 00010286
> [ 9.988892] RAX: ffff925f19e27000 RBX: 0000000000000000 RCX: 0000000000001000
> [ 9.988893] RDX: 0000000000001000 RSI: ffff9387bfff0000 RDI: ffff925f19e27000
> [ 9.988893] RBP: ffffba92c783fa38 R08: 0000000000000000 R09: 0000000017ffff80
> [ 9.988894] R10: 0000000000000000 R11: ffff9387bfff0000 R12: ffff925fde811ed8
> [ 9.988895] R13: 0000002fffff0000 R14: 0000000000001000 R15: ffff925f19e27000
> [ 9.988896] FS: 00007f1ee18e68c0(0000) GS:ffff925fdec00000(0000) knlGS:0000000000000000
> [ 9.988896] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9.988897] CR2: ffff9387bfff0000 CR3: 000000081ba28000 CR4: 00000000001406f0
> [ 9.988897] Call Trace:
> [ 9.988902] ? pmem_do_bvec+0x93/0x290 [nd_pmem]
> [ 9.988904] ? radix_tree_node_alloc.constprop.20+0x85/0xc0
> [ 9.988905] ? radix_tree_node_alloc.constprop.20+0x85/0xc0
> [ 9.988907] pmem_rw_page+0x3a/0x60 [nd_pmem]
> [ 9.988909] bdev_read_page+0x81/0xb0
> [ 9.988911] do_mpage_readpage+0x56f/0x770
> [ 9.988912] ? I_BDEV+0x20/0x20
> [ 9.988915] ? lru_cache_add+0xe/0x10
> [ 9.988917] mpage_readpages+0x148/0x1e0
> [ 9.988917] ? I_BDEV+0x20/0x20
> [ 9.988918] ? I_BDEV+0x20/0x20
> [ 9.988921] ? alloc_pages_current+0x88/0x120
> [ 9.988923] blkdev_readpages+0x1d/0x20
> [ 9.988924] __do_page_cache_readahead+0x1ce/0x2c0
> [ 9.988926] force_page_cache_readahead+0xa2/0x100
> [ 9.988927] page_cache_sync_readahead+0x3f/0x50
> [ 9.988930] generic_file_read_iter+0x60d/0x8c0
> [ 9.988931] blkdev_read_iter+0x37/0x40
> [ 9.988933] __vfs_read+0xe0/0x150
> [ 9.988934] vfs_read+0x8c/0x130
> [ 9.988936] SyS_read+0x55/0xc0
> [ 9.988939] entry_SYSCALL_64_fastpath+0x1a/0xa9
> [ 9.988940] RIP: 0033:0x7f1ee0822480
> [ 9.988941] RSP: 002b:00007ffcf9e741f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
> [ 9.988942] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1ee0822480
> [ 9.988943] RDX: 0000000000000040 RSI: 0000561b7e1aabc8 RDI: 0000000000000008
> [ 9.988943] RBP: 0000561b7e1a86a0 R08: 0000000000000005 R09: 0000000000000068
> [ 9.988944] R10: 00007ffcf9e73f80 R11: 0000000000000246 R12: 0000000000000000
> [ 9.988945] R13: 0000000000000001 R14: 0000561b7e1a61b0 R15: 0000561b7e1a55e0
> [ 9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
> [ 9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ffffba92c783f9b8
> [ 9.988962] CR2: ffff9387bfff0000
> [ 9.989022] ---[ end trace fe34c0fc0fe685ab ]---
> [ 9.998690] Kernel panic - not syncing: Fatal exception
> [ 10.004708] Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
>
> Reported-by: Jeff Moyer
> Signed-off-by: Baoquan He
> Cc: Thomas Gleixner
> Cc: Ingo Molnar
> Cc: "H. Peter Anvin"
> Cc: x86@kernel.org
> Cc: Kees Cook
> Cc: Thomas Garnier
> Cc: Andrew Morton
> Cc: Yasuaki Ishimatsu
> Cc: Jinbum Park
> Cc: Dave Hansen
> Cc: "Kirill A. Shutemov"
> Cc: Yinghai Lu
> Cc: Dan Williams
> Cc: Dave Young
> ---
>  arch/x86/mm/init_64.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index 15173d3..dbf4f00 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -94,12 +94,14 @@ __setup("noexec32=", nonx32_setup);
>   */
>  void sync_global_pgds(unsigned long start, unsigned long end)
>  {
> -	unsigned long address;
> +	unsigned long address, address_next;
>
> -	for (address = start; address <= end; address += PGDIR_SIZE) {
> +	for (address = start; address <= end; address = address_next) {
>  		const pgd_t *pgd_ref = pgd_offset_k(address);
>  		struct page *page;
>
> +		address_next = (address & PGDIR_MASK) + PGDIR_SIZE;
> +
>  		if (pgd_none(*pgd_ref))
>  			continue;
>
> --
> 2.5.5
>

--
Thomas
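
For illustration, the stepping difference described in the quoted commit
message can be reproduced in user space. What follows is only a minimal
sketch, not kernel code: it assumes 4-level paging with PGDIR_SHIFT = 39
(one pgd entry covering 512G), the helper names visits_old() and
visits_new() are hypothetical, and it simply counts how many pgd entries
each loop form would look at. It also assumes the addresses are far enough
from the top of the 64-bit range that the arithmetic does not wrap.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for x86-64 4-level paging: one pgd entry maps 512G. */
#define PGDIR_SHIFT 39
#define PGDIR_SIZE  (UINT64_C(1) << PGDIR_SHIFT)
#define PGDIR_MASK  (~(PGDIR_SIZE - 1))
#define GiB         (UINT64_C(1) << 30)

/* Old stepping: add PGDIR_SIZE to a possibly unaligned address. */
static unsigned int visits_old(uint64_t start, uint64_t end)
{
	unsigned int n = 0;
	uint64_t address;

	assert(start <= end);
	for (address = start; address <= end; address += PGDIR_SIZE)
		n++;	/* one pgd entry examined per iteration */
	return n;
}

/* New stepping: jump to the first address covered by the next pgd entry. */
static unsigned int visits_new(uint64_t start, uint64_t end)
{
	unsigned int n = 0;
	uint64_t address, address_next;

	assert(start <= end);
	for (address = start; address <= end; address = address_next) {
		address_next = (address & PGDIR_MASK) + PGDIR_SIZE;
		n++;	/* one pgd entry examined per iteration */
	}
	return n;
}
```

With a made-up 192G region that starts 448G into a pgd entry (so it ends
128G into the following one), visits_old() returns 1 and visits_new()
returns 2: the old form steps past the end of the region before it ever
reaches the second pgd entry. For a region that stays inside one pgd entry
both forms return 1.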