linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Baoquan He <bhe@redhat.com>
To: linux-kernel@vger.kernel.org
Cc: Baoquan He <bhe@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, Kees Cook <keescook@chromium.org>,
	Thomas Garnier <thgarnie@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Yasuaki Ishimatsu <yasu.isimatu@gmail.com>,
	Jinbum Park <jinb.park7@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Yinghai Lu <yinghai@kernel.org>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Young <dyoung@redhat.com>
Subject: [PATCH] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds
Date: Mon,  1 May 2017 19:41:14 +0800	[thread overview]
Message-ID: <1493638874-4014-1-git-send-email-bhe@redhat.com> (raw)

Jeff Moyer reported that on his system with two memory regions 0~64G and
1T~1T+192G, and kernel option "memmap=192G!1024G" added, enabling kaslr
will make system hang intermittently during boot. While adding 'nokaslr'
won't.

This is because the for loop count calculation in sync_global_pgds is
not correct. When a mapping area crosses pgd entries, we should
calculate the starting address of region which next pgd covers and assign
it to next for loop count, but not add PGDIR_SIZE directly. The old
code works right only if the mapping area is times of PGDIR_SIZE,
otherwize the end region could be skipped so that it can't be synchronized
to all other processes from kernel pgd init_mm.pgd.

In Jeff's system, emulated pmem area [1024G, 1216G) is smaller than
PGDIR_SIZE. While 'nokaslr' works because PAGE_OFFSET is 1T aligned, it
makes this area be mapped inside one pgd entry. With kaslr enabled,
this area could cross two pgd entries, then the next pgd entry won't
be synced to all other processes. That is why we saw empty PGD.

Fix it in this patch.

The back trace is pasted as below:

[    9.988867] IP: memcpy_erms+0x6/0x10
[    9.988868] PGD 0
[    9.988868]
[    9.988870] Oops: 0000 [#1] SMP
[    9.988871] Modules linked in: isci(E) mgag200(E+) drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) igb(E) ahci(E) ttm(E) libsas(E) libahci(E) scsi_transport_sas(E) ptp(E) pps_core(E) nd_pmem(E) dca(E) drm(E) i2c_algo_bit(E) libata(E) crc32c_intel(E) nd_btt(E) i2c_core(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[    9.988886] CPU: 0 PID: 442 Comm: systemd-udevd Tainted: G            E   4.11.0-rc5+ #43
[    9.988887] Hardware name: Intel Corporation LH Pass/SVRBD-ROW_P, BIOS SE5C600.86B.02.01.SP06.050920141054 05/09/2014
[    9.988888] task: ffff9267dc2f8000 task.stack: ffffba92c783c000
[    9.988890] RIP: 0010:memcpy_erms+0x6/0x10
[    9.988891] RSP: 0018:ffffba92c783f9b8 EFLAGS: 00010286
[    9.988892] RAX: ffff925f19e27000 RBX: 0000000000000000 RCX: 0000000000001000
[    9.988893] RDX: 0000000000001000 RSI: ffff9387bfff0000 RDI: ffff925f19e27000
[    9.988893] RBP: ffffba92c783fa38 R08: 0000000000000000 R09: 0000000017ffff80
[    9.988894] R10: 0000000000000000 R11: ffff9387bfff0000 R12: ffff925fde811ed8
[    9.988895] R13: 0000002fffff0000 R14: 0000000000001000 R15: ffff925f19e27000
[    9.988896] FS:  00007f1ee18e68c0(0000) GS:ffff925fdec00000(0000) knlGS:0000000000000000
[    9.988896] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.988897] CR2: ffff9387bfff0000 CR3: 000000081ba28000 CR4: 00000000001406f0
[    9.988897] Call Trace:
[    9.988902]  ? pmem_do_bvec+0x93/0x290 [nd_pmem]
[    9.988904]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
[    9.988905]  ? radix_tree_node_alloc.constprop.20+0x85/0xc0
[    9.988907]  pmem_rw_page+0x3a/0x60 [nd_pmem]
[    9.988909]  bdev_read_page+0x81/0xb0
[    9.988911]  do_mpage_readpage+0x56f/0x770
[    9.988912]  ? I_BDEV+0x20/0x20
[    9.988915]  ? lru_cache_add+0xe/0x10
[    9.988917]  mpage_readpages+0x148/0x1e0
[    9.988917]  ? I_BDEV+0x20/0x20
[    9.988918]  ? I_BDEV+0x20/0x20
[    9.988921]  ? alloc_pages_current+0x88/0x120
[    9.988923]  blkdev_readpages+0x1d/0x20
[    9.988924]  __do_page_cache_readahead+0x1ce/0x2c0
[    9.988926]  force_page_cache_readahead+0xa2/0x100
[    9.988927]  page_cache_sync_readahead+0x3f/0x50
[    9.988930]  generic_file_read_iter+0x60d/0x8c0
[    9.988931]  blkdev_read_iter+0x37/0x40
[    9.988933]  __vfs_read+0xe0/0x150
[    9.988934]  vfs_read+0x8c/0x130
[    9.988936]  SyS_read+0x55/0xc0
[    9.988939]  entry_SYSCALL_64_fastpath+0x1a/0xa9
[    9.988940] RIP: 0033:0x7f1ee0822480
[    9.988941] RSP: 002b:00007ffcf9e741f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[    9.988942] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f1ee0822480
[    9.988943] RDX: 0000000000000040 RSI: 0000561b7e1aabc8 RDI: 0000000000000008
[    9.988943] RBP: 0000561b7e1a86a0 R08: 0000000000000005 R09: 0000000000000068
[    9.988944] R10: 00007ffcf9e73f80 R11: 0000000000000246 R12: 0000000000000000
[    9.988945] R13: 0000000000000001 R14: 0000561b7e1a61b0 R15: 0000561b7e1a55e0
[    9.988946] Code: ff 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48 89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[    9.988962] RIP: memcpy_erms+0x6/0x10 RSP: ffffba92c783f9b8
[    9.988962] CR2: ffff9387bfff0000
[    9.989022] ---[ end trace fe34c0fc0fe685ab ]---
[    9.998690] Kernel panic - not syncing: Fatal exception
[   10.004708] Kernel Offset: 0x11000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Reported-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Baoquan He <bhe@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com> 
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org 
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Garnier <thgarnie@google.com> 
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yasuaki Ishimatsu <yasu.isimatu@gmail.com>
Cc: Jinbum Park <jinb.park7@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com> 
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Young <dyoung@redhat.com>
---
 arch/x86/mm/init_64.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 15173d3..dbf4f00 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -94,12 +94,14 @@ __setup("noexec32=", nonx32_setup);
  */
 void sync_global_pgds(unsigned long start, unsigned long end)
 {
-	unsigned long address;
+	unsigned long address, address_next;
 
-	for (address = start; address <= end; address += PGDIR_SIZE) {
+	for (address = start; address <= end; address = address_next) {
 		const pgd_t *pgd_ref = pgd_offset_k(address);
 		struct page *page;
 
+		address_next = (address & PGDIR_MASK) + PGDIR_SIZE;
+
 		if (pgd_none(*pgd_ref))
 			continue;
 
-- 
2.5.5

             reply	other threads:[~2017-05-01 11:41 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-05-01 11:41 Baoquan He [this message]
2017-05-01 14:15 ` Thomas Garnier
2017-05-01 14:40 ` Dan Williams
2017-05-01 14:52   ` Baoquan He
2017-05-01 15:24     ` Dan Williams
2017-05-01 15:31       ` Baoquan He
2017-05-01 22:37 ` Yinghai Lu
2017-05-02  7:18   ` Baoquan He
2017-05-02  7:24     ` Ingo Molnar
2017-05-02  7:40       ` Baoquan He

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1493638874-4014-1-git-send-email-bhe@redhat.com \
    --to=bhe@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=dyoung@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jinb.park7@gmail.com \
    --cc=keescook@chromium.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=thgarnie@google.com \
    --cc=x86@kernel.org \
    --cc=yasu.isimatu@gmail.com \
    --cc=yinghai@kernel.org \
    --subject='Re: [PATCH] x86/mm: Fix incorrect for loop count calculation in sync_global_pgds' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).