From: Sasha Levin <sasha.levin@oracle.com>
To: stable@vger.kernel.org, stable-commits@vger.kernel.org
Cc: Toshi Kani <toshi.kani@hpe.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@amacapital.net>,
	Brian Gerst <brgerst@gmail.com>,
	Denys Vlasenko <dvlasenk@redhat.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"Luis R. Rodriguez" <mcgrof@suse.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Toshi Kani <toshi.kani@hp.com>,
	linux-mm@kvack.org, linux-nvdimm@ml01.01.org,
	Ingo Molnar <mingo@kernel.org>,
	Sasha Levin <sasha.levin@oracle.com>
Subject: [added to the 4.1 stable tree] x86/mm: Fix vmalloc_fault() to handle large pages properly
Date: Wed,  2 Mar 2016 15:14:26 -0500
Message-ID: <1456949673-25036-79-git-send-email-sasha.levin@oracle.com>
In-Reply-To: <1456949673-25036-1-git-send-email-sasha.levin@oracle.com>

From: Toshi Kani <toshi.kani@hpe.com>

This patch has been added to the 4.1 stable tree. If you have any
objections, please let us know.

===============

[ Upstream commit f4eafd8bcd5229e998aa252627703b8462c3b90f ]

A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges was allocated by ioremap() on an x86_64
system:

     BUG: unable to handle kernel paging request at ffff880840000ff8
     IP: vmalloc_fault+0x1be/0x300
     PGD c7f03a067 PUD 0
     Oops: 0000 [#1] SM
     Call Trace:
        __do_page_fault+0x285/0x3e0
        do_page_fault+0x2f/0x80
        ? put_prev_entity+0x35/0x7a0
        page_fault+0x28/0x30
        ? memcpy_erms+0x6/0x10
        ? schedule+0x35/0x80
        ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
        ? schedule_timeout+0x183/0x240
        btt_log_read+0x63/0x140 [nd_btt]
         :
        ? __symbol_put+0x60/0x60
        ? kernel_read+0x50/0x80
        SyS_finit_module+0xb9/0xf0
        entry_SYSCALL_64_fastpath+0x1a/0xa4

Since v4.1, ioremap() supports large page (pud/pmd) mappings on
x86_64 and PAE.  vmalloc_fault(), however, assumes that the vmalloc
range is mapped only with ptes.

vmalloc faults do not normally happen in ioremap'd ranges, since
ioremap() sets up the kernel page tables, which are shared by
user processes.  pgd_ctor() copies the kernel's PGD entries into
the user's PGD during fork().  When allocation of the vmalloc ranges
crosses a 512GB boundary, ioremap() allocates a new pud table
and updates the kernel PGD entry to point to it.  If a user
process's PGD entry does not have this update yet, a read/write
syscall to the range will cause a vmalloc fault, which hits the
Oops above because vmalloc_fault() does not handle a large page
properly.

The following changes are made to vmalloc_fault():

64-bit:

 - No change for the PGD sync operation as it handles large
   pages already.
 - Add pud_huge() and pmd_huge() to the validation code to
   handle large pages.
 - Change pud_page_vaddr() to pud_pfn() since an ioremap range
   is not directly mapped (while the if-statement still works
   with a bogus addr).
 - Change pmd_page() to pmd_pfn() since an ioremap range is not
   backed by struct page (while the if-statement still works
   with a bogus addr).

32-bit:
 - No change for the sync operation since the index3 PGD entry
   covers the entire vmalloc range, which is always valid.
   (If this memory layout changes, a separate change to sync the
    PGD entry will be necessary regardless of the page size.)
 - Add pmd_huge() to the validation code to handle large pages.
   This is only for completeness, since vmalloc_fault() won't
   happen in ioremap'd ranges as their PGD entry is always valid.

Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org> # 4.1+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
---
 arch/x86/mm/fault.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 181c53b..62855ac 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -285,6 +285,9 @@ static noinline int vmalloc_fault(unsigned long address)
 	if (!pmd_k)
 		return -1;
 
+	if (pmd_huge(*pmd_k))
+		return 0;
+
 	pte_k = pte_offset_kernel(pmd_k, address);
 	if (!pte_present(*pte_k))
 		return -1;
@@ -356,8 +359,6 @@ void vmalloc_sync_all(void)
  * 64-bit:
  *
  *   Handle a fault on the vmalloc area
- *
- * This assumes no large pages in there.
  */
 static noinline int vmalloc_fault(unsigned long address)
 {
@@ -399,17 +400,23 @@ static noinline int vmalloc_fault(unsigned long address)
 	if (pud_none(*pud_ref))
 		return -1;
 
-	if (pud_none(*pud) || pud_page_vaddr(*pud) != pud_page_vaddr(*pud_ref))
+	if (pud_none(*pud) || pud_pfn(*pud) != pud_pfn(*pud_ref))
 		BUG();
 
+	if (pud_huge(*pud))
+		return 0;
+
 	pmd = pmd_offset(pud, address);
 	pmd_ref = pmd_offset(pud_ref, address);
 	if (pmd_none(*pmd_ref))
 		return -1;
 
-	if (pmd_none(*pmd) || pmd_page(*pmd) != pmd_page(*pmd_ref))
+	if (pmd_none(*pmd) || pmd_pfn(*pmd) != pmd_pfn(*pmd_ref))
 		BUG();
 
+	if (pmd_huge(*pmd))
+		return 0;
+
 	pte_ref = pte_offset_kernel(pmd_ref, address);
 	if (!pte_present(*pte_ref))
 		return -1;
-- 
2.5.0
