From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	Hillf Danton <hillf.zj@alibaba-inc.com>,
	Hugh Dickins <hughd@google.com>, Michal Hocko <mhocko@kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [PATCH 4.10 70/81] mm, hugetlb: use pte_present() instead of pmd_present() in follow_huge_pmd()
Date: Thu,  6 Apr 2017 10:39:02 +0200
Message-ID: <20170406083627.139671116@linuxfoundation.org>
In-Reply-To: <20170406083624.322941631@linuxfoundation.org>

4.10-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>

commit c9d398fa237882ea07167e23bcfc5e6847066518 upstream.

I found a race condition that triggers the following bug when
move_pages() and soft offline are called concurrently on a single
hugetlb page.

    Soft offlining page 0x119400 at 0x700000000000
    BUG: unable to handle kernel paging request at ffffea0011943820
    IP: follow_huge_pmd+0x143/0x190
    PGD 7ffd2067
    PUD 7ffd1067
    PMD 0
    [61163.582052] Oops: 0000 [#1] SMP
    Modules linked in: binfmt_misc ppdev virtio_balloon parport_pc pcspkr i2c_piix4 parport i2c_core acpi_cpufreq ip_tables xfs libcrc32c ata_generic pata_acpi virtio_blk 8139too crc32c_intel ata_piix serio_raw libata virtio_pci 8139cp virtio_ring virtio mii floppy dm_mirror dm_region_hash dm_log dm_mod [last unloaded: cap_check]
    CPU: 0 PID: 22573 Comm: iterate_numa_mo Tainted: P           OE   4.11.0-rc2-mm1+ #2
    Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
    RIP: 0010:follow_huge_pmd+0x143/0x190
    RSP: 0018:ffffc90004bdbcd0 EFLAGS: 00010202
    RAX: 0000000465003e80 RBX: ffffea0004e34d30 RCX: 00003ffffffff000
    RDX: 0000000011943800 RSI: 0000000000080001 RDI: 0000000465003e80
    RBP: ffffc90004bdbd18 R08: 0000000000000000 R09: ffff880138d34000
    R10: ffffea0004650000 R11: 0000000000c363b0 R12: ffffea0011943800
    R13: ffff8801b8d34000 R14: ffffea0000000000 R15: 000077ff80000000
    FS:  00007fc977710740(0000) GS:ffff88007dc00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffea0011943820 CR3: 000000007a746000 CR4: 00000000001406f0
    Call Trace:
     follow_page_mask+0x270/0x550
     SYSC_move_pages+0x4ea/0x8f0
     SyS_move_pages+0xe/0x10
     do_syscall_64+0x67/0x180
     entry_SYSCALL64_slow_path+0x25/0x25
    RIP: 0033:0x7fc976e03949
    RSP: 002b:00007ffe72221d88 EFLAGS: 00000246 ORIG_RAX: 0000000000000117
    RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc976e03949
    RDX: 0000000000c22390 RSI: 0000000000001400 RDI: 0000000000005827
    RBP: 00007ffe72221e00 R08: 0000000000c2c3a0 R09: 0000000000000004
    R10: 0000000000c363b0 R11: 0000000000000246 R12: 0000000000400650
    R13: 00007ffe72221ee0 R14: 0000000000000000 R15: 0000000000000000
    Code: 81 e4 ff ff 1f 00 48 21 c2 49 c1 ec 0c 48 c1 ea 0c 4c 01 e2 49 bc 00 00 00 00 00 ea ff ff 48 c1 e2 06 49 01 d4 f6 45 bc 04 74 90 <49> 8b 7c 24 20 40 f6 c7 01 75 2b 4c 89 e7 8b 47 1c 85 c0 7e 2a
    RIP: follow_huge_pmd+0x143/0x190 RSP: ffffc90004bdbcd0
    CR2: ffffea0011943820
    ---[ end trace e4f81353a2d23232 ]---
    Kernel panic - not syncing: Fatal exception
    Kernel Offset: disabled

This bug is triggered when pmd_present() returns true for a non-present
hugetlb entry, so fixing the present check in follow_huge_pmd() prevents it.
Using pmd_present() to determine whether a hugetlb entry is present is not
correct, because pmd_present() checks multiple bits (not only
_PAGE_PRESENT) for historical reasons and can therefore misjudge the
hugetlb state.
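
To illustrate the misjudgment, here is a minimal userspace sketch, not
kernel code.  It assumes an x86-64-style layout in which the pmd-style
check also tests a "huge page" (PSE) bit, while the pte-style check tests
only the present and protnone bits.  The MODEL_* constants and model_*
helpers are hypothetical names invented for this example, and the bit
positions are assumptions modeled on x86.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit positions modeled on x86-64 (assumptions, not kernel headers). */
#define MODEL_PAGE_PRESENT  (UINT64_C(1) << 0)
#define MODEL_PAGE_PSE      (UINT64_C(1) << 7)
#define MODEL_PAGE_PROTNONE (UINT64_C(1) << 8)

/* Mimics a pmd_present()-style check: any of three bits counts as "present". */
static bool model_pmd_present(uint64_t entry)
{
	return (entry & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE |
			 MODEL_PAGE_PSE)) != 0;
}

/* Mimics a pte_present()-style check: the PSE bit alone is not enough. */
static bool model_pte_present(uint64_t entry)
{
	return (entry & (MODEL_PAGE_PRESENT | MODEL_PAGE_PROTNONE)) != 0;
}

/* True when the two checks give different answers for the same raw entry. */
static bool model_checks_disagree(uint64_t entry)
{
	return model_pmd_present(entry) != model_pte_present(entry);
}
```

In this model, a raw entry with only MODEL_PAGE_PSE set (standing in for a
non-present hugetlb entry whose encoding happens to set that bit) is
"present" according to model_pmd_present() but not according to
model_pte_present().  That disagreement is the case the patch closes by
switching follow_huge_pmd() to pte_present().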

Fixes: e66f17ff7177 ("mm/hugetlb: take page table lock in follow_huge_pmd()")
Link: http://lkml.kernel.org/r/1490149898-20231-1-git-send-email-n-horiguchi@ah.jp.nec.com
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 mm/hugetlb.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4474,6 +4474,7 @@ follow_huge_pmd(struct mm_struct *mm, un
 {
 	struct page *page = NULL;
 	spinlock_t *ptl;
+	pte_t pte;
 retry:
 	ptl = pmd_lockptr(mm, pmd);
 	spin_lock(ptl);
@@ -4483,12 +4484,13 @@ retry:
 	 */
 	if (!pmd_huge(*pmd))
 		goto out;
-	if (pmd_present(*pmd)) {
+	pte = huge_ptep_get((pte_t *)pmd);
+	if (pte_present(pte)) {
 		page = pmd_page(*pmd) + ((address & ~PMD_MASK) >> PAGE_SHIFT);
 		if (flags & FOLL_GET)
 			get_page(page);
 	} else {
-		if (is_hugetlb_entry_migration(huge_ptep_get((pte_t *)pmd))) {
+		if (is_hugetlb_entry_migration(pte)) {
 			spin_unlock(ptl);
 			__migration_entry_wait(mm, (pte_t *)pmd, ptl);
 			goto retry;
