All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qian Cai <cai@lca.pw>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>,
	Zi Yan <ziy@nvidia.com>, John Hubbard <jhubbard@nvidia.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH mmotm] mm/vmstat: Add events for PMD based THP migration without split fix
Date: Thu, 21 May 2020 14:54:19 -0400	[thread overview]
Message-ID: <20200521185419.GB6367@ovpn-112-192.phx2.redhat.com> (raw)
In-Reply-To: <alpine.LSU.2.11.2005210643340.482@eggly.anvils>

On Thu, May 21, 2020 at 06:49:51AM -0700, Hugh Dickins wrote:
> Fix 5.7-rc6-mm1 page migration crash in unmap_and_move(): when the
> page to be migrated has been freed from under us, that is considered 
> a MIGRATEPAGE_SUCCESS, but no newpage has been allocated (and I don't
> think it would ever need to be counted as a successful THP migration).
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> Fix to mm-vmstat-add-events-for-pmd-based-thp-migration-without-split.patch
> 
>  mm/migrate.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- 5.7-rc6-mm1/mm/migrate.c	2020-05-20 12:21:56.117693827 -0700
> +++ linux/mm/migrate.c	2020-05-20 15:08:12.319476978 -0700
> @@ -1248,7 +1248,7 @@ out:
>  	 * we want to retry.
>  	 */
>  	if (rc == MIGRATEPAGE_SUCCESS) {
> -		if (PageTransHuge(newpage))
> +		if (newpage && PageTransHuge(newpage))

Should this be

if (!IS_ERR_OR_NULL(newpage) && PageTransHuge(newpage)) ?

I have also crashed here due to the buggy commit,

unmap_and_move() -> PageTransHuge(page) -> page->compound_head

but it said 0x00000008 instead of NULL which is aweful a lot like,

https://lore.kernel.org/linux-mm/20200512215813.GA487759@cmpxchg.org/

Interesting thing is I applied this patch and the problem went away, but not
sure if it could still be ERR_PTR sometimes just not always?

[  210.929981][ T4159] BUG: Kernel NULL pointer dereference on read at 0x00000008
[  210.930009][ T4159] Faulting instruction address: 0xc0000000005196c8
[  210.930027][   C61] irq event stamp: 270727
[  210.930028][ T4159] Oops: Kernel access of bad area, sig: 11 [#1]
[  210.930033][ T4159] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
[  210.930058][   C61] hardirqs last  enabled at (270726): [<c000000000adcc44>] _raw_spin_unlock_irqrestore+0x94/0xd0
[  210.930064][ T4159] Modules linked in: kvm_hv kvm ip_tables x_tables xfs sd_mod tg3 bnx2x libphy mdio firmware_class ahci libahci libata dm_mirror dm_region_hash dm_log dm_mod
[  210.930083][   C61] hardirqs last disabled at (270727): [<c000000000adc8dc>] _raw_spin_lock_irqsave+0x3c/0xa0
[  210.930086][   C61] softirqs last  enabled at (270556): [<c000000000addbdc>] __do_softirq+0x6dc/0xaa8
[  210.930125][ T4159] CPU: 109 PID: 4159 Comm: test.sh Not tainted 5.7.0-rc6-next-20200521+ #112
[  210.930163][   C61] softirqs last disabled at (270559): [<c000000000129544>] run_ksoftirqd+0x74/0xc0
[  210.930260][ T4159] NIP:  c0000000005196c8 LR: c000000000519568 CTR: 0000000000000000
[  210.930307][ T4159] REGS: c0002005b10af570 TRAP: 0300   Not tainted  (5.7.0-rc6-next-20200521+)
[  210.930342][ T4159] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24248242  XER: 00000000
[  210.930403][ T4159] CFAR: c000000000519570 DAR: 0000000000000008 DSISR: 40000000 IRQMASK: 0 
[  210.930403][ T4159] GPR00: c000000000519568 c0002005b10af800 c000000001765500 0000000000000000 
[  210.930403][ T4159] GPR04: c000000001c18d28 0000000000000006 0000000035279396 fffffffef69809d2 
[  210.930403][ T4159] GPR08: 0000201cc6240000 0000000000000000 0000000000000000 0000000000000008 
[  210.930403][ T4159] GPR12: 0000000000008000 c000201fff670600 0000000000000000 c00c000805312a00 
[  210.930403][ T4159] GPR16: c00000000050b090 c000000000c857d0 c00c000805312a80 0000000000000000 
[  210.930403][ T4159] GPR20: 0000000000000001 0000000000000000 c0002005b10af978 c000201ffc7c8780 
[  210.930403][ T4159] GPR24: fffffffffffffff5 0000000000000002 0000000020000000 0000000000000000 
[  210.930403][ T4159] GPR28: 0000000000000007 0000000000000001 0000000000000000 c00c000805312a08 
[  210.930740][ T4159] NIP [c0000000005196c8] migrate_pages+0xc18/0x1ad0
[  210.930775][ T4159] LR [c000000000519568] migrate_pages+0xab8/0x1ad0
[  210.930824][ T4159] Call Trace:
[  210.930843][ T4159] [c0002005b10af800] [c000000000519568] migrate_pages+0xab8/0x1ad0 (unreliable)
[  210.930882][ T4159] [c0002005b10af910] [c00000000050b6fc] do_migrate_range+0x25c/0x8f0
[  210.930940][ T4159] [c0002005b10afa10] [c00000000050e974] __offline_pages+0x6e4/0x8b0
[  210.930988][ T4159] [c0002005b10afb40] [c000000000887f6c] memory_block_action+0xac/0xc0
[  210.931016][ T4159] [c0002005b10afba0] [c000000000888618] memory_subsys_offline+0x58/0xa0
[  210.931030][ T4159] [c0002005b10afbd0] [c0000000008621a0] device_offline+0x100/0x140
[  210.931080][ T4159] [c0002005b10afc10] [c000000000888938] state_store+0x108/0x190
[  210.931128][ T4159] [c0002005b10afc50] [c00000000085b628] dev_attr_store+0x38/0x60
[  210.931176][ T4159] [c0002005b10afc70] [c0000000006b9790] sysfs_kf_write+0x70/0xb0
[  210.931211][ T4159] [c0002005b10afcb0] [c0000000006b895c] kernfs_fop_write+0x11c/0x270
[  210.931249][ T4159] [c0002005b10afd00] [c00000000057bcac] __vfs_write+0x3c/0x70
[  210.931273][ T4159] [c0002005b10afd20] [c00000000057f0ac] vfs_write+0xcc/0x200
[  210.931319][ T4159] [c0002005b10afd70] [c00000000057f44c] ksys_write+0x7c/0x140
[  210.931345][ T4159] [c0002005b10afdc0] [c000000000039e78] system_call_exception+0x108/0x1d0
[  210.931395][ T4159] [c0002005b10afe20] [c00000000000c9f0] system_call_common+0xf0/0x278
[  210.931445][ T4159] Instruction dump:
[  210.931476][ T4159] 9bad0988 e90d0028 3d22ff9f 3929c670 7d49402a 394a0001 7d49412a 4bafdec5 
[  210.931492][ T4159] 60000000 4bfff544 2fbe0000 409e04c4 <e93b0008> 71290001 40820928 e93b0000 
[  210.931521][ T4159] ---[ end trace 03092b3800dbb5cb ]---
[  211.416724][ T4159] 
[  212.416810][ T4159] Kernel panic - not syncing: Fatal exception
[  213.829268][ T4159] ---[ end Kernel panic - not syncing: Fatal exception ]---

>  			thp_migration_success(true);
>  		put_page(page);
>  		if (reason == MR_MEMORY_FAILURE) {
> 

  reply	other threads:[~2020-05-21 18:54 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-21 13:49 [PATCH mmotm] mm/vmstat: Add events for PMD based THP migration without split fix Hugh Dickins
2020-05-21 13:49 ` Hugh Dickins
2020-05-21 18:54 ` Qian Cai [this message]
2020-05-21 19:48   ` Hugh Dickins
2020-05-21 19:48     ` Hugh Dickins
2020-05-22  0:46 ` Anshuman Khandual
2020-05-22  1:21   ` Hugh Dickins
2020-05-22  1:21     ` Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200521185419.GB6367@ovpn-112-192.phx2.redhat.com \
    --to=cai@lca.pw \
    --cc=akpm@linux-foundation.org \
    --cc=anshuman.khandual@arm.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=naoya.horiguchi@nec.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.