linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <songmuchun@bytedance.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Yang Shi <shy828301@gmail.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: Re: [PATCH] mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()
Date: Wed, 7 Sep 2022 14:39:02 +0200	[thread overview]
Message-ID: <3896f675-7bd1-91e5-160e-a40cce06c214@redhat.com> (raw)
In-Reply-To: <20220907121157.GA2954283@ik1-406-35019.vs.sakura.ne.jp>

On 07.09.22 14:11, Naoya Horiguchi wrote:
> On Wed, Sep 07, 2022 at 12:26:18PM +0200, David Hildenbrand wrote:
>> On 07.09.22 12:23, David Hildenbrand wrote:
>>> On 07.09.22 12:08, Naoya Horiguchi wrote:
>>>> Hi MM folks,
>>>
>>> Hi,
>>>
>>>>
>>>> When I'm testing memory hotremove with various settings, I found the following
>>>> NULL-pointer dereference.  It reproduces easily with the folloing steps:
>>>>
>>>>      $ echo offline > /sys/devices/system/memory/memoryN/state
>>>>      $ echo 1 > /sys/kernel/debug/split_huge_pages
>>>>
>>>
>>> That's weird, I don't immediately see how both features are related here, especially because it seems to fail quite early in split_huge_pages_write().
>>>
>>> I was able to trigger it here as well, though, and I get on my kernel with debug options enabled:
>>>
>>> [526045.808737] page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
>>> [526045.818306] flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
>>> [526045.824842] raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
>>> [526045.832676] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>>> [526045.840498] page dumped because: unmovable page
>>> [526056.362715] page:000000007d7ab72e is uninitialized and poisoned
>>> [526056.362720] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>> [526056.374837] ------------[ cut here ]------------
>>> [526056.379544] kernel BUG at include/linux/mm.h:1248!
>>> [526056.384429] invalid opcode: 0000 [#1] PREEMPT SMP PTI
>>> [526056.389570] CPU: 16 PID: 20964 Comm: bash Tainted: G          I        6.0.0-rc3-foll-numa+ #41
>>> [526056.398347] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 2.11.2 004/21/2021
>>> [526056.406087] RIP: 0010:split_huge_pages_write+0xcf4/0xe30
>>> [526056.411495] Code: f9 ff ff 48 c7 c6 88 8e 4e a7 48 c7 c7 f8 0b a8 a7 e8 20 1c 3d 00 e9 54 f6 ff ff 48 c7 c6 38 06 4c a7 4c 89 e7 e8 5c 4a f7 ff <0f> 0b 48 8b 7c 24 08 be 02 00 00 00 e8 bb 6c 36 00 e9 11 f5 ff ff
>>> [526056.430325] RSP: 0018:ffffae50e757fb40 EFLAGS: 00010292
>>> [526056.435639] RAX: 0000000000000034 RBX: 0000000002d00000 RCX: 0000000000000000
>>> [526056.442858] RDX: 0000000000000001 RSI: ffffffffa751a839 RDI: 00000000ffffffff
>>> [526056.450077] RBP: 0000000000013937 R08: 0000000000000000 R09: ffffae50e757fa08
>>> [526056.457296] R10: 0000000000000003 R11: ffffffffa793c768 R12: fffff00eb4000000
>>> [526056.464514] R13: ffffae50e757fb7a R14: 000fffffffffffff R15: fffff00eb4000000
>>> [526056.471733] FS:  00007fe35addf740(0000) GS:ffff8b71dc000000(0000) knlGS:0000000000000000
>>> [526056.479906] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [526056.485738] CR2: 000055e87e606358 CR3: 00000004f5b46004 CR4: 00000000007706e0
>>> [526056.492957] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>> [526056.500176] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>> [526056.507396] PKRU: 55555554
>>> [526056.510196] Call Trace:
>>> [526056.512734]  <TASK>
>>> [526056.514929]  ? simple_setattr+0x40/0x60
>>> [526056.518859]  ? vfs_mkobj+0x1b0/0x1c0
>>> [526056.522529]  ? path_openat+0xb2e/0x1360
>>> [526056.526456]  ? do_filp_open+0xa1/0x130
>>> [526056.530296]  full_proxy_write+0x50/0x80
>>> [526056.534229]  vfs_write+0xd7/0x3e0
>>> [526056.537635]  ? fpregs_assert_state_consistent+0x22/0x50
>>> [526056.542955]  ? exit_to_user_mode_prepare+0x3c/0x1d0
>>> [526056.547929]  ? syscall_exit_to_user_mode+0x17/0x40
>>> [526056.552808]  ksys_write+0x53/0xd0
>>> [526056.556215]  do_syscall_64+0x58/0x80
>>> [526056.559879]  ? exit_to_user_mode_prepare+0x3c/0x1d0
>>> [526056.564846]  ? syscall_exit_to_user_mode+0x17/0x40
>>> [526056.569726]  ? do_syscall_64+0x67/0x80
>>> [526056.573563]  ? do_syscall_64+0x67/0x80
>>> [526056.577404]  ? do_syscall_64+0x67/0x80
>>> [526056.581244]  ? syscall_exit_to_user_mode+0x17/0x40
>>> [526056.586120]  ? do_syscall_64+0x67/0x80
>>> [526056.589961]  ? do_syscall_64+0x67/0x80
>>> [526056.593801]  ? do_syscall_64+0x67/0x80
>>> [526056.597641]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>>> [526056.602779] RIP: 0033:0x7fe35ab01c17
>>> [526056.606446] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
>>> [526056.625277] RSP: 002b:00007ffc1863c8d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
>>> [526056.632929] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fe35ab01c17
>>> [526056.640147] RDX: 0000000000000002 RSI: 000055e87e7009a0 RDI: 0000000000000001
>>> [526056.647366] RBP: 000055e87e7009a0 R08: 0000000000000000 R09: 0000000000000073
>>> [526056.654586] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
>>> [526056.661805] R13: 00007fe35abf8780 R14: 0000000000000002 R15: 00007fe35abf39e0
>>> [526056.669028]  </TASK>
>>>
>>>
>>> Looks like there is a page_to_nid() done in an offline memmap, which is wrong.
>>>
>>> Usually, this indicates that a pfn_to_online_page() is missing in a PFN walker.
>>>
>>
>> And indeed, something like the following might do the trick:
>>
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index e9414ee57c5b..f42bb51e023a 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -2894,11 +2894,9 @@ static void split_huge_pages_all(void)
>>                  max_zone_pfn = zone_end_pfn(zone);
>>                  for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
>>                          int nr_pages;
>> -                       if (!pfn_valid(pfn))
>> -                               continue;
>> -                       page = pfn_to_page(pfn);
>> -                       if (!get_page_unless_zero(page))
>> +                       page = pfn_to_online_page(pfn);
>> +                       if (!page || !get_page_unless_zero(page))
>>                                  continue;
>>                          if (zone != page_zone(page))
>>
> 
> Thank you very much, I confirmed the fix.  So I try to write a patch like below.
> I borrowed your debug message because it has more helpful info than mine.
> If you have any comment or suggestion in the description, please let me know.
> 
> Thanks,
> Naoya Horiguchi
> ---
>  From 8a5c284df732943065d23838090d15c94cd10395 Mon Sep 17 00:00:00 2001
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Date: Wed, 7 Sep 2022 20:58:44 +0900
> Subject: [PATCH] mm/huge_memory: use pfn_to_online_page() in
>   split_huge_pages_all()
> 
> NULL pointer dereference is triggered when calling thp split via debugfs
> on the system with offlined memory blocks.  With debug option enabled,
> the following kernel messages are printed out:
> 
>    page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
>    flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
>    raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
>    raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>    page dumped because: unmovable page
>    page:000000007d7ab72e is uninitialized and poisoned
>    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>    ------------[ cut here ]------------
>    kernel BUG at include/linux/mm.h:1248!
>    invalid opcode: 0000 [#1] PREEMPT SMP PTI
>    CPU: 16 PID: 20964 Comm: bash Tainted: G          I        6.0.0-rc3-foll-numa+ #41
>    ...
>    RIP: 0010:split_huge_pages_write+0xcf4/0xe30
> 
> This shows that page_to_nid() in page_zone() is unexpectedly called for an
> offlined memmap.
> 
> Use pfn_to_online_page() to get struct page in PFN walker.
> 
> Fixes: 49071d436b51 ("thp: add debugfs handle to split all huge pages")
> Suggested-by: David Hildenbrand <david@redhat.com>

Feel free to use

Co-developed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>

instead

> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: <stable@vger.kernel.org>
> ---
>   mm/huge_memory.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5fa2ba86dae4..03149a8f46f9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2894,11 +2894,8 @@ static void split_huge_pages_all(void)
>   		max_zone_pfn = zone_end_pfn(zone);
>   		for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
>   			int nr_pages;

Add an empty line here please, while at it.

> -			if (!pfn_valid(pfn))
> -				continue;
> -
> -			page = pfn_to_page(pfn);
> -			if (!get_page_unless_zero(page))
> +			page = pfn_to_online_page(pfn);
> +			if (!page || !get_page_unless_zero(page))
>   				continue;
>   
>   			if (zone != page_zone(page))

Can you send the patch "officially" as well? Thanks

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2022-09-07 12:39 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07 10:08 [BUG report] kernel NULL pointer dereference in split_huge_page with offlined memory block Naoya Horiguchi
2022-09-07 10:23 ` David Hildenbrand
2022-09-07 10:26   ` David Hildenbrand
2022-09-07 12:11     ` [PATCH] mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all() Naoya Horiguchi
2022-09-07 12:39       ` David Hildenbrand [this message]
2022-09-08  2:39         ` HORIGUCHI NAOYA(堀口 直也)
2022-09-07 17:10       ` Yang Shi
2022-09-07 17:32       ` Michal Hocko
2022-09-07 20:57       ` Andrew Morton
2022-09-08  2:47         ` HORIGUCHI NAOYA(堀口 直也)
2022-09-08  6:14           ` David Hildenbrand
2022-09-08  6:31             ` HORIGUCHI NAOYA(堀口 直也)
2022-09-08  2:19       ` Miaohe Lin
2022-09-08  3:06         ` HORIGUCHI NAOYA(堀口 直也)
2022-09-08  3:25           ` Miaohe Lin
2022-09-08  7:07             ` Michal Hocko
2022-09-09  0:27               ` Miaohe Lin
2022-09-09  9:03                 ` David Hildenbrand
2022-09-08  3:28       ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3896f675-7bd1-91e5-160e-a40cce06c214@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=naoya.horiguchi@nec.com \
    --cc=shy828301@gmail.com \
    --cc=songmuchun@bytedance.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).