All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yang Shi <shy828301@gmail.com>
To: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org,  Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <songmuchun@bytedance.com>,
	 Miaohe Lin <linmiaohe@huawei.com>,
	Matthew Wilcox <willy@infradead.org>,
	 Michal Hocko <mhocko@suse.com>,
	Naoya Horiguchi <naoya.horiguchi@nec.com>
Subject: Re: [PATCH] mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all()
Date: Wed, 7 Sep 2022 10:10:34 -0700	[thread overview]
Message-ID: <CAHbLzkrXLARfwVFPigs5T_nusfTO-Vkwsqw7Q44V4t-mhNubQg@mail.gmail.com> (raw)
In-Reply-To: <20220907121157.GA2954283@ik1-406-35019.vs.sakura.ne.jp>

On Wed, Sep 7, 2022 at 5:12 AM Naoya Horiguchi
<naoya.horiguchi@linux.dev> wrote:
>
> On Wed, Sep 07, 2022 at 12:26:18PM +0200, David Hildenbrand wrote:
> > On 07.09.22 12:23, David Hildenbrand wrote:
> > > On 07.09.22 12:08, Naoya Horiguchi wrote:
> > > > Hi MM folks,
> > >
> > > Hi,
> > >
> > > >
> > > > When I'm testing memory hotremove with various settings, I found the following
> > > > NULL-pointer dereference.  It reproduces easily with the folloing steps:
> > > >
> > > >     $ echo offline > /sys/devices/system/memory/memoryN/state
> > > >     $ echo 1 > /sys/kernel/debug/split_huge_pages
> > > >
> > >
> > > That's weird, I don't immediately see how both features are related here, especially because it seems to fail quite early in split_huge_pages_write().
> > >
> > > I was able to trigger it here as well, though, and I get on my kernel with debug options enabled:
> > >
> > > [526045.808737] page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
> > > [526045.818306] flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
> > > [526045.824842] raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
> > > [526045.832676] raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
> > > [526045.840498] page dumped because: unmovable page
> > > [526056.362715] page:000000007d7ab72e is uninitialized and poisoned
> > > [526056.362720] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> > > [526056.374837] ------------[ cut here ]------------
> > > [526056.379544] kernel BUG at include/linux/mm.h:1248!
> > > [526056.384429] invalid opcode: 0000 [#1] PREEMPT SMP PTI
> > > [526056.389570] CPU: 16 PID: 20964 Comm: bash Tainted: G          I        6.0.0-rc3-foll-numa+ #41
> > > [526056.398347] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 2.11.2 004/21/2021
> > > [526056.406087] RIP: 0010:split_huge_pages_write+0xcf4/0xe30
> > > [526056.411495] Code: f9 ff ff 48 c7 c6 88 8e 4e a7 48 c7 c7 f8 0b a8 a7 e8 20 1c 3d 00 e9 54 f6 ff ff 48 c7 c6 38 06 4c a7 4c 89 e7 e8 5c 4a f7 ff <0f> 0b 48 8b 7c 24 08 be 02 00 00 00 e8 bb 6c 36 00 e9 11 f5 ff ff
> > > [526056.430325] RSP: 0018:ffffae50e757fb40 EFLAGS: 00010292
> > > [526056.435639] RAX: 0000000000000034 RBX: 0000000002d00000 RCX: 0000000000000000
> > > [526056.442858] RDX: 0000000000000001 RSI: ffffffffa751a839 RDI: 00000000ffffffff
> > > [526056.450077] RBP: 0000000000013937 R08: 0000000000000000 R09: ffffae50e757fa08
> > > [526056.457296] R10: 0000000000000003 R11: ffffffffa793c768 R12: fffff00eb4000000
> > > [526056.464514] R13: ffffae50e757fb7a R14: 000fffffffffffff R15: fffff00eb4000000
> > > [526056.471733] FS:  00007fe35addf740(0000) GS:ffff8b71dc000000(0000) knlGS:0000000000000000
> > > [526056.479906] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [526056.485738] CR2: 000055e87e606358 CR3: 00000004f5b46004 CR4: 00000000007706e0
> > > [526056.492957] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [526056.500176] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [526056.507396] PKRU: 55555554
> > > [526056.510196] Call Trace:
> > > [526056.512734]  <TASK>
> > > [526056.514929]  ? simple_setattr+0x40/0x60
> > > [526056.518859]  ? vfs_mkobj+0x1b0/0x1c0
> > > [526056.522529]  ? path_openat+0xb2e/0x1360
> > > [526056.526456]  ? do_filp_open+0xa1/0x130
> > > [526056.530296]  full_proxy_write+0x50/0x80
> > > [526056.534229]  vfs_write+0xd7/0x3e0
> > > [526056.537635]  ? fpregs_assert_state_consistent+0x22/0x50
> > > [526056.542955]  ? exit_to_user_mode_prepare+0x3c/0x1d0
> > > [526056.547929]  ? syscall_exit_to_user_mode+0x17/0x40
> > > [526056.552808]  ksys_write+0x53/0xd0
> > > [526056.556215]  do_syscall_64+0x58/0x80
> > > [526056.559879]  ? exit_to_user_mode_prepare+0x3c/0x1d0
> > > [526056.564846]  ? syscall_exit_to_user_mode+0x17/0x40
> > > [526056.569726]  ? do_syscall_64+0x67/0x80
> > > [526056.573563]  ? do_syscall_64+0x67/0x80
> > > [526056.577404]  ? do_syscall_64+0x67/0x80
> > > [526056.581244]  ? syscall_exit_to_user_mode+0x17/0x40
> > > [526056.586120]  ? do_syscall_64+0x67/0x80
> > > [526056.589961]  ? do_syscall_64+0x67/0x80
> > > [526056.593801]  ? do_syscall_64+0x67/0x80
> > > [526056.597641]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
> > > [526056.602779] RIP: 0033:0x7fe35ab01c17
> > > [526056.606446] Code: 0f 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
> > > [526056.625277] RSP: 002b:00007ffc1863c8d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
> > > [526056.632929] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fe35ab01c17
> > > [526056.640147] RDX: 0000000000000002 RSI: 000055e87e7009a0 RDI: 0000000000000001
> > > [526056.647366] RBP: 000055e87e7009a0 R08: 0000000000000000 R09: 0000000000000073
> > > [526056.654586] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
> > > [526056.661805] R13: 00007fe35abf8780 R14: 0000000000000002 R15: 00007fe35abf39e0
> > > [526056.669028]  </TASK>
> > >
> > >
> > > Looks like there is a page_to_nid() done in an offline memmap, which is wrong.
> > >
> > > Usually, this indicates that a pfn_to_online_page() is missing in a PFN walker.
> > >
> >
> > And indeed, something like the following might do the trick:
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index e9414ee57c5b..f42bb51e023a 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -2894,11 +2894,9 @@ static void split_huge_pages_all(void)
> >                 max_zone_pfn = zone_end_pfn(zone);
> >                 for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
> >                         int nr_pages;
> > -                       if (!pfn_valid(pfn))
> > -                               continue;
> > -                       page = pfn_to_page(pfn);
> > -                       if (!get_page_unless_zero(page))
> > +                       page = pfn_to_online_page(pfn);
> > +                       if (!page || !get_page_unless_zero(page))
> >                                 continue;
> >                         if (zone != page_zone(page))
> >
>
> Thank you very much, I confirmed the fix.  So I try to write a patch like below.
> I borrowed your debug message because it has more helpful info than mine.
> If you have any comment or suggestion in the description, please let me know.
>
> Thanks,
> Naoya Horiguchi
> ---
> From 8a5c284df732943065d23838090d15c94cd10395 Mon Sep 17 00:00:00 2001
> From: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Date: Wed, 7 Sep 2022 20:58:44 +0900
> Subject: [PATCH] mm/huge_memory: use pfn_to_online_page() in
>  split_huge_pages_all()
>
> NULL pointer dereference is triggered when calling thp split via debugfs
> on the system with offlined memory blocks.  With debug option enabled,
> the following kernel messages are printed out:
>
>   page:00000000467f4890 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x121c000
>   flags: 0x17fffc00000000(node=0|zone=2|lastcpupid=0x1ffff)
>   raw: 0017fffc00000000 0000000000000000 dead000000000122 0000000000000000
>   raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000
>   page dumped because: unmovable page
>   page:000000007d7ab72e is uninitialized and poisoned
>   page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>   ------------[ cut here ]------------
>   kernel BUG at include/linux/mm.h:1248!
>   invalid opcode: 0000 [#1] PREEMPT SMP PTI
>   CPU: 16 PID: 20964 Comm: bash Tainted: G          I        6.0.0-rc3-foll-numa+ #41
>   ...
>   RIP: 0010:split_huge_pages_write+0xcf4/0xe30
>
> This shows that page_to_nid() in page_zone() is unexpectedly called for an
> offlined memmap.
>
> Use pfn_to_online_page() to get struct page in PFN walker.
>
> Fixes: 49071d436b51 ("thp: add debugfs handle to split all huge pages")
> Suggested-by: David Hildenbrand <david@redhat.com>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Cc: <stable@vger.kernel.org>

Thanks, Naoya and David. Looks good to me. Reviewed-by: Yang Shi
<shy828301@gmail.com>

> ---
>  mm/huge_memory.c | 7 ++-----
>  1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 5fa2ba86dae4..03149a8f46f9 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -2894,11 +2894,8 @@ static void split_huge_pages_all(void)
>                 max_zone_pfn = zone_end_pfn(zone);
>                 for (pfn = zone->zone_start_pfn; pfn < max_zone_pfn; pfn++) {
>                         int nr_pages;
> -                       if (!pfn_valid(pfn))
> -                               continue;
> -
> -                       page = pfn_to_page(pfn);
> -                       if (!get_page_unless_zero(page))
> +                       page = pfn_to_online_page(pfn);
> +                       if (!page || !get_page_unless_zero(page))
>                                 continue;
>
>                         if (zone != page_zone(page))
> --
> 2.26.1.8815.g6a475b71f8
>
>


  parent reply	other threads:[~2022-09-07 17:10 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-07 10:08 [BUG report] kernel NULL pointer dereference in split_huge_page with offlined memory block Naoya Horiguchi
2022-09-07 10:23 ` David Hildenbrand
2022-09-07 10:26   ` David Hildenbrand
2022-09-07 12:11     ` [PATCH] mm/huge_memory: use pfn_to_online_page() in split_huge_pages_all() Naoya Horiguchi
2022-09-07 12:39       ` David Hildenbrand
2022-09-08  2:39         ` HORIGUCHI NAOYA(堀口 直也)
2022-09-07 17:10       ` Yang Shi [this message]
2022-09-07 17:32       ` Michal Hocko
2022-09-07 20:57       ` Andrew Morton
2022-09-08  2:47         ` HORIGUCHI NAOYA(堀口 直也)
2022-09-08  6:14           ` David Hildenbrand
2022-09-08  6:31             ` HORIGUCHI NAOYA(堀口 直也)
2022-09-08  2:19       ` Miaohe Lin
2022-09-08  3:06         ` HORIGUCHI NAOYA(堀口 直也)
2022-09-08  3:25           ` Miaohe Lin
2022-09-08  7:07             ` Michal Hocko
2022-09-09  0:27               ` Miaohe Lin
2022-09-09  9:03                 ` David Hildenbrand
2022-09-08  3:28       ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAHbLzkrXLARfwVFPigs5T_nusfTO-Vkwsqw7Q44V4t-mhNubQg@mail.gmail.com \
    --to=shy828301@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=naoya.horiguchi@nec.com \
    --cc=songmuchun@bytedance.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.