From: "Zach O'Keefe" <zokeefe@google.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: don't warn if the node is offlined
Date: Tue, 1 Nov 2022 12:13:35 -0700	[thread overview]
Message-ID: <CAAa6QmRe1zMp8P-gZjR63Fg6KhOw+fP-v7SQWLNKuc2Y9ZxvyA@mail.gmail.com> (raw)
In-Reply-To: <CAHbLzkonsnr4yxUOpMpoch1eCVNgR5hC9YaMkPR=fSV2Uszc6g@mail.gmail.com>

On Tue, Nov 1, 2022 at 10:13 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Nov 1, 2022 at 12:54 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 31-10-22 17:05:06, Zach O'Keefe wrote:
> > > On Mon, Oct 31, 2022 at 3:08 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Mon 31-10-22 11:31:22, Yang Shi wrote:
> > > > > Syzbot reported the below splat:
> > > > >
> > > > > WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node include/linux/gfp.h:221 [inline]
> > > > > WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
> > > > > WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
> > > > > Modules linked in:
> > > > > CPU: 1 PID: 3646 Comm: syz-executor210 Not tainted 6.1.0-rc1-syzkaller-00454-ga70385240892 #0
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
> > > > > RIP: 0010:__alloc_pages_node include/linux/gfp.h:221 [inline]
> > > > > RIP: 0010:hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
> > > > > RIP: 0010:alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
> > > > > Code: e5 01 4c 89 ee e8 6e f9 ae ff 4d 85 ed 0f 84 28 fc ff ff e8 70 fc ae ff 48 8d 6b ff 4c 8d 63 07 e9 16 fc ff ff e8 5e fc ae ff <0f> 0b e9 96 fa ff ff 41 bc 1a 00 00 00 e9 86 fd ff ff e8 47 fc ae
> > > > > RSP: 0018:ffffc90003fdf7d8 EFLAGS: 00010293
> > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > RDX: ffff888077f457c0 RSI: ffffffff81cd8f42 RDI: 0000000000000001
> > > > > RBP: ffff888079388c0c R08: 0000000000000001 R09: 0000000000000000
> > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > > FS:  00007f6b48ccf700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 00007f6b48a819f0 CR3: 00000000171e7000 CR4: 00000000003506e0
> > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > Call Trace:
> > > > >  <TASK>
> > > > >  collapse_file+0x1ca/0x5780 mm/khugepaged.c:1715
> > > >
> > > > This is quite weird, isn't it? alloc_charge_hpage is selecting the most
> > > > busy node (as per collapse_control). How come this can be an offline
> > > > node? Is a parallel memory hotplug happening?
> > >
> > > TBH -- I did not look closely at the syzbot reproducer (let alone
> > > attempt to run it) and assumed this was the case. Taking a quick look,
> > > at least memory hot remove is enabled:
> > >
> > > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> > > CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
> > > CONFIG_MEMORY_HOTPLUG=y
> > > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
> > > CONFIG_MEMORY_HOTREMOVE=y
> > >
> > > But looking at the C reproducer, I don't immediately see anywhere
> > > where we offline nodes. I'll try to run this tomorrow to make sure I'm
> > > not missing something real here.
> >
> > Looking slightly closer at hpage_collapse_scan_file I think that it is
> > possible that xas_for_each simply doesn't find any entries in the page
> > cache and with khugepaged_max_ptes_none == HPAGE_PMD_NR we can fall back
> > to collapse_file even without any real entries.
>
> khugepaged_max_ptes_none can't be HPAGE_PMD_NR; it must be <=
> (HPAGE_PMD_NR - 1). But MADV_COLLAPSE does ignore it.
>
> But taking a closer look at how the code picks the preferred node,
> there seems to be a corner case for MADV_COLLAPSE.
>
> The code tries to do some balancing if several nodes have the same
> hit count. Conceptually it does:
>     * If target_node <= last_target_node, then iterate from
>       last_target_node + 1 to MAX_NUMNODES (1024 on the default config)
>     * If max_value == node_load[nid], then target_node = nid
>
> So assuming the system has 2 nodes, target_node is 0 and
> last_target_node is 1; if the MADV_COLLAPSE path is hit, it may
> return 2 for target_node, but that node doesn't actually exist (it is
> offline), so the warn is triggered.
>

You're one step ahead of me, Yang. I was just debugging the syzbot C
reproducer, and this seems to be exactly what is happening.
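
To spell the corner case out, here is a minimal userspace sketch of the
selection logic as I read it (MAX_NUMNODES, the online node count, and
the loads are made up for illustration; only node_load / last_target_node
mirror the real collapse_control fields):

#include <stdio.h>

#define MAX_NUMNODES 4			/* pretend config; only nodes 0 and 1 exist */

static int node_load[MAX_NUMNODES];	/* all zero: MADV_COLLAPSE ignores max_ptes_none */
static int last_target_node = 1;	/* left over from a previous collapse */

static int find_target_node(void)
{
	int nid, target_node = 0, max_value = 0;

	/* Pick the first node with the largest hit count. */
	for (nid = 0; nid < MAX_NUMNODES; nid++)
		if (node_load[nid] > max_value) {
			max_value = node_load[nid];
			target_node = nid;
		}

	/*
	 * Balancing step: if we would pick the same node again, advance
	 * to the next nid with an equal load -- without checking whether
	 * that nid is online.
	 */
	if (target_node <= last_target_node)
		for (nid = last_target_node + 1; nid < MAX_NUMNODES; nid++)
			if (max_value == node_load[nid]) {
				target_node = nid;
				break;
			}

	last_target_node = target_node;
	return target_node;
}

int main(void)
{
	/* node_load[] is all zero, so max_value == 0 also matches nid 2. */
	printf("target_node = %d (offline!)\n", find_target_node());
	return 0;
}

With all-zero node_load[] (possible because MADV_COLLAPSE ignores
max_ptes_none) this prints target_node = 2, i.e. a nid past the online
nodes, which is exactly what feeds the warning.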

> The below patch should be able to fix it:
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ea0d186bc9d4..d24405e6736b 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -787,7 +787,8 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
>         if (target_node <= cc->last_target_node)
>                 for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
>                      nid++)
> -                       if (max_value == cc->node_load[nid]) {
> +                       if (node_online(nid) &&
> +                           max_value == cc->node_load[nid]) {
>                                 target_node = nid;
>                                 break;
>                         }
>

Thanks for the patch. I think this is the right place to do the check.

This is slightly tangential -- and I don't want to send a new mail
about it -- but I wonder if we should be doing __GFP_THISNODE + an
explicit node vs having hpage_collapse_find_target_node() set a
nodemask. We could then provide fallback nodes for ties, or for cases
where some node contains more than some threshold number of pages.
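
Just to make that concrete, a rough (untested) sketch of what I have in
mind -- note the nodemask argument below is plumbing that doesn't exist
today, and alloc_charge_hpage() would drop __GFP_THISNODE in this scheme:

/*
 * Have the scan build a nodemask of acceptable fallback nodes and pass
 * it down, instead of insisting on a single nid with __GFP_THISNODE.
 */
static bool hpage_collapse_alloc_page(struct page **hpage, gfp_t gfp,
				      int node, nodemask_t *nmask)
{
	*hpage = __alloc_pages(gfp, HPAGE_PMD_ORDER, node, nmask);
	if (unlikely(!*hpage)) {
		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
		return false;
	}
	count_vm_event(THP_COLLAPSE_ALLOC);
	return true;
}

hpage_collapse_find_target_node() would then node_set() every online
node that ties for (or is within some threshold of) max_value into that
mask, rather than picking a single winner.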

> > But the mere possibility of the hotplug race should be a sufficient
> > ground to remove those WARN_ONs
>

Agreed.

> The warn_on did help to catch this bug. But the reasons for removing
> it still stand TBH, so maybe we should consider moving this warn_on
> to the callers which care about it?

I didn't come across any in a cursory search -- but if there are
callers which try to synchronize with hot remove to ensure
__GFP_THISNODE succeeds, then sure, the warn makes sense for them.
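
For reference (quoting gfp.h from memory, so roughly), the check in
question is the one in __alloc_pages_node():

static inline struct page *
__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
{
	VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);
	/* the warning the syzbot splat points at */
	VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid));

	return __alloc_pages(gfp_mask, order, nid, NULL);
}

so it only fires for __GFP_THISNODE allocations anyway.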

> >
> >
> > Thanks!
> > --
> > Michal Hocko
> > SUSE Labs

