From: Michal Hocko <mhocko@suse.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Zach O'Keefe <zokeefe@google.com>,
	akpm@linux-foundation.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: don't warn if the node is offlined
Date: Wed, 2 Nov 2022 08:14:42 +0100	[thread overview]
Message-ID: <Y2IY4odZJHCwC16t@dhcp22.suse.cz> (raw)
In-Reply-To: <CAHbLzkonsnr4yxUOpMpoch1eCVNgR5hC9YaMkPR=fSV2Uszc6g@mail.gmail.com>

On Tue 01-11-22 10:12:49, Yang Shi wrote:
> On Tue, Nov 1, 2022 at 12:54 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 31-10-22 17:05:06, Zach O'Keefe wrote:
> > > On Mon, Oct 31, 2022 at 3:08 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Mon 31-10-22 11:31:22, Yang Shi wrote:
> > > > > Syzbot reported the below splat:
> > > > >
> > > > > WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node include/linux/gfp.h:221 [inline]
> > > > > WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
> > > > > WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
> > > > > Modules linked in:
> > > > > CPU: 1 PID: 3646 Comm: syz-executor210 Not tainted 6.1.0-rc1-syzkaller-00454-ga70385240892 #0
> > > > > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022
> > > > > RIP: 0010:__alloc_pages_node include/linux/gfp.h:221 [inline]
> > > > > RIP: 0010:hpage_collapse_alloc_page mm/khugepaged.c:807 [inline]
> > > > > RIP: 0010:alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963
> > > > > Code: e5 01 4c 89 ee e8 6e f9 ae ff 4d 85 ed 0f 84 28 fc ff ff e8 70 fc ae ff 48 8d 6b ff 4c 8d 63 07 e9 16 fc ff ff e8 5e fc ae ff <0f> 0b e9 96 fa ff ff 41 bc 1a 00 00 00 e9 86 fd ff ff e8 47 fc ae
> > > > > RSP: 0018:ffffc90003fdf7d8 EFLAGS: 00010293
> > > > > RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> > > > > RDX: ffff888077f457c0 RSI: ffffffff81cd8f42 RDI: 0000000000000001
> > > > > RBP: ffff888079388c0c R08: 0000000000000001 R09: 0000000000000000
> > > > > R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> > > > > R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000
> > > > > FS:  00007f6b48ccf700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 00007f6b48a819f0 CR3: 00000000171e7000 CR4: 00000000003506e0
> > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > Call Trace:
> > > > >  <TASK>
> > > > >  collapse_file+0x1ca/0x5780 mm/khugepaged.c:1715
> > > >
> > > > This is quite weird, isn't it? alloc_charge_hpage selects the
> > > > busiest node (as per collapse_control). How could that be an
> > > > offline node? Is a parallel memory hotplug happening?
> > >
> > > TBH -- I did not look closely at the syzbot reproducer (let alone
> > > attempt to run it) and assumed this was the case. Taking a quick look,
> > > at least memory hot remove is enabled:
> > >
> > > CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
> > > CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
> > > CONFIG_MEMORY_HOTPLUG=y
> > > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
> > > CONFIG_MEMORY_HOTREMOVE=y
> > >
> > > But looking at the C reproducer, I don't immediately see anywhere
> > > where we offline nodes. I'll try to run this tomorrow to make sure I'm
> > > not missing something real here.
> >
> > Looking slightly closer at hpage_collapse_scan_file I think that it is
> > possible that xas_for_each simply doesn't find any entries in the page
> > cache and with khugepaged_max_ptes_none == HPAGE_PMD_NR we can fall back
> > to collapse_file even without any real entries.
> 
> khugepaged_max_ptes_none can't be HPAGE_PMD_NR; it must be <=
> (HPAGE_PMD_NR - 1). MADV_COLLAPSE, however, ignores it.

OK, I see.

> But a closer look at the code that picks the preferred node shows
> there is a corner case for MADV_COLLAPSE.
> 
> The code tries to do some balancing when several nodes have the same
> hit record. Conceptually it does:
>     * If target_node <= last_target_node, iterate from
> last_target_node + 1 to MAX_NUMNODES (1024 with the default config)
>     * If max_value == node_load[nid], set target_node = nid

Correct
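For reference, that selection logic can be modeled with a minimal userspace sketch (hypothetical standalone code, not the kernel implementation; `find_target_node` and the plain-array `collapse_control` here are simplifications for illustration). With only nodes 0 and 1 present, `last_target_node == 1`, and `node_load[]` all zero, the rebalance loop matches `node_load[2] == max_value` and hands back nid 2:

```c
#include <assert.h>

#define MAX_NUMNODES 1024   /* upper bound used by the kernel loop */

/* hypothetical stand-in for struct collapse_control */
struct collapse_control {
	int node_load[MAX_NUMNODES];
	int last_target_node;
};

/* simplified model of hpage_collapse_find_target_node() */
static int find_target_node(struct collapse_control *cc)
{
	int nid, target_node = 0, max_value = cc->node_load[0];

	/* pick the first node with the highest hit count */
	for (nid = 1; nid < MAX_NUMNODES; nid++) {
		if (cc->node_load[nid] > max_value) {
			max_value = cc->node_load[nid];
			target_node = nid;
		}
	}

	/* balance between nodes sharing the same hit record */
	if (target_node <= cc->last_target_node)
		for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
		     nid++)
			if (max_value == cc->node_load[nid]) {
				target_node = nid;
				break;
			}

	cc->last_target_node = target_node;
	return target_node;
}
```

On a 2-node machine this returns 2 for the all-zero case, i.e. a nid that was never online, which would be one way the reported warning could fire without any hotplug race.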

> So assuming the system has 2 nodes, with target_node 0 and
> last_target_node 1: if the MADV_COLLAPSE path is hit, it may return 2
> for target_node, but that node does not actually exist (it is
> offline), so the warning is triggered.

How can node_load[2] > 0 (IIUC max_value > 0 here) if the node is
offline (other than a race with hotplug)?

> The below patch should be able to fix it:
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index ea0d186bc9d4..d24405e6736b 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -787,7 +787,8 @@ static int hpage_collapse_find_target_node(struct collapse_control *cc)
>         if (target_node <= cc->last_target_node)
>                 for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
>                      nid++)
> -                       if (max_value == cc->node_load[nid]) {
> +                       if (node_online(nid) &&
> +                           max_value == cc->node_load[nid]) {
>                                 target_node = nid;
>                                 break;
>                         }

Nope, this is equally racy. node_online might become false right after
the check.
-- 
Michal Hocko
SUSE Labs
