linux-mm.kvack.org archive mirror
From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, Mel Gorman <mgorman@suse.de>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Sachin Sant <sachinp@linux.vnet.ibm.com>,
	Christopher Lameter <cl@linux.com>,
	linuxppc-dev@lists.ozlabs.org,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Kirill Tkhai <ktkhai@virtuozzo.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Bharata B Rao <bharata@linux.ibm.com>,
	Nathan Lynch <nathanl@linux.ibm.com>
Subject: Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
Date: Wed, 18 Mar 2020 16:32:15 +0530
Message-ID: <20200318110215.GC27520@linux.vnet.ibm.com>
In-Reply-To: <20200318100256.GH21362@dhcp22.suse.cz>

* Michal Hocko <mhocko@suse.com> [2020-03-18 11:02:56]:

> On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
> > Calling a kmalloc_node on a possible node which is not yet onlined can
> > lead to panic. Currently node_present_pages() doesn't verify the node is
> > online before accessing the pgdat for the node. However pgdat struct may
> > not be available resulting in a crash.
> >
> > NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
> > LR [c0000000003d5b94] __slab_alloc+0x34/0x60
> > Call Trace:
> > [c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
> > [c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
> > [c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
> > [c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
> > [c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
> > [c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
> > [c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
> > [c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
> > [c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
> > [c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
> > [c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
> > [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
> >
> > Fix this by verifying the node is online before accessing the pgdat
> > structure. Fix the same for node_spanned_pages() too.
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: linux-mm@kvack.org
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Christopher Lameter <cl@linux.com>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > Cc: Bharata B Rao <bharata@linux.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> >
> > Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  include/linux/mmzone.h | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index f3f264826423..88078a3b95e5 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -756,8 +756,10 @@ typedef struct pglist_data {
> >  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
> >  } pg_data_t;
> >
> > -#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> > -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> > +#define node_present_pages(nid)		\
> > +	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
> > +#define node_spanned_pages(nid)		\
> > +	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
>
> I believe this is a wrong approach. We really do not want to special
> case all the places which require NODE_DATA. Can we please go and
> allocate pgdat for all possible nodes?
>

I can do that, but my question is whether we should make this change just
for powerpc or for the other archs as well.
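
To make the two options concrete, here is a rough user-space model of them.
It is only a sketch: struct pgdat_model, the model_* arrays and the helper
names are all made up for illustration and are not kernel code. The guarded
accessor mirrors what this patch does in node_present_pages(); the plain
accessor is safe only once every possible node has been given a pgdat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 4	/* hypothetical; stands in for the possible-node count */

/* Cut-down, hypothetical stand-in for pg_data_t. */
struct pgdat_model {
	unsigned long node_present_pages;
	unsigned long node_spanned_pages;
};

static struct pgdat_model model_pgdats[MODEL_MAX_NODES];	/* backing storage, zeroed */
static struct pgdat_model *model_node_data[MODEL_MAX_NODES];	/* models NODE_DATA(nid) */
static bool model_node_online[MODEL_MAX_NODES];			/* models node_online(nid) */

/* Bring a node online with some memory (model only). */
static void model_online_node(int nid, unsigned long pages)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	model_pgdats[nid].node_present_pages = pages;
	model_pgdats[nid].node_spanned_pages = pages;
	model_node_data[nid] = &model_pgdats[nid];
	model_node_online[nid] = true;
}

/* Option 1: check that the node is online before touching its pgdat. */
static unsigned long present_pages_guarded(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return model_node_online[nid] ?
		model_node_data[nid]->node_present_pages : 0;
}

/* Option 2, step 1: give every possible node a (possibly empty) pgdat. */
static void model_alloc_all_possible(void)
{
	for (int nid = 0; nid < MODEL_MAX_NODES; nid++)
		if (!model_node_data[nid])
			model_node_data[nid] = &model_pgdats[nid];
}

/* Option 2, step 2: the accessor no longer needs a special case. */
static unsigned long present_pages_plain(int nid)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	assert(model_node_data[nid] != NULL);
	return model_node_data[nid]->node_present_pages;
}
```

In this model a node that is possible but not online reports 0 pages either
way; the difference is only where the special case lives.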

NODE_DATA initialization always seems to be in arch specific code.

The other archs that are affected seem to be mips, sh and sparc.
These archs seem to assume that NODE_DATA has to be node-local only.

For example, on sparc, allocate_node_data() in arch/sparc/mm/init_64.c does:

	NODE_DATA(nid) = memblock_alloc_node(sizeof(struct pglist_data),
					     SMP_CACHE_BYTES, nid);
	if (!NODE_DATA(nid)) {
		prom_printf("Cannot allocate pglist_data for nid[%d]\n", nid);
		prom_halt();
	}

	NODE_DATA(nid)->node_id = nid;

So even if I change these archs to allocate NODE_DATA from a fallback node,
I may not be able to test those changes.
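
By "allocate from a fallback node" I mean something along these lines. This
is a user-space sketch only, under the assumption that a node-local
allocation can fail for a memoryless node; struct pgdat_sketch, the
sketch_* helpers and the home_nid field are hypothetical and do not exist
in the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 4	/* hypothetical number of possible nodes */

/* Hypothetical stand-in for pg_data_t. */
struct pgdat_sketch {
	int node_id;	/* node this pgdat describes */
	int home_nid;	/* node whose memory backs it (sketch only) */
};

/* Free bytes per node; 0 models a memoryless node. */
static size_t sketch_free_bytes[SKETCH_MAX_NODES];

/* Hypothetical strictly node-local allocator: fails if the node lacks memory. */
static void *sketch_alloc_exact_nid(size_t size, int nid)
{
	void *p;

	if (sketch_free_bytes[nid] < size)
		return NULL;
	p = calloc(1, size);
	if (p)
		sketch_free_bytes[nid] -= size;
	return p;
}

/* Try the node itself first, then fall back to any other node with memory. */
static struct pgdat_sketch *sketch_alloc_pgdat(int nid)
{
	struct pgdat_sketch *pgdat;
	int home = nid;

	assert(nid >= 0 && nid < SKETCH_MAX_NODES);
	pgdat = sketch_alloc_exact_nid(sizeof(*pgdat), nid);

	for (int other = 0; !pgdat && other < SKETCH_MAX_NODES; other++) {
		if (other == nid)
			continue;
		pgdat = sketch_alloc_exact_nid(sizeof(*pgdat), other);
		home = other;
	}
	if (!pgdat)
		return NULL;	/* no node could satisfy the request */

	pgdat->node_id = nid;
	pgdat->home_nid = home;
	return pgdat;
}
```

With only node 0 populated, sketch_alloc_pgdat(1) returns a pgdat whose
node_id is 1 but which is backed by node 0, instead of failing outright.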

Please let me know your thoughts on this.

> The current state of memoryless hacks and the subtle bugs popping up here
> and there just prove that we should have done that from the very beginning
> IMHO.
>
> >  #ifdef CONFIG_FLAT_NODE_MEM_MAP
> >  #define pgdat_page_nr(pgdat, pagenr)	((pgdat)->node_mem_map + (pagenr))
> >  #else
> > --
> > 2.18.1
>
> --
> Michal Hocko
> SUSE Labs
>

--
Thanks and Regards
Srikar Dronamraju




Thread overview: 17+ messages
2020-03-18  7:28 [PATCH v2 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
2020-03-18  7:28 ` [PATCH v2 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
2020-03-18 10:02   ` Michal Hocko
2020-03-18 11:02     ` Srikar Dronamraju [this message]
2020-03-18 11:14       ` Michal Hocko
2020-03-18 11:53     ` Vlastimil Babka
2020-03-18 12:52       ` Michal Hocko
2020-03-19  0:32       ` Michael Ellerman
2020-03-19  1:11         ` Michael Ellerman
2020-03-19  9:38         ` Vlastimil Babka
2020-03-18  7:28 ` [PATCH v2 2/4] mm/slub: Use mem_node to allocate a new slab Srikar Dronamraju
2020-03-18  7:28 ` [PATCH v2 3/4] mm: Implement reset_numa_mem Srikar Dronamraju
2020-03-18 19:20   ` Christopher Lameter
2020-03-19  7:44     ` Michal Hocko
2020-03-18  7:28 ` [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
2020-03-18 14:28   ` kbuild test robot
2020-03-18 18:56   ` kbuild test robot
