From: Michal Hocko <mhocko@suse.com>
To: David Hildenbrand <david@redhat.com>
Cc: Alexey Makhalov <amakhalov@vmware.com>,
	Dennis Zhou <dennis@kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>, Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH v3] mm: fix panic in __alloc_pages
Date: Wed, 8 Dec 2021 09:34:37 +0100
Message-ID: <YbBuHSkvd6fDdQ9d@dhcp22.suse.cz>
In-Reply-To: <5a44c44a-141c-363d-c23e-558edc23b9b4@redhat.com>

On Wed 08-12-21 09:24:39, David Hildenbrand wrote:
> On 08.12.21 09:12, Michal Hocko wrote:
> > On Tue 07-12-21 19:03:28, David Hildenbrand wrote:
> >> On 07.12.21 18:17, Alexey Makhalov wrote:
> >>>
> >>>
> >>>> On Dec 7, 2021, at 9:13 AM, David Hildenbrand <david@redhat.com> wrote:
> >>>>
> >>>> On 07.12.21 18:02, Alexey Makhalov wrote:
> >>>>>
> >>>>>
> >>>>>> On Dec 7, 2021, at 8:36 AM, Michal Hocko <mhocko@suse.com> wrote:
> >>>>>>
> >>>>>> On Tue 07-12-21 17:27:29, Michal Hocko wrote:
> >>>>>> [...]
> >>>>>>> So your proposal is to drop set_node_online from the patch and add it as
> >>>>>>> a separate one which handles
> >>>>>>> 	- sysfs part (i.e. do not register a node which doesn't span a
> >>>>>>> 	  physical address space)
> >>>>>>> 	- hotplug side (drop the pgd allocation, register the node lazily
> >>>>>>> 	  when the first memblocks are registered)
> >>>>>>
> >>>>>> In other words, the first stage
> >>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>>>> index c5952749ad40..f9024ba09c53 100644
> >>>>>> --- a/mm/page_alloc.c
> >>>>>> +++ b/mm/page_alloc.c
> >>>>>> @@ -6382,7 +6382,11 @@ static void __build_all_zonelists(void *data)
> >>>>>> 	if (self && !node_online(self->node_id)) {
> >>>>>> 		build_zonelists(self);
> >>>>>> 	} else {
> >>>>>> -		for_each_online_node(nid) {
> >>>>>> +		/*
> >>>>>> +		 * All possible nodes have pgdat preallocated
> >>>>>> +		 * in free_area_init
> >>>>>> +		 */
> >>>>>> +		for_each_node(nid) {
> >>>>>> 			pg_data_t *pgdat = NODE_DATA(nid);
> >>>>>>
> >>>>>> 			build_zonelists(pgdat);
> >>>>>
> >>>>> Will it blow up memory usage for the nodes which might never be onlined?
> >>>>> I prefer the idea of init on demand.
> >>>>>
> >>>>> Even now there is an existing problem.
> >>>>> In my experiments, I observed a _huge_ increase in memory consumption when
> >>>>> increasing the number of possible NUMA nodes. I'm going to report it in a
> >>>>> separate mail thread.
> >>>>
> >>>> I already raised that PPC might be problematic in that regard. Which
> >>>> architecture / setup do you have in mind that can have a lot of possible
> >>>> nodes?
> >>>>
> >>> It is an x86_64 VMware VM, not a regular one, but specially configured (1 vCPU per node,
> >>> with hot-plug support, 128 possible nodes)
> >>
> >> I thought the pgdat would be smaller but I just gave it a test:
> > 
> > Yes, pgdat is quite large! The embedded zones alone can eat a lot.
> > 
> >> On my system, pg_data_t is 173824 bytes. So 128 nodes would correspond to
> >> 21 MiB, which is indeed a lot. I assume it's due to "struct zonelist",
> >> which has MAX_ZONES_PER_ZONELIST == (MAX_NUMNODES * MAX_NR_ZONES) zone
> >> references ...
> > 
> > This is what pahole tells me
> > struct pglist_data {
> >         struct zone                node_zones[4] __attribute__((__aligned__(64))); /*     0  5632 */
> >         /* --- cacheline 88 boundary (5632 bytes) --- */
> >         struct zonelist            node_zonelists[1];    /*  5632    80 */
> > 	[...]
> >         /* size: 6400, cachelines: 100, members: 27 */
> >         /* sum members: 6369, holes: 5, sum holes: 31 */
> > 
> > with my particular config (which is !NUMA). I haven't really checked
> > whether there are other places which might scale with MAX_NUMNODES or
> > something like that.
> > 
> > Anyway, is 21MB of wasted space for a 128-node machine really
> > noteworthy?
> > 
> 
> I think we might soon see setups (again, CXL is an example, but also
> when providing a dynamic amount of performance-differentiated memory
> via virtio-mem) where this will most probably matter. With
> performance-differentiated memory we'll see a lot more nodes getting
> used in general, and a lot more nodes eventually getting hotplugged.

There are certainly machines with many nodes. E.g. SLES kernels are
built with CONFIG_NODES_SHIFT=10, which allows for up to 1024 possible
nodes. And I have seen really large machines with many nodes, but those
usually come with a lot of memory and they do not tend to have
unpopulated nodes AFAIR.
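
For a sense of scale, here is a rough userspace sketch of how the node
count feeds into the zonelist size. It assumes MAX_NUMNODES is
1 << CONFIG_NODES_SHIFT and that each zonelist holds
MAX_NUMNODES * MAX_NR_ZONES zone references plus one terminator. The
struct and helper names are stand-ins made up for illustration, not
kernel code, and the per-reference size depends on the build.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a zone reference: a zone pointer plus a
 * zone index. On a typical 64-bit build this is 16 bytes with padding.
 */
struct zoneref_model {
	void *zone;
	int zone_idx;
};

/* Number of node slots for a given CONFIG_NODES_SHIFT-style value. */
static size_t max_numnodes(unsigned int nodes_shift)
{
	assert(nodes_shift < sizeof(size_t) * 8);
	return (size_t)1 << nodes_shift;
}

/*
 * Bytes taken by one zonelist: one reference per (node, zone) pair
 * plus a terminating entry.
 */
static size_t zonelist_bytes(size_t numnodes, size_t nr_zones)
{
	return (numnodes * nr_zones + 1) * sizeof(struct zoneref_model);
}
```

With one node and four zones this gives 5 references, i.e. 80 bytes on a
64-bit build, which lines up with the node_zonelists[1] entry in the
pahole output quoted above for the !NUMA config. With a shift of 10 and
four zones the same formula gives 4097 references per zonelist.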

> If 128 nodes is realistic, I cannot tell.
> 
> We could optimize by allocating some members dynamically. For example
> we'll never need MAX_NUMNODES entries, but only the number of possible
> nodes.

Yes, agreed. Scaling with MAX_NUMNODES is almost always wasteful.
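
To illustrate the idea of sizing by the number of possible nodes rather
than by MAX_NUMNODES, a minimal userspace sketch follows. It is not a
proposal for the actual pgdat layout: the types, the helper name and the
use of calloc() are assumptions for illustration only, and a kernel
version would have to use an allocator that is available at that point
of boot or hotplug.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a zone reference. */
struct zoneref_sketch {
	void *zone;
	int zone_idx;
};

/* Zonelist whose reference table is sized at allocation time. */
struct zonelist_dyn {
	size_t nr_refs;                /* usable entries, terminator excluded */
	struct zoneref_sketch refs[];  /* flexible array member */
};

/*
 * Allocate room for nr_possible_nodes * nr_zones references plus a
 * zeroed terminator, instead of a compile-time MAX_NUMNODES worth.
 * Returns NULL on allocation failure; the caller frees with free().
 */
static struct zonelist_dyn *zonelist_dyn_alloc(size_t nr_possible_nodes,
					       size_t nr_zones)
{
	size_t nr_refs = nr_possible_nodes * nr_zones;
	struct zonelist_dyn *zl;

	assert(nr_possible_nodes > 0 && nr_zones > 0);

	zl = calloc(1, sizeof(*zl) + (nr_refs + 1) * sizeof(zl->refs[0]));
	if (!zl)
		return NULL;

	zl->nr_refs = nr_refs;
	return zl;
}
```

A machine with 2 possible nodes and 4 zones would then carry 9 entries
per zonelist regardless of how large MAX_NUMNODES is configured.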

-- 
Michal Hocko
SUSE Labs
