From: Michal Hocko <mhocko@kernel.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	Johannes Weiner <hannes@cmpxchg.org>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 7/9] mm, page_alloc: remove stop_machine from build_all_zonelists
Date: Fri, 14 Jul 2017 13:43:21 +0200	[thread overview]
Message-ID: <20170714114321.GJ2618@dhcp22.suse.cz> (raw)
In-Reply-To: <52b1af9a-a5a9-9157-8f0f-f17946aeb2da@suse.cz>

On Fri 14-07-17 13:29:14, Vlastimil Babka wrote:
> On 07/14/2017 10:00 AM, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > 
> > build_all_zonelists has been (ab)using stop_machine to make sure that
> > zonelists do not change while somebody is looking at them. This is
> > just a gross hack because a) it complicates the context from which
> > we can call build_all_zonelists (see 3f906ba23689 ("mm/memory-hotplug:
> > switch locking to a percpu rwsem")) and b) it is not really necessary,
> > especially after "mm, page_alloc: simplify zonelist initialization".
> > 
> > Updates of the zonelists happen very seldom, basically only when a zone
> > becomes populated during memory online or when it loses all of its
> > memory during offline. A racing iteration over zonelists could either
> > miss a zone or try to work on one zone twice. Both of these are
> > something we can live with occasionally, because there will always be
> > at least one zone visible, so, for example, we are not likely to fail
> > an allocation too easily.
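To make the two failure modes above concrete, here is a minimal single-threaded C sketch; it is not kernel code. It assumes a simplified NULL-terminated array of pointers to a hypothetical struct zone (the real struct zonelist stores struct zoneref entries in _zonerefs[]), and the helper names rebuild_zonelist() and walk_with_race() are made up for illustration. The "racing" rebuild is injected at a fixed point in the walk so that the outcome is deterministic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct zone; only a name for illustration. */
struct zone {
	const char *name;
};

#define MAX_ZONELIST 8

/*
 * Simplified, NULL-terminated zonelist (an assumption: the kernel's
 * struct zonelist holds struct zoneref entries, not raw pointers).
 * zones[MAX_ZONELIST] is never set to a non-NULL value, so a walk over
 * a zero-initialized list always terminates.
 */
struct zonelist {
	struct zone *zones[MAX_ZONELIST + 1];
};

/* Writer: rebuild the list in place, entry by entry, with no locking. */
static void rebuild_zonelist(struct zonelist *zl, struct zone *const *src,
			     size_t n)
{
	assert(n <= MAX_ZONELIST);
	for (size_t i = 0; i < n; i++)
		zl->zones[i] = src[i];
	zl->zones[n] = NULL;
}

/*
 * Reader: walk the list up to the NULL terminator, recording each zone.
 * The rebuild is injected right after the reader has visited race_after
 * entries, standing in for an update done concurrently on another CPU.
 * Returns the number of zones the reader saw.
 */
static size_t walk_with_race(struct zonelist *zl, size_t race_after,
			     struct zone *const *new_zones, size_t new_n,
			     struct zone **seen, size_t max_seen)
{
	size_t nr_seen = 0;

	for (size_t i = 0; zl->zones[i] != NULL && nr_seen < max_seen; i++) {
		seen[nr_seen++] = zl->zones[i];
		if (nr_seen == race_after)
			rebuild_zonelist(zl, new_zones, new_n);
	}
	return nr_seen;
}
```

Rebuilding {A, B, C} into {B, C} (zone A went empty) right after the reader has seen A makes it visit A and C, missing B; rebuilding {A, B} into {D, A, B} (zone D got populated) at the same point makes it visit A twice. In both cases the reader still sees at least one populated zone, which is the property the changelog relies on.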
> 
> Given the experience with cpusets and mempolicies, I would rather
> avoid the risk of allocation not seeing the only zone(s) that are
> allowed by its nodemask, and triggering premature OOM.

I would argue that those are a different beast, because they are directly
under the control of a not fully privileged user and change between the
empty nodemask and cpusets very often. For this one to trigger, we
would have to online/offline the last memory block in the zone very
often, and that doesn't resemble a sensible use case even remotely.

> So maybe the
> updates could be done in a way to avoid that, e.g. first append a copy
> of the old zonelist to the end, then overwrite and terminate with NULL.
> But if this requires any barriers or something similar on the iteration
> side, which is performance critical, then it's bad.
> Maybe a seqcount that the iteration side only starts checking in the
> slowpath? Like we have with cpusets now.
> I know that Mel noted that stop_machine() also never had such guarantees
> to prevent this, but it could have made the chances smaller.
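A rough sketch of the seqcount idea, under stated assumptions: it uses C11 atomics in userspace rather than the kernel's seqcount_t API, and every name in it (seq_read_begin, seq_write_begin, toy_alloc, alloc_pages_sketch, ...) is hypothetical. The fast path never touches the counter; only after a failed attempt does the slow path record the generation and retry if a zonelist rebuild ran in the meantime, similar in spirit to how the page allocator slow path uses read_mems_allowed_begin()/read_mems_allowed_retry() for cpusets.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace seqcount, not the kernel's seqcount_t API.
 * In a real multi-threaded C program the protected data would also need
 * relaxed atomic accesses; this sketch only shows the control flow.
 */
struct seqcount {
	atomic_uint seq; /* even: stable, odd: writer in progress */
};

static unsigned seq_read_begin(struct seqcount *s)
{
	unsigned v;

	/* Spin while a writer is mid-update (odd value). */
	while ((v = atomic_load_explicit(&s->seq, memory_order_acquire)) & 1u)
		;
	return v;
}

static bool seq_read_retry(struct seqcount *s, unsigned start)
{
	/* Order the preceding data reads before re-reading the counter. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->seq, memory_order_relaxed) != start;
}

static void seq_write_begin(struct seqcount *s)
{
	atomic_fetch_add_explicit(&s->seq, 1u, memory_order_relaxed);
	/* Order the counter bump before the zonelist stores that follow. */
	atomic_thread_fence(memory_order_release);
}

static void seq_write_end(struct seqcount *s)
{
	/* Publish the zonelist stores, then return to an even value. */
	atomic_fetch_add_explicit(&s->seq, 1u, memory_order_release);
}

/* Toy allocator state for the demo (hypothetical). */
struct toy_alloc {
	struct seqcount *zonelist_seq; /* bumped around each rebuild */
	int calls;      /* number of allocation attempts so far */
	int fail_until; /* attempts 1..fail_until fail */
	int rebuild_at; /* attempt during which a "concurrent" rebuild runs */
};

/* One pass over the zonelist; returns true if a page was found. */
static bool toy_try_alloc(struct toy_alloc *t)
{
	t->calls++;
	if (t->calls == t->rebuild_at) {
		/* Stand-in for build_all_zonelists() running on another CPU. */
		seq_write_begin(t->zonelist_seq);
		seq_write_end(t->zonelist_seq);
	}
	return t->calls > t->fail_until;
}

static bool alloc_pages_sketch(struct toy_alloc *t)
{
	unsigned start;

	/* Fast path: no seqcount access at all, so no added cost. */
	if (toy_try_alloc(t))
		return true;

	/* Slow path: retry if a zonelist rebuild raced with the attempt. */
	do {
		start = seq_read_begin(t->zonelist_seq);
		if (toy_try_alloc(t))
			return true;
	} while (seq_read_retry(t->zonelist_seq, start));

	return false; /* genuine failure; OOM handling would start here */
}
```

With the toy allocator failing its first two attempts and a fake rebuild landing during the first slow-path attempt, alloc_pages_sketch() retries once and succeeds on the third attempt instead of failing prematurely; with no rebuild it gives up after a single slow-path attempt. The design keeps the performance-critical fast path free of any barrier, at the cost of one extra counter check only when an allocation is already failing.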

I think we can come up with some scheme, but is it really worth it
considering how unlikely the whole thing is? If somebody does hit a
premature OOM kill or allocation failures, it would have to coincide
with heavy memory hotplug operations, and then it would be quite easy
to spot what is going on and fix it. I would rather not
overcomplicate it, to be honest.

> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  mm/page_alloc.c | 9 ++-------
> >  1 file changed, 2 insertions(+), 7 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 78bd62418380..217889ecd13f 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5066,8 +5066,7 @@ static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
> >   */
> >  DEFINE_MUTEX(zonelists_mutex);
> >  
> > -/* return values int ....just for stop_machine() */
> > -static int __build_all_zonelists(void *data)
> > +static void __build_all_zonelists(void *data)
> >  {
> >  	int nid;
> >  	int cpu;
> > @@ -5103,8 +5102,6 @@ static int __build_all_zonelists(void *data)
> >  			set_cpu_numa_mem(cpu, local_memory_node(cpu_to_node(cpu)));
> >  #endif
> >  	}
> > -
> > -	return 0;
> >  }
> >  
> >  static noinline void __init
> > @@ -5147,9 +5144,7 @@ void __ref build_all_zonelists(pg_data_t *pgdat)
> >  	if (system_state == SYSTEM_BOOTING) {
> >  		build_all_zonelists_init();
> >  	} else {
> > -		/* we have to stop all cpus to guarantee there is no user
> > -		   of zonelist */
> > -		stop_machine_cpuslocked(__build_all_zonelists, pgdat, NULL);
> > +		__build_all_zonelists(pgdat);
> >  		/* cpuset refresh routine should be here */
> >  	}
> >  	vm_total_pages = nr_free_pagecache_pages();
> > 

-- 
Michal Hocko
SUSE Labs
