From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S943908AbdDTJGU (ORCPT ); Thu, 20 Apr 2017 05:06:20 -0400 Received: from mx2.suse.de ([195.135.220.15]:39603 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S943657AbdDTJGK (ORCPT ); Thu, 20 Apr 2017 05:06:10 -0400 Date: Thu, 20 Apr 2017 11:06:06 +0200 From: Michal Hocko To: Vlastimil Babka , Dan Williams Cc: linux-mm@kvack.org, Andrew Morton , Mel Gorman , Andrea Arcangeli , Jerome Glisse , Reza Arbab , Yasuaki Ishimatsu , qiuxishi@huawei.com, Kani Toshimitsu , slaoub@gmail.com, Joonsoo Kim , Andi Kleen , David Rientjes , Daniel Kiper , Igor Mammedov , Vitaly Kuznetsov , LKML , Heiko Carstens , Lai Jiangshan , Martin Schwidefsky Subject: Re: [PATCH v3 6/9] mm, memory_hotplug: do not associate hotadded memory to zones until online Message-ID: <20170420090605.GD15781@dhcp22.suse.cz> References: <20170410110351.12215-1-mhocko@kernel.org> <20170410110351.12215-7-mhocko@kernel.org> <20170410162547.GM4618@dhcp22.suse.cz> <49b6c3e2-0e68-b77e-31d6-f589d3b4822e@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <49b6c3e2-0e68-b77e-31d6-f589d3b4822e@suse.cz> User-Agent: Mutt/1.5.23 (2014-03-12) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Thu 20-04-17 10:25:27, Vlastimil Babka wrote: > On 04/10/2017 06:25 PM, Michal Hocko wrote: [...] > > Let's simulate memory hot online manually > > Normal Movable > > > > /sys/devices/system/memory/memory32/valid_zones:Normal > > /sys/devices/system/memory/memory33/valid_zones:Normal Movable > > > > /sys/devices/system/memory/memory32/valid_zones:Normal > > /sys/devices/system/memory/memory33/valid_zones:Normal > > /sys/devices/system/memory/memory34/valid_zones:Normal Movable > > > > /sys/devices/system/memory/memory32/valid_zones:Normal > > /sys/devices/system/memory/memory33/valid_zones:Normal Movable > > /sys/devices/system/memory/memory34/valid_zones:Movable Normal > > Commands seem to be missing above? Yes. git commit just dropped everything starting with # which happened to be the bash prompt for my commands. I have changed that to $ and it looks as follows $ echo 0x100000000 > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory32/valid_zones Normal Movable $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal Movable $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal /sys/devices/system/memory/memory34/valid_zones:Normal Movable $ echo online_movable > /sys/devices/system/memory/memory34/state $ grep . /sys/devices/system/memory/memory3?/valid_zones /sys/devices/system/memory/memory32/valid_zones:Normal /sys/devices/system/memory/memory33/valid_zones:Normal Movable /sys/devices/system/memory/memory34/valid_zones:Movable Normal [...] > > This means that the same physical online steps as above will lead to the > > following state: > > Normal Movable > > > > /sys/devices/system/memory/memory32/valid_zones:Normal Movable > > /sys/devices/system/memory/memory33/valid_zones:Normal Movable > > > > /sys/devices/system/memory/memory32/valid_zones:Normal Movable > > /sys/devices/system/memory/memory33/valid_zones:Normal Movable > > /sys/devices/system/memory/memory34/valid_zones:Normal Movable > > > > /sys/devices/system/memory/memory32/valid_zones:Normal Movable > > /sys/devices/system/memory/memory33/valid_zones:Normal Movable > > /sys/devices/system/memory/memory34/valid_zones:Movable > > Ditto. This just copies the above so I didn't add those commands. I can if that is preferable. > > --- a/include/linux/mmzone.h > > +++ b/include/linux/mmzone.h > > @@ -533,6 +533,20 @@ static inline bool zone_is_empty(struct zone *zone) > > } > > > > /* > > + * Return true if [start_pfn, start_pfn + nr_pages) range has a non-mpty > > > non-empty fixed > > + * intersection with the given zone > > + */ > > +static inline bool zone_intersects(struct zone *zone, > > + unsigned long start_pfn, unsigned long nr_pages) > > +{ > > I'm looking at your current mmotm tree branch, which looks like this: > > + * Return true if [start_pfn, start_pfn + nr_pages) range has a non-mpty > + * intersection with the given zone > + */ > +static inline bool zone_intersects(struct zone *zone, > + unsigned long start_pfn, unsigned long nr_pages) > +{ > + if (zone_is_empty(zone)) > + return false; > + if (zone->zone_start_pfn <= start_pfn && start_pfn < zone_end_pfn(zone)) > + return true; > + if (start_pfn + nr_pages > zone->zone_start_pfn) > + return true; > > A false positive is possible here, when start_pfn >= zone_end_pfn(zone)? Ohh, right. Looks better? diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index eae6da28646e..611ff869fa4d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -541,10 +541,14 @@ static inline bool zone_intersects(struct zone *zone, { if (zone_is_empty(zone)) return false; - if (zone->zone_start_pfn <= start_pfn && start_pfn < zone_end_pfn(zone)) + if (start_pfn >= zone_end_pfn(zone)) + return false; + + if (zone->zone_start_pfn <= start_pfn) return true; if (start_pfn + nr_pages > zone->zone_start_pfn) return true; + return false; } > > @@ -1029,39 +1018,114 @@ static void node_states_set_node(int node, struct memory_notify *arg) > > node_set_state(node, N_MEMORY); > > } > > > > -bool zone_can_shift(unsigned long pfn, unsigned long nr_pages, > > - enum zone_type target, int *zone_shift) > > +bool allow_online_pfn_range(int nid, unsigned long pfn, unsigned long nr_pages, int online_type) > > { > > - struct zone *zone = page_zone(pfn_to_page(pfn)); > > - enum zone_type idx = zone_idx(zone); > > - int i; > > + struct pglist_data *pgdat = NODE_DATA(nid); > > + struct zone *movable_zone = &pgdat->node_zones[ZONE_MOVABLE]; > > + struct zone *normal_zone = &pgdat->node_zones[ZONE_NORMAL]; > > > > - *zone_shift = 0; > > + /* > > + * TODO there shouldn't be any inherent reason to have ZONE_NORMAL > > + * physically before ZONE_MOVABLE. All we need is they do not > > + * overlap. Historically we didn't allow ZONE_NORMAL after ZONE_MOVABLE > > + * though so let's stick with it for simplicity for now. > > + * TODO make sure we do not overlap with ZONE_DEVICE > > Is this last TODO a blocker, unlike the others? I think it is not but my knowledge of the zone device is very limited. I was hoping for Dan's feedback here. From what I understand Zone device occupies the high end of the address space so we shouldn't overlap here. Is this correct Dan? [...] > > + if (online_type == MMOP_ONLINE_MOVABLE && !can_online_high_movable(nid)) > > + return -EINVAL; > > > > - zone = move_pfn_range(zone_shift, pfn, pfn + nr_pages); > > + /* associate pfn range with the zone */ > > + zone = move_pfn_range(online_type, nid, pfn, nr_pages); > > if (!zone) > > return -EINVAL; > > Nit: This !zone currently cannot happen. fixed Thanks! -- Michal Hocko SUSE Labs