Linux-mm Archive on lore.kernel.org
From: osalvador@suse.de
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, dan.j.williams@intel.com,
	pavel.tatashin@microsoft.com, jglisse@redhat.com,
	Jonathan.Cameron@huawei.com, rafael@kernel.org, david@redhat.com,
	linux-mm@kvack.org, Oscar Salvador <osalvador@suse.com>
Subject: Re: [PATCH v2 3/5] mm, memory_hotplug: Move zone/pages handling to offline stage
Date: Wed, 28 Nov 2018 15:15:04 +0100
Message-ID: <31fede3e3aa0c866b8d52d016a14689d@suse.de>
In-Reply-To: <20181127162005.15833-4-osalvador@suse.de>

On 2018-11-27 17:20, Oscar Salvador wrote:
> From: Oscar Salvador <osalvador@suse.com>
> 
> The current implementation accesses pages during the hot-remove
> stage in order to get the zone linked to this memory range.
> We use that zone to a) check whether the zone is ZONE_DEVICE and
> b) shrink the zone's spanned pages.
> 
> Accessing pages during this stage is problematic, as we might be
> accessing pages that were not initialized if we did not get to
> online the memory before removing it.
> 
> The only reason to check for ZONE_DEVICE in __remove_pages
> is to bypass the call to release_mem_region_adjustable(),
> since these regions are removed with devm_release_mem_region.
> 
> With patch#2, this is no longer a problem so we can safely
> call release_mem_region_adjustable().
> release_mem_region_adjustable() will spot that the region
> we are trying to remove was acquired by means of
> devm_request_mem_region, and will back off safely.
> 
> This allows us to remove all zone-related operations from
> hot-remove stage.
> 
> Because of this, the zone's spanned pages are shrunk during
> the offlining stage in shrink_zone_pgdat().
> It would have been great to also decrease the spanned pages
> for the node there, but we need them in try_offline_node().
> So we still decrease spanned pages for the node in the hot-remove
> stage.
> 
> The only particularity is that
> find_smallest_section_pfn/find_biggest_section_pfn, when called from
> shrink_zone_span, will now check for online sections instead of
> valid sections.
> To make this work with devm/HMM code, we need to call
> offline_mem_sections and online_mem_sections in that code path when
> we are adding memory.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
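
A rough userspace sketch of the behaviour change described in the quoted
changelog: the scan for the lowest usable section skips sections that are
valid but not online. This is only an illustration, not kernel code;
struct section_state, MODEL_PAGES_PER_SECTION and
first_online_section_pfn() are hypothetical names, and the section size is
made up.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical section size; the real PAGES_PER_SECTION is arch-dependent. */
#define MODEL_PAGES_PER_SECTION 8UL

/* Hypothetical stand-in for the per-section state bits. */
struct section_state {
	bool valid;	/* section has a memmap */
	bool online;	/* section has been onlined */
};

/*
 * Walk [start_pfn, end_pfn) one section at a time and report the first
 * pfn whose section is online. Returns false if no online section exists.
 */
static bool first_online_section_pfn(const struct section_state *sections,
				     size_t nr_sections,
				     unsigned long start_pfn,
				     unsigned long end_pfn,
				     unsigned long *out_pfn)
{
	for (; start_pfn < end_pfn; start_pfn += MODEL_PAGES_PER_SECTION) {
		size_t idx = start_pfn / MODEL_PAGES_PER_SECTION;

		if (idx >= nr_sections)
			break;
		/* Valid-but-offline sections are skipped. */
		if (!sections[idx].online)
			continue;
		*out_pfn = start_pfn;
		return true;
	}
	return false;
}
```

With sections 0 and 1 valid but offline and section 2 online, the scan
reports pfn 16 under these assumptions, where a validity-only check would
have stopped at pfn 0.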

I did not really like the idea of having to online/offline sections from
DEVM code, so I think this should be better:

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 66cbf334203b..dfdb11f58cd1 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -105,7 +105,6 @@ static void devm_memremap_pages_release(void *data)

  	pfn = align_start >> PAGE_SHIFT;
  	nr_pages = align_size >> PAGE_SHIFT;
-	offline_mem_sections(pfn, pfn + nr_pages);
  	shrink_zone(page_zone(pfn_to_page(pfn)), pfn, pfn + nr_pages, nr_pages);

  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
@@ -229,10 +228,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)

  	if (!error) {
  		struct zone *zone;
-		unsigned long pfn = align_start >> PAGE_SHIFT;
-		unsigned long nr_pages = align_size >> PAGE_SHIFT;

-		online_mem_sections(pfn, pfn + nr_pages);
  		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
  		move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
  				align_size >> PAGE_SHIFT, altmap);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4fe42ccb0be4..653d2bc9affe 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -314,13 +314,17 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
  }

  #ifdef CONFIG_MEMORY_HOTREMOVE
-static bool is_section_ok(struct mem_section *ms, bool zone)
+static bool is_section_ok(struct mem_section *ms, struct zone *z)
  {
  	/*
-	 * We cannot shrink pgdat's spanned because we use them
-	 * in try_offline_node to check if all sections were removed.
+	 * In case we are shrinking pgdat's pages or the zone is
+	 * ZONE_DEVICE, we check for valid sections instead.
+	 * We cannot shrink pgdat's spanned pages until hot-remove
+	 * operation because we use them in try_offline_node to check
+	 * if all sections were removed.
+	 * ZONE_DEVICE's sections do not get onlined either.
  	 */
-	if (zone)
+	if (z && !is_dev_zone(z))
  		return online_section(ms);
  	else
  		return valid_section(ms);
@@ -335,7 +339,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
  	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
  		ms = __pfn_to_section(start_pfn);

-		if (!is_section_ok(ms, !!zone))
+		if (!is_section_ok(ms, zone))
  			continue;

  		if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -425,7 +429,7 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
  	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
  		ms = __pfn_to_section(pfn);

-		if (unlikely(!online_section(ms)))
+		if (unlikely(!is_section_ok(ms, zone)))
  			continue;

  		if (page_zone(pfn_to_page(pfn)) != zone)
@@ -517,11 +521,24 @@ void shrink_zone(struct zone *zone, unsigned long start_pfn,
  {
  	int nr_pages = PAGES_PER_SECTION;
  	unsigned long pfn;
+	unsigned long flags;
+	struct pglist_data *pgdat = zone->zone_pgdat;
+
+	pgdat_resize_lock(pgdat, &flags);
+	/*
+	 * Handling for ZONE_DEVICE does not account
+	 * present pages.
+	 */
+	if (!is_dev_zone(zone))
+		pgdat->node_present_pages -= offlined_pages;
+

  	clear_zone_contiguous(zone);
  	for (pfn = start_pfn; pfn < end_pfn; pfn += nr_pages)
  		shrink_zone_span(zone, pfn, pfn + nr_pages);
  	set_zone_contiguous(zone);
+
+	pgdat_resize_unlock(pgdat, &flags);
  }

  static void shrink_pgdat(int nid, unsigned long sect_nr)
@@ -555,8 +572,8 @@ static int __remove_section(int nid, struct mem_section *ms,
  }

  /**
- * __remove_pages() - remove sections of pages from a nid
- * @nid: nid from which pages belong to
+ * remove_sections() - remove sections of pages from a nid
+ * @nid: node from which pages need to be removed to
   * @phys_start_pfn: starting pageframe (must be aligned to start of a section)
   * @nr_pages: number of pages to remove (must be multiple of section size)
   * @altmap: alternative device page map or %NULL if default memmap is used
@@ -1581,7 +1598,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
  	unsigned long pfn, nr_pages;
  	long offlined_pages;
  	int ret, node;
-	unsigned long flags;
  	unsigned long valid_start, valid_end;
  	struct zone *zone;
  	struct memory_notify arg;
@@ -1663,14 +1679,12 @@ static int __ref __offline_pages(unsigned long start_pfn,
  	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
  	/* removal success */

-	/* Shrink zone's managed,spanned and zone/pgdat's present pages */
+	/* Shrink zone's managed and present pages */
  	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
  	zone->present_pages -= offlined_pages;

-	pgdat_resize_lock(zone->zone_pgdat, &flags);
-	zone->zone_pgdat->node_present_pages -= offlined_pages;
+	/* Shrink zone's spanned pages and node's present pages */
  	shrink_zone(zone, valid_start, valid_end, offlined_pages);
-	pgdat_resize_unlock(zone->zone_pgdat, &flags);

  	init_per_zone_wmark_min();

That said, there is an ongoing discussion about getting rid of the shrink
code altogether.
If that happens, this will be a lot simpler.
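
To make the intent of the is_section_ok() change easier to follow, here is
a minimal userspace model of the decision it takes. It is a sketch only:
struct zone_model, struct section_model and section_ok_model() are
hypothetical stand-ins, not the kernel's struct zone, struct mem_section or
the real helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum zone_kind { ZONE_KIND_NORMAL, ZONE_KIND_DEVICE };

struct zone_model {
	enum zone_kind kind;
};

struct section_model {
	bool valid;	/* section has a memmap */
	bool online;	/* section has been onlined */
};

/*
 * Models the decision in the patched is_section_ok():
 *  - z == NULL means the caller is shrinking the pgdat span;
 *  - ZONE_DEVICE sections never get onlined.
 * In both cases only validity is checked; otherwise the section
 * must be online.
 */
static bool section_ok_model(const struct section_model *ms,
			     const struct zone_model *z)
{
	assert(ms != NULL);

	if (z && z->kind != ZONE_KIND_DEVICE)
		return ms->online;
	return ms->valid;
}
```

In this model a valid-but-offline section is rejected for a regular zone
and accepted both for a ZONE_DEVICE zone and for the pgdat case (NULL
zone), which is what lets the devm path drop the explicit
online_mem_sections()/offline_mem_sections() calls.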

Thread overview: 26+ messages
2018-11-27 16:20 [PATCH v2 0/5] Do not touch pages in hot-remove path Oscar Salvador
2018-11-27 16:20 ` [PATCH v2 1/5] mm, memory_hotplug: Add nid parameter to arch_remove_memory Oscar Salvador
2018-11-27 16:20 ` [PATCH v2 2/5] kernel, resource: Check for IORESOURCE_SYSRAM in release_mem_region_adjustable Oscar Salvador
2018-11-27 16:20 ` [PATCH v2 3/5] mm, memory_hotplug: Move zone/pages handling to offline stage Oscar Salvador
2018-11-28  7:52   ` Mike Rapoport
2018-11-28 14:25     ` osalvador
2018-11-28 14:15   ` osalvador [this message]
2018-11-27 16:20 ` [PATCH v2 4/5] mm, memory-hotplug: Rework unregister_mem_sect_under_nodes Oscar Salvador
2019-03-24  6:48   ` Anshuman Khandual
2019-03-25  7:40     ` Oscar Salvador
2019-03-25  8:04       ` Michal Hocko
2019-03-25  8:14         ` Oscar Salvador
2018-11-27 16:20 ` [PATCH v2 5/5] mm, memory_hotplug: Refactor shrink_zone/pgdat_span Oscar Salvador
2018-11-28  6:50   ` Michal Hocko
2018-11-28  7:07     ` Oscar Salvador
2018-11-28 10:03       ` David Hildenbrand
2018-11-28 10:14       ` Michal Hocko
2018-11-28 11:00         ` osalvador
2018-11-28 12:31           ` Michal Hocko
2018-11-28 12:51             ` osalvador
2018-11-28 13:08               ` Michal Hocko
2018-11-28 13:18                 ` osalvador
2018-11-28 15:50                   ` Michal Hocko
2018-11-28 16:02                     ` osalvador
2018-11-29  9:29                     ` osalvador
2018-11-28 13:09               ` osalvador
