From: osalvador@suse.de
To: akpm@linux-foundation.org
Cc: mhocko@suse.com, dan.j.williams@intel.com,
	pavel.tatashin@microsoft.com, jglisse@redhat.com,
	Jonathan.Cameron@huawei.com, rafael@kernel.org, david@redhat.com,
	linux-mm@kvack.org, Oscar Salvador <osalvador@suse.com>
Subject: Re: [PATCH v2 3/5] mm, memory_hotplug: Move zone/pages handling to offline stage
Date: Wed, 28 Nov 2018 15:15:04 +0100
Message-ID: <31fede3e3aa0c866b8d52d016a14689d@suse.de>
In-Reply-To: <20181127162005.15833-4-osalvador@suse.de>

On 2018-11-27 17:20, Oscar Salvador wrote:
> From: Oscar Salvador <osalvador@suse.com>
> 
> The current implementation accesses pages during the hot-remove
> stage in order to get the zone linked to this memory range.
> We use that zone to a) check whether it is ZONE_DEVICE and
> b) shrink the zone's spanned pages.
> 
> Accessing pages during this stage is problematic, as we might be
> accessing pages that were not initialized if we did not get to
> online the memory before removing it.
> 
> The only reason to check for ZONE_DEVICE in __remove_pages
> is to bypass the call to release_mem_region_adjustable(),
> since these regions are removed with devm_release_mem_region.
> 
> With patch#2, this is no longer a problem so we can safely
> call release_mem_region_adjustable().
> release_mem_region_adjustable() will spot that the region
> we are trying to remove was acquired by means of
> devm_request_mem_region, and will back off safely.
> 
> This allows us to remove all zone-related operations from
> hot-remove stage.
> 
> Because of this, the zone's spanned pages are shrunk during
> the offlining stage in shrink_zone_pgdat().
> It would have been great to also decrease the node's spanned
> pages there, but we need them in try_offline_node().
> So we still decrease the node's spanned pages in the hot-remove
> stage.
> 
> The only particularity is that
> find_smallest_section_pfn/find_biggest_section_pfn, when called from
> shrink_zone_span, will now check for online sections instead of
> valid sections.
> To make this work with devm/HMM code, we need to call offline_mem_sections
> and online_mem_sections in that code path when we are adding memory.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>

I did not really like the idea of having to online/offline sections from
DEVM code, so I think the following would be better:

diff --git a/kernel/memremap.c b/kernel/memremap.c
index 66cbf334203b..dfdb11f58cd1 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -105,7 +105,6 @@ static void devm_memremap_pages_release(void *data)

  	pfn = align_start >> PAGE_SHIFT;
  	nr_pages = align_size >> PAGE_SHIFT;
-	offline_mem_sections(pfn, pfn + nr_pages);
  	shrink_zone(page_zone(pfn_to_page(pfn)), pfn, pfn + nr_pages, nr_pages);

  	if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
@@ -229,10 +228,7 @@ void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)

  	if (!error) {
  		struct zone *zone;
-		unsigned long pfn = align_start >> PAGE_SHIFT;
-		unsigned long nr_pages = align_size >> PAGE_SHIFT;

-		online_mem_sections(pfn, pfn + nr_pages);
  		zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
  		move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
  				align_size >> PAGE_SHIFT, altmap);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 4fe42ccb0be4..653d2bc9affe 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -314,13 +314,17 @@ int __ref __add_pages(int nid, unsigned long phys_start_pfn,
  }

  #ifdef CONFIG_MEMORY_HOTREMOVE
-static bool is_section_ok(struct mem_section *ms, bool zone)
+static bool is_section_ok(struct mem_section *ms, struct zone *z)
  {
  	/*
-	 * We cannot shrink pgdat's spanned because we use them
-	 * in try_offline_node to check if all sections were removed.
+	 * In case we are shrinking pgdat's pages or the zone is
+	 * ZONE_DEVICE, we check for valid sections instead.
+	 * We cannot shrink pgdat's spanned pages until hot-remove
+	 * operation because we use them in try_offline_node to check
+	 * if all sections were removed.
+	 * ZONE_DEVICE's sections do not get onlined either.
  	 */
-	if (zone)
+	if (z && !is_dev_zone(z))
  		return online_section(ms);
  	else
  		return valid_section(ms);
@@ -335,7 +339,7 @@ static unsigned long find_smallest_section_pfn(int nid, struct zone *zone,
  	for (; start_pfn < end_pfn; start_pfn += PAGES_PER_SECTION) {
  		ms = __pfn_to_section(start_pfn);

-		if (!is_section_ok(ms, !!zone))
+		if (!is_section_ok(ms, zone))
  			continue;

  		if (unlikely(pfn_to_nid(start_pfn) != nid))
@@ -425,7 +429,7 @@ static void shrink_zone_span(struct zone *zone, unsigned long start_pfn,
  	for (; pfn < zone_end_pfn; pfn += PAGES_PER_SECTION) {
  		ms = __pfn_to_section(pfn);

-		if (unlikely(!online_section(ms)))
+		if (unlikely(!is_section_ok(ms, zone)))
  			continue;

  		if (page_zone(pfn_to_page(pfn)) != zone)
@@ -517,11 +521,24 @@ void shrink_zone(struct zone *zone, unsigned long start_pfn,
  {
  	int nr_pages = PAGES_PER_SECTION;
  	unsigned long pfn;
+	unsigned long flags;
+	struct pglist_data *pgdat = zone->zone_pgdat;
+
+	pgdat_resize_lock(pgdat, &flags);
+	/*
+	 * Handling for ZONE_DEVICE does not account
+	 * for present pages.
+	 */
+	if (!is_dev_zone(zone))
+		pgdat->node_present_pages -= offlined_pages;
+

  	clear_zone_contiguous(zone);
  	for (pfn = start_pfn; pfn < end_pfn; pfn += nr_pages)
  		shrink_zone_span(zone, pfn, pfn + nr_pages);
  	set_zone_contiguous(zone);
+
+	pgdat_resize_unlock(pgdat, &flags);
  }

  static void shrink_pgdat(int nid, unsigned long sect_nr)
@@ -555,8 +572,8 @@ static int __remove_section(int nid, struct mem_section *ms,
  }

  /**
- * __remove_pages() - remove sections of pages from a nid
- * @nid: nid from which pages belong to
+ * remove_sections() - remove sections of pages from a nid
+ * @nid: node from which the pages are removed
   * @phys_start_pfn: starting pageframe (must be aligned to start of a section)
   * @nr_pages: number of pages to remove (must be multiple of section size)
   * @altmap: alternative device page map or %NULL if default memmap is used
@@ -1581,7 +1598,6 @@ static int __ref __offline_pages(unsigned long start_pfn,
  	unsigned long pfn, nr_pages;
  	long offlined_pages;
  	int ret, node;
-	unsigned long flags;
  	unsigned long valid_start, valid_end;
  	struct zone *zone;
  	struct memory_notify arg;
@@ -1663,14 +1679,12 @@ static int __ref __offline_pages(unsigned long start_pfn,
  	undo_isolate_page_range(start_pfn, end_pfn, MIGRATE_MOVABLE);
  	/* removal success */

-	/* Shrink zone's managed,spanned and zone/pgdat's present pages */
+	/* Shrink zone's managed and present pages */
  	adjust_managed_page_count(pfn_to_page(start_pfn), -offlined_pages);
  	zone->present_pages -= offlined_pages;

-	pgdat_resize_lock(zone->zone_pgdat, &flags);
-	zone->zone_pgdat->node_present_pages -= offlined_pages;
+	/* Shrink zone's spanned pages and node's present pages */
  	shrink_zone(zone, valid_start, valid_end, offlined_pages);
-	pgdat_resize_unlock(zone->zone_pgdat, &flags);

  	init_per_zone_wmark_min();

That said, there is an ongoing discussion about getting rid of the shrink
code altogether. If that happens, this will be a lot simpler.
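
To make the section check easier to reason about outside the kernel, here
is a rough userspace sketch of the decision the reworked is_section_ok()
makes, together with a simplified scan in the style of
find_smallest_section_pfn(). Everything prefixed with model_/MODEL_ is a
hypothetical stand-in and not a kernel type or function; the section size
of 32768 pages is an assumption (128MB sections with 4K pages), and the
nid and zone-membership checks of the real code are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel types. */
enum model_zone_type { MODEL_ZONE_NORMAL, MODEL_ZONE_DEVICE };

struct model_zone {
	enum model_zone_type type;
};

struct model_section {
	bool valid;	/* models valid_section(): a memmap exists */
	bool online;	/* models online_section(): section was onlined */
};

/* Assumed section size in pages (128MB sections, 4K pages). */
#define MODEL_PAGES_PER_SECTION 32768UL

/*
 * Same decision as the patched is_section_ok(): a NULL zone means the
 * caller is shrinking the node (pgdat) span, and ZONE_DEVICE sections
 * never get onlined, so both cases fall back to the "valid" test.
 */
static bool model_section_ok(const struct model_section *ms,
			     const struct model_zone *z)
{
	assert(ms != NULL);

	if (z && z->type != MODEL_ZONE_DEVICE)
		return ms->online;
	return ms->valid;
}

/*
 * Return the first section-aligned pfn in [start_pfn, end_pfn) whose
 * section passes the check, or 0 if there is none. Note that 0 is
 * ambiguous when the range starts at pfn 0.
 */
static unsigned long model_find_smallest_section_pfn(
		const struct model_section *sections, size_t nr_sections,
		const struct model_zone *z,
		unsigned long start_pfn, unsigned long end_pfn)
{
	for (; start_pfn < end_pfn; start_pfn += MODEL_PAGES_PER_SECTION) {
		size_t idx = start_pfn / MODEL_PAGES_PER_SECTION;

		if (idx >= nr_sections)
			break;
		if (model_section_ok(&sections[idx], z))
			return start_pfn;
	}
	return 0;
}
```

With a section that is valid but never onlined, the scan skips it for a
regular zone but accepts it for ZONE_DEVICE or for the node-wide (NULL
zone) case, which is why the devm code no longer needs to online/offline
sections itself.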

Thread overview: 26+ messages
2018-11-27 16:20 [PATCH v2 0/5] Do not touch pages in hot-remove path Oscar Salvador
2018-11-27 16:20 ` [PATCH v2 1/5] mm, memory_hotplug: Add nid parameter to arch_remove_memory Oscar Salvador
2018-11-27 16:20 ` [PATCH v2 2/5] kernel, resource: Check for IORESOURCE_SYSRAM in release_mem_region_adjustable Oscar Salvador
2018-11-27 16:20 ` [PATCH v2 3/5] mm, memory_hotplug: Move zone/pages handling to offline stage Oscar Salvador
2018-11-28  7:52   ` Mike Rapoport
2018-11-28 14:25     ` osalvador
2018-11-28 14:15   ` osalvador [this message]
2018-11-27 16:20 ` [PATCH v2 4/5] mm, memory-hotplug: Rework unregister_mem_sect_under_nodes Oscar Salvador
2019-03-24  6:48   ` Anshuman Khandual
2019-03-25  7:40     ` Oscar Salvador
2019-03-25  8:04       ` Michal Hocko
2019-03-25  8:14         ` Oscar Salvador
2018-11-27 16:20 ` [PATCH v2 5/5] mm, memory_hotplug: Refactor shrink_zone/pgdat_span Oscar Salvador
2018-11-28  6:50   ` Michal Hocko
2018-11-28  7:07     ` Oscar Salvador
2018-11-28 10:03       ` David Hildenbrand
2018-11-28 10:14       ` Michal Hocko
2018-11-28 11:00         ` osalvador
2018-11-28 12:31           ` Michal Hocko
2018-11-28 12:51             ` osalvador
2018-11-28 13:08               ` Michal Hocko
2018-11-28 13:18                 ` osalvador
2018-11-28 15:50                   ` Michal Hocko
2018-11-28 16:02                     ` osalvador
2018-11-29  9:29                     ` osalvador
2018-11-28 13:09               ` osalvador
