linux-mm.kvack.org archive mirror
* [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages
@ 2017-10-01 22:51 Alexandru Moise
  2017-10-02 12:54 ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandru Moise @ 2017-10-01 22:51 UTC (permalink / raw)
  To: corbet, paulmck, akpm, tglx, mingo, cdall, mchehab, zohar,
	marc.zyngier, mhocko, rientjes, hannes, mike.kravetz,
	n-horiguchi, aneesh.kumar, punit.agrawal, aarcange,
	gerald.schaefer, jglisse, kirill.shutemov, will.deacon,
	linux-doc, linux-kernel, linux-mm

This attempts to bring more flexibility to how hugepages are allocated
by making it possible to decide whether we want the hugepages to be
allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore="
boot parameter for non-movable allocations.

A new boot parameter is introduced, "hugepages_movable=", this sets the
default value for the "hugepages_treat_as_movable" sysctl. This allows
us to determine the zone for hugepages allocated at boot time. It only
affects 2M hugepages allocated at boot time for now because 1G
hugepages are allocated much earlier in the boot process and ignore
this sysctl completely.

The "hugepages_treat_as_movable" sysctl is also turned into a mandatory
setting that all hugepage allocations at runtime must respect (both
2M and 1G sized hugepages). The default value is changed to "1" to
preserve the existing behavior that if hugepage migration is supported,
then the pages will be allocated from ZONE_MOVABLE.

Note however if not enough contiguous memory is present in ZONE_MOVABLE
then the allocation will fallback to the non-movable zone and those
pages will not be migratable.

The implementation is a bit dirty so obviously I'm open to suggestions
for a better way to implement this behavior, or comments whether the whole
idea is fundamentally __wrong__.

Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com>
---
 Documentation/admin-guide/kernel-parameters.txt |  8 ++++++++
 Documentation/sysctl/vm.txt                     |  3 +++
 mm/hugetlb.c                                    | 15 +++++++++++++--
 mm/migrate.c                                    |  8 +++++++-
 4 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 05496622b4ef..25116d32d59e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1318,6 +1318,14 @@
 			x86-64 are 2M (when the CPU supports "pse") and 1G
 			(when the CPU supports the "pdpe1gb" cpuinfo flag).
 
+	hugepages_movable=
+			[HW,IA-64,PPC,X86-64] Default value for the
+			hugepages_treat_as_movable sysctl (default is 1).
+			When 1 this will attempt to allocate hugepages from
+			ZONE_MOVABLE, if 0 it will attempt to allocate hugepages
+			from the non-movable zone created with the "kernelcore="
+			kernel parameter.
+
 	hvc_iucv=	[S390] Number of z/VM IUCV hypervisor console (HVC)
 			       terminal devices. Valid values: 0..8
 	hvc_iucv_allow=	[S390] Comma-separated list of z/VM user IDs.
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9baf66a9ef4e..4c5755a1cf9f 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -267,6 +267,9 @@ or not. If set to non-zero, hugepages can be allocated from ZONE_MOVABLE.
 ZONE_MOVABLE is created when kernel boot parameter kernelcore= is specified,
 so this parameter has no effect if used without kernelcore=.
 
+The default value for this sysctl can also be set via the hugepages_movable=
+kernel boot parameter (to 0 or 1), default is 1.
+
 Hugepage migration is now available in some situations which depend on the
 architecture and/or the hugepage size. If a hugepage supports migration,
 allocation from ZONE_MOVABLE is always enabled for the hugepage regardless
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 424b0ef08a60..5d4efdadbd56 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -36,7 +36,7 @@
 #include <linux/userfaultfd_k.h>
 #include "internal.h"
 
-int hugepages_treat_as_movable;
+int hugepages_treat_as_movable = 1;
 
 int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
@@ -926,7 +926,7 @@ static struct page *dequeue_huge_page_nodemask(struct hstate *h, gfp_t gfp_mask,
 /* Movability of hugepages depends on migration support. */
 static inline gfp_t htlb_alloc_mask(struct hstate *h)
 {
-	if (hugepages_treat_as_movable || hugepage_migration_supported(h))
+	if (hugepages_treat_as_movable && hugepage_migration_supported(h))
 		return GFP_HIGHUSER_MOVABLE;
 	else
 		return GFP_HIGHUSER;
@@ -2805,6 +2805,17 @@ static int __init hugetlb_init(void)
 }
 subsys_initcall(hugetlb_init);
 
+static int __init hugepages_movable(char *str)
+{
+	if (!strncmp(str, "0", 1))
+		hugepages_treat_as_movable = 0;
+	else if (!strncmp(str, "1", 1))
+		hugepages_treat_as_movable = 1;
+
+	return 1;
+}
+__setup("hugepages_movable=", hugepages_movable);
+
 /* Should be called on processing a hugepagesz=... option */
 void __init hugetlb_bad_size(void)
 {
diff --git a/mm/migrate.c b/mm/migrate.c
index 6954c1435833..23946d88e533 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1266,6 +1266,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	int page_was_mapped = 0;
 	struct page *new_hpage;
 	struct anon_vma *anon_vma = NULL;
+	bool zone_movable_present;
 
 	/*
 	 * Movability of hugepages depends on architectures and hugepage size.
@@ -1274,7 +1275,12 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 	 * tables or check whether the hugepage is pmd-based or not before
 	 * kicking migration.
 	 */
-	if (!hugepage_migration_supported(page_hstate(hpage))) {
+	zone_movable_present = (NODE_DATA(page_to_nid(hpage))->node_zones[ZONE_MOVABLE].spanned_pages > 0);
+
+	if (!hugepage_migration_supported(page_hstate(hpage)) ||
+		(zone_movable_present ?
+		!(zone_idx(page_zone(hpage)) == ZONE_MOVABLE) :
+			false)) {
 		putback_active_hugepage(hpage);
 		return -ENOSYS;
 	}
-- 
2.14.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

* Re: [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages
  2017-10-01 22:51 [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages Alexandru Moise
@ 2017-10-02 12:54 ` Michal Hocko
  2017-10-02 14:06   ` Alexandru Moise
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-10-02 12:54 UTC (permalink / raw)
  To: Alexandru Moise
  Cc: corbet, paulmck, akpm, tglx, mingo, cdall, mchehab, zohar,
	marc.zyngier, rientjes, hannes, mike.kravetz, n-horiguchi,
	aneesh.kumar, punit.agrawal, aarcange, gerald.schaefer, jglisse,
	kirill.shutemov, will.deacon, linux-doc, linux-kernel, linux-mm

On Mon 02-10-17 00:51:11, Alexandru Moise wrote:
> This attempts to bring more flexibility to how hugepages are allocated
> by making it possible to decide whether we want the hugepages to be
> allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore="
> boot parameter for non-movable allocations.
> 
> A new boot parameter is introduced, "hugepages_movable=", this sets the
> default value for the "hugepages_treat_as_movable" sysctl. This allows
> us to determine the zone for hugepages allocated at boot time. It only
> affects 2M hugepages allocated at boot time for now because 1G
> hugepages are allocated much earlier in the boot process and ignore
> this sysctl completely.
> 
> The "hugepages_treat_as_movable" sysctl is also turned into a mandatory
> setting that all hugepage allocations at runtime must respect (both
> 2M and 1G sized hugepages). The default value is changed to "1" to
> preserve the existing behavior that if hugepage migration is supported,
> then the pages will be allocated from ZONE_MOVABLE.
> 
> Note however if not enough contiguous memory is present in ZONE_MOVABLE
> then the allocation will fallback to the non-movable zone and those
> pages will not be migratable.

This changelog doesn't explain _why_ we would need something like that.

> The implementation is a bit dirty so obviously I'm open to suggestions
> for a better way to implement this behavior, or comments whether the whole
> idea is fundamentally __wrong__.

To be honest I think this is just a wrong approach. hugepages_treat_as_movable
is quite questionable to be honest because it breaks the basic semantic
of the movable zone if the hugetlb pages are not really migratable which
should be the only criterion. Hugetlb pages are no different from other
migratable pages in that regards.
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages
  2017-10-02 12:54 ` Michal Hocko
@ 2017-10-02 14:06   ` Alexandru Moise
  2017-10-02 14:27     ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandru Moise @ 2017-10-02 14:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: corbet, paulmck, akpm, tglx, mingo, cdall, mchehab, zohar,
	marc.zyngier, rientjes, hannes, mike.kravetz, n-horiguchi,
	aneesh.kumar, punit.agrawal, aarcange, gerald.schaefer, jglisse,
	kirill.shutemov, will.deacon, linux-doc, linux-kernel, linux-mm

On Mon, Oct 02, 2017 at 02:54:32PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 00:51:11, Alexandru Moise wrote:
> > This attempts to bring more flexibility to how hugepages are allocated
> > by making it possible to decide whether we want the hugepages to be
> > allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore="
> > boot parameter for non-movable allocations.
> > 
> > A new boot parameter is introduced, "hugepages_movable=", this sets the
> > default value for the "hugepages_treat_as_movable" sysctl. This allows
> > us to determine the zone for hugepages allocated at boot time. It only
> > affects 2M hugepages allocated at boot time for now because 1G
> > hugepages are allocated much earlier in the boot process and ignore
> > this sysctl completely.
> > 
> > The "hugepages_treat_as_movable" sysctl is also turned into a mandatory
> > setting that all hugepage allocations at runtime must respect (both
> > 2M and 1G sized hugepages). The default value is changed to "1" to
> > preserve the existing behavior that if hugepage migration is supported,
> > then the pages will be allocated from ZONE_MOVABLE.
> > 
> > Note however if not enough contiguous memory is present in ZONE_MOVABLE
> > then the allocation will fallback to the non-movable zone and those
> > pages will not be migratable.
> 
> This changelog doesn't explain _why_ we would need something like that.
> 

So people shouldn't be able to choose whether their hugepages should be
migratable or not? Maybe they consider some of their applications more
important than others.

Say:
You have a large number of correctable errors on a subpage of a compound
page. So you copy the contents of the page to another hugepage, break the
original page and offline the subpage. But maybe you'd rather that some of
your hugepages not be broken and moved because you're not that worried about
memory corruption, but more about availability.

Without this patch even if hugepages are in the non-movable zone, they move.

> > The implementation is a bit dirty so obviously I'm open to suggestions
> > for a better way to implement this behavior, or comments whether the whole
> > idea is fundamentally __wrong__.
> 
> To be honest I think this is just a wrong approach. hugepages_treat_as_movable
> is quite questionable to be honest because it breaks the basic semantic
> of the movable zone if the hugetlb pages are not really migratable which
> should be the only criterion. Hugetlb pages are no different from other
> migratable pages in that regards.

Shouldn't hugepages allocated to unmovable zone, by definition, not be able
to be migrated? With this patch, hugepages in the movable zone do move, but
hugepages in the non-movable zone don't. Or am I misunderstanding the semantics
completely?

../Alex
> -- 
> Michal Hocko
> SUSE Labs

* Re: [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages
  2017-10-02 14:06   ` Alexandru Moise
@ 2017-10-02 14:27     ` Michal Hocko
  2017-10-02 15:06       ` Alexandru Moise
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-10-02 14:27 UTC (permalink / raw)
  To: Alexandru Moise
  Cc: corbet, paulmck, akpm, tglx, mingo, cdall, mchehab, zohar,
	marc.zyngier, rientjes, hannes, mike.kravetz, n-horiguchi,
	aneesh.kumar, punit.agrawal, aarcange, gerald.schaefer, jglisse,
	kirill.shutemov, will.deacon, linux-doc, linux-kernel, linux-mm

On Mon 02-10-17 16:06:33, Alexandru Moise wrote:
> On Mon, Oct 02, 2017 at 02:54:32PM +0200, Michal Hocko wrote:
> > On Mon 02-10-17 00:51:11, Alexandru Moise wrote:
> > > This attempts to bring more flexibility to how hugepages are allocated
> > > by making it possible to decide whether we want the hugepages to be
> > > allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore="
> > > boot parameter for non-movable allocations.
> > > 
> > > A new boot parameter is introduced, "hugepages_movable=", this sets the
> > > default value for the "hugepages_treat_as_movable" sysctl. This allows
> > > us to determine the zone for hugepages allocated at boot time. It only
> > > affects 2M hugepages allocated at boot time for now because 1G
> > > hugepages are allocated much earlier in the boot process and ignore
> > > this sysctl completely.
> > > 
> > > The "hugepages_treat_as_movable" sysctl is also turned into a mandatory
> > > setting that all hugepage allocations at runtime must respect (both
> > > 2M and 1G sized hugepages). The default value is changed to "1" to
> > > preserve the existing behavior that if hugepage migration is supported,
> > > then the pages will be allocated from ZONE_MOVABLE.
> > > 
> > > Note however if not enough contiguous memory is present in ZONE_MOVABLE
> > > then the allocation will fallback to the non-movable zone and those
> > > pages will not be migratable.
> > 
> > This changelog doesn't explain _why_ we would need something like that.
> > 
> 
> So people shouldn't be able to choose whether their hugepages should be
> migratable or not?

How are hugetlb pages any different from THP wrt. migrateability POV? Or
any other mapped memory to the userspace in general?

> Maybe they consider some of their applications more important than
> others.

I do not understand this part.

> Say:
> You have a large number of correctable errors on a subpage of a compound
> page. So you copy the contents of the page to another hugepage, break the
> original page and offline the subpage. 

I suspect you have HWPoisoning in mind right?

> But maybe you'd rather that some of
> your hugepages not be broken and moved because you're not that worried about
> memory corruption, but more about availability.

Could you be more specific please?

> Without this patch even if hugepages are in the non-movable zone, they move.

Which is OK. It is the very same with any other movable allocation.
 
> > > The implementation is a bit dirty so obviously I'm open to suggestions
> > > for a better way to implement this behavior, or comments whether the whole
> > > idea is fundamentally __wrong__.
> > 
> > To be honest I think this is just a wrong approach. hugepages_treat_as_movable
> > is quite questionable to be honest because it breaks the basic semantic
> > of the movable zone if the hugetlb pages are not really migratable which
> > should be the only criterion. Hugetlb pages are no different from other
> > migratable pages in that regards.
> 
> Shouldn't hugepages allocated to unmovable zone, by definition, not be able
> to be migrated? With this patch, hugepages in the movable zone do move, but
> hugepages in the non-movable zone don't. Or am I misunderstanding the semantics
> completely?

yes. movable zone is only about a guarantee to move memory around.
Movable allocations are still allowed to use kernel zones (aka
non-movable). The main reason for the movable zone these days is memory
hotplug which needs a semi-guarantee that the memory used can be
migrated elsewhere to free up the offlined memory.

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages
  2017-10-02 14:27     ` Michal Hocko
@ 2017-10-02 15:06       ` Alexandru Moise
  2017-10-02 16:15         ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandru Moise @ 2017-10-02 15:06 UTC (permalink / raw)
  To: Michal Hocko
  Cc: corbet, paulmck, akpm, tglx, mingo, cdall, mchehab, zohar,
	marc.zyngier, rientjes, hannes, mike.kravetz, n-horiguchi,
	aneesh.kumar, punit.agrawal, aarcange, gerald.schaefer, jglisse,
	kirill.shutemov, will.deacon, linux-doc, linux-kernel, linux-mm

On Mon, Oct 02, 2017 at 04:27:17PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 16:06:33, Alexandru Moise wrote:
> > On Mon, Oct 02, 2017 at 02:54:32PM +0200, Michal Hocko wrote:
> > > On Mon 02-10-17 00:51:11, Alexandru Moise wrote:
> > > > This attempts to bring more flexibility to how hugepages are allocated
> > > > by making it possible to decide whether we want the hugepages to be
> > > > allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore="
> > > > boot parameter for non-movable allocations.
> > > > 
> > > > A new boot parameter is introduced, "hugepages_movable=", this sets the
> > > > default value for the "hugepages_treat_as_movable" sysctl. This allows
> > > > us to determine the zone for hugepages allocated at boot time. It only
> > > > affects 2M hugepages allocated at boot time for now because 1G
> > > > hugepages are allocated much earlier in the boot process and ignore
> > > > this sysctl completely.
> > > > 
> > > > The "hugepages_treat_as_movable" sysctl is also turned into a mandatory
> > > > setting that all hugepage allocations at runtime must respect (both
> > > > 2M and 1G sized hugepages). The default value is changed to "1" to
> > > > preserve the existing behavior that if hugepage migration is supported,
> > > > then the pages will be allocated from ZONE_MOVABLE.
> > > > 
> > > > Note however if not enough contiguous memory is present in ZONE_MOVABLE
> > > > then the allocation will fallback to the non-movable zone and those
> > > > pages will not be migratable.
> > > 
> > > This changelog doesn't explain _why_ we would need something like that.
> > > 
> > 
> > So people shouldn't be able to choose whether their hugepages should be
> > migratable or not?
> 
> How are hugetlb pages any different from THP wrt. migrateability POV? Or
> any other mapped memory to the userspace in general?

THP shares more with regular userspace mapped memory than with hugetlbfs pages.
They have separate codepaths in migrate_pages(). And no one ever sets the movable
flag on a hugetlbfs mapping, so even though __PageMovable(hpage) on a hugetlbfs
page returns false, it will still move.

> 
> > Maybe they consider some of their applications more important than
> > others.
> 
> I do not understand this part.
> 
> > Say:
> > You have a large number of correctable errors on a subpage of a compound
> > page. So you copy the contents of the page to another hugepage, break the
> > original page and offline the subpage. 
> 
> I suspect you have HWPoisoning in mind right?

No, rather soft offlining. 

> 
> > But maybe you'd rather that some of
> > your hugepages not be broken and moved because you're not that worried about
> > memory corruption, but more about availability.
> 
> Could you be more specific please?

You can have a platform with reliable DIMM modules and a platform with less reliable
DIMM modules. So you would prefer to inhibit hugepage migration on the platform with
reliable DIMM modules that you know will behave ok even under a high number of 
correctable memory errors. tools like mcelog however are not hugepage aware and
cannot be told "if this PFN is part of a hugepage, don't try to soft offline it",
rather deciding which PFNs should be unmovable should be done in the kernel,
but it should still be controllable by the administrator.

For hugetlbfs pages in particular, this behavior is not present, without this patch.

> 
> > Without this patch even if hugepages are in the non-movable zone, they move.
> 
> Which is OK. It is the very same with any other movable allocation.

So you can have movable pages in the non-movable kernel zone?

>  
> > > > The implementation is a bit dirty so obviously I'm open to suggestions
> > > > for a better way to implement this behavior, or comments whether the whole
> > > > idea is fundamentally __wrong__.
> > > 
> > > To be honest I think this is just a wrong approach. hugepages_treat_as_movable
> > > is quite questionable to be honest because it breaks the basic semantic
> > > of the movable zone if the hugetlb pages are not really migratable which
> > > should be the only criterion. Hugetlb pages are no different from other
> > > migratable pages in that regards.
> > 
> > Shouldn't hugepages allocated to unmovable zone, by definition, not be able
> > to be migrated? With this patch, hugepages in the movable zone do move, but
> > hugepages in the non-movable zone don't. Or am I misunderstanding the semantics
> > completely?
> 
> yes. movable zone is only about a guarantee to move memory around.
> Movable allocations are still allowed to use kernel zones (aka
> non-movable). The main reason for the movable zone these days is memory
> hotplug which needs a semi-guarantee that the memory used can be
> migrated elsewhere to free up the offlined memory.

But isn't kernel-zone memory guaranteed not to migrate?

I agree that movable allocations are allowed to fallback to kernel zones.
i.e. this behavior is correct:
Page A is in ZONE_MOVABLE, page B is in kernel zone.
Page A gets soft-offlined, the contents are moved to page B.

This behavior is not correct:
Page C is in kernel zone, page D is also in kernel zone.
Page C gets soft offlined, contents of page C get moved to page D.

With hugepages, there is no check for where the migration goes because
the pages are pre-allocated and simply dequeued from the hstate freelist.

Thus hugepages will end up being unreserved and moved to a different
reserved hugepage, and the administrator has no control over this behavior,
even if they're kernel zone pages.

> 
> -- 
> Michal Hocko
> SUSE Labs

* Re: [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages
  2017-10-02 15:06       ` Alexandru Moise
@ 2017-10-02 16:15         ` Michal Hocko
  2017-10-03  5:42           ` Alexandru Moise
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-10-02 16:15 UTC (permalink / raw)
  To: Alexandru Moise
  Cc: corbet, paulmck, akpm, tglx, mingo, cdall, mchehab, zohar,
	marc.zyngier, rientjes, hannes, mike.kravetz, n-horiguchi,
	aneesh.kumar, punit.agrawal, aarcange, gerald.schaefer, jglisse,
	kirill.shutemov, will.deacon, linux-doc, linux-kernel, linux-mm

On Mon 02-10-17 17:06:38, Alexandru Moise wrote:
> On Mon, Oct 02, 2017 at 04:27:17PM +0200, Michal Hocko wrote:
> > On Mon 02-10-17 16:06:33, Alexandru Moise wrote:
> > > On Mon, Oct 02, 2017 at 02:54:32PM +0200, Michal Hocko wrote:
> > > > On Mon 02-10-17 00:51:11, Alexandru Moise wrote:
> > > > > This attempts to bring more flexibility to how hugepages are allocated
> > > > > by making it possible to decide whether we want the hugepages to be
> > > > > allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore="
> > > > > boot parameter for non-movable allocations.
> > > > > 
> > > > > A new boot parameter is introduced, "hugepages_movable=", this sets the
> > > > > default value for the "hugepages_treat_as_movable" sysctl. This allows
> > > > > us to determine the zone for hugepages allocated at boot time. It only
> > > > > affects 2M hugepages allocated at boot time for now because 1G
> > > > > hugepages are allocated much earlier in the boot process and ignore
> > > > > this sysctl completely.
> > > > > 
> > > > > The "hugepages_treat_as_movable" sysctl is also turned into a mandatory
> > > > > setting that all hugepage allocations at runtime must respect (both
> > > > > 2M and 1G sized hugepages). The default value is changed to "1" to
> > > > > preserve the existing behavior that if hugepage migration is supported,
> > > > > then the pages will be allocated from ZONE_MOVABLE.
> > > > > 
> > > > > Note however if not enough contiguous memory is present in ZONE_MOVABLE
> > > > > then the allocation will fallback to the non-movable zone and those
> > > > > pages will not be migratable.
> > > > 
> > > > This changelog doesn't explain _why_ we would need something like that.
> > > > 
> > > 
> > > So people shouldn't be able to choose whether their hugepages should be
> > > migratable or not?
> > 
> > How are hugetlb pages any different from THP wrt. migrateability POV? Or
> > any other mapped memory to the userspace in general?
> 
> THP shares more with regular userspace mapped memory than with hugetlbfs pages.
> They have separate codepaths in migrate_pages().

That is a mere implementation detail. You are right that THP shares more
with regular userspace memory because it is transparent from the
configuration POV but that has nothing to do with page migration AFAICS.

> And no one ever sets the movable
> flag on a hugetlbfs mapping, so even though __PageMovable(hpage) on a hugetlbfs
> page returns false, it will still move.

__PageMovable is a completely unrelated thing. It is for pages which are
!LRU but still movable.

> 
> > 
> > > Maybe they consider some of their applications more important than
> > > others.
> > 
> > I do not understand this part.
> > 
> > > Say:
> > > You have a large number of correctable errors on a subpage of a compound
> > > page. So you copy the contents of the page to another hugepage, break the
> > > original page and offline the subpage. 
> > 
> > I suspect you have HWPoisoning in mind right?
> 
> No, rather soft offlining. 

I thought this is the same thing.

> > > But maybe you'd rather that some of
> > > your hugepages not be broken and moved because you're not that worried about
> > > memory corruption, but more about availability.
> > 
> > Could you be more specific please?
> 
> You can have a platform with reliable DIMM modules and a platform with less reliable
> DIMM modules. So you would prefer to inhibit hugepage migration on the platform with
> reliable DIMM modules that you know will behave ok even under a high number of 
> correctable memory errors. tools like mcelog however are not hugepage aware and
> cannot be told "if this PFN is part of a hugepage, don't try to soft offline it",
> rather deciding which PFNs should be unmovable should be done in the kernel,
> but it should still be controllable by the administrator.

This sounds like a userspace policy that should be handled outside of
the kernel.

> For hugetlbfs pages in particular, this behavior is not present, without this patch.
> 
> > 
> > > Without this patch even if hugepages are in the non-movable zone, they move.
> > 
> > Which is OK. It is the very same with any other movable allocation.
> 
> So you can have movable pages in the non-movable kernel zone?

Yes. Most configurations do not even have a movable zone unless one is
explicitly configured.

> > > > > The implementation is a bit dirty so obviously I'm open to suggestions
> > > > > for a better way to implement this behavior, or comments whether the whole
> > > > > idea is fundamentally __wrong__.
> > > > 
> > > > To be honest I think this is just a wrong approach. hugepages_treat_as_movable
> > > > is quite questionable to be honest because it breaks the basic semantic
> > > > of the movable zone if the hugetlb pages are not really migratable which
> > > > should be the only criterion. Hugetlb pages are no different from other
> > > > migratable pages in that regards.
> > > 
> > > Shouldn't hugepages allocated to unmovable zone, by definition, not be able
> > > to be migrated? With this patch, hugepages in the movable zone do move, but
> > > hugepages in the non-movable zone don't. Or am I misunderstanding the semantics
> > > completely?
> > 
> > yes. movable zone is only about a guarantee to move memory around.
> > Movable allocations are still allowed to use kernel zones (aka
> > non-movable). The main reason for the movable zone these days is memory
> > hotplug which needs a semi-guarantee that the memory used can be
> > migrated elsewhere to free up the offlined memory.
> 
> But isn't kernel-zone memory guaranteed not to migrate?

No.

> I agree that movable allocations are allowed to fallback to kernel zones.
> i.e. this behavior is correct:
> Page A is in ZONE_MOVABLE, page B is in kernel zone.
> Page A gets soft-offlined, the contents are moved to page B.
> 
> This behavior is not correct:
> Page C is in kernel zone, page D is also in kernel zone.
> Page C gets soft offlined, contents of page C get moved to page D.

Why is this incorrect?

> With hugepages, there is no check for where the migration goes because
> the pages are pre-allocated and simply dequeued from the hstate freelist.

true

> Thus hugepages will end up being unreserved and moved to a different
> reserved hugepage, and the administrator has no control over this behavior,
> even if they're kernel zone pages.

I really fail to see why kernel vs. movable zones play any role here.
Zones should be mostly an implementation detail which userspace
shouldn't really care about.

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages
  2017-10-02 16:15         ` Michal Hocko
@ 2017-10-03  5:42           ` Alexandru Moise
  2017-10-03  7:10             ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Alexandru Moise @ 2017-10-03  5:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: corbet, paulmck, akpm, tglx, mingo, cdall, mchehab, zohar,
	marc.zyngier, rientjes, hannes, mike.kravetz, n-horiguchi,
	aneesh.kumar, punit.agrawal, aarcange, gerald.schaefer, jglisse,
	kirill.shutemov, will.deacon, linux-doc, linux-kernel, linux-mm

On Mon, Oct 02, 2017 at 06:15:00PM +0200, Michal Hocko wrote:
> On Mon 02-10-17 17:06:38, Alexandru Moise wrote:
> > On Mon, Oct 02, 2017 at 04:27:17PM +0200, Michal Hocko wrote:
> > > On Mon 02-10-17 16:06:33, Alexandru Moise wrote:
> > > > On Mon, Oct 02, 2017 at 02:54:32PM +0200, Michal Hocko wrote:
> > > > > On Mon 02-10-17 00:51:11, Alexandru Moise wrote:
> > > > > > This attempts to bring more flexibility to how hugepages are allocated
> > > > > > by making it possible to decide whether we want the hugepages to be
> > > > > > allocated from ZONE_MOVABLE or to the zone allocated by the "kernelcore="
> > > > > > boot parameter for non-movable allocations.
> > > > > > 
> > > > > > A new boot parameter is introduced, "hugepages_movable=", this sets the
> > > > > > default value for the "hugepages_treat_as_movable" sysctl. This allows
> > > > > > us to determine the zone for hugepages allocated at boot time. It only
> > > > > > affects 2M hugepages allocated at boot time for now because 1G
> > > > > > hugepages are allocated much earlier in the boot process and ignore
> > > > > > this sysctl completely.
> > > > > > 
> > > > > > The "hugepages_treat_as_movable" sysctl is also turned into a mandatory
> > > > > > setting that all hugepage allocations at runtime must respect (both
> > > > > > 2M and 1G sized hugepages). The default value is changed to "1" to
> > > > > > preserve the existing behavior that if hugepage migration is supported,
> > > > > > then the pages will be allocated from ZONE_MOVABLE.
> > > > > > 
> > > > > > Note however if not enough contiguous memory is present in ZONE_MOVABLE
> > > > > > then the allocation will fallback to the non-movable zone and those
> > > > > > pages will not be migratable.
> > > > > 
> > > > > This changelog doesn't explain _why_ we would need something like that.
> > > > > 
> > > > 
> > > > So people shouldn't be able to choose whether their hugepages should be
> > > > migratable or not?
> > > 
> > > How are hugetlb pages any different from THP wrt. migrateability POV? Or
> > > any other mapped memory to the userspace in general?
> > 
> > THP shares more with regular userspace mapped memory than with hugetlbfs pages.
> > They have separate codepaths in migrate_pages().
> 
> That is a mere implementation detail. You are right that THP shares more
> with regular userspace memory because it is transparent from the
> configuration POV but that has nothing to do with page migration AFAICS.
> 
> > And no one ever sets the movable
> > flag on a hugetlbfs mapping, so even though __PageMovable(hpage) on a hugetlbfs
> > page returns false, it will still move.
> 
> __PageMovable is a completely unrelated thing. It is for pages which are
> !LRU but still movable.
> 
> > 
> > > 
> > > > Maybe they consider some of their applications more important than
> > > > others.
> > > 
> > > I do not understand this part.
> > > 
> > > > Say:
> > > > You have a large number of correctable errors on a subpage of a compound
> > > > page. So you copy the contents of the page to another hugepage, break the
> > > > original page and offline the subpage. 
> > > 
> > > I suspect you have HWPoisoning in mind right?
> > 
> > No, rather soft offlining. 
> 
> I thought this is the same thing.
> 
> > > > But maybe you'd rather that some of
> > > > your hugepages not be broken and moved because you're not that worried about
> > > > memory corruption, but more about availability.
> > > 
> > > Could you be more specific please?
> > 
> > You can have a platform with reliable DIMM modules and a platform with less reliable
> > DIMM modules. So you would prefer to inhibit hugepage migration on the platform with
> > reliable DIMM modules that you know will behave ok even under a high number of 
> > correctable memory errors. tools like mcelog however are not hugepage aware and
> > cannot be told "if this PFN is part of a hugepage, don't try to soft offline it",
> > rather deciding which PFNs should be unmovable should be done in the kernel,
> > but it should still be controllable by the administrator.
> 
> This sounds like a userspace policy that should be handled outside of
> the kernel.
> 
> > For hugetlbfs pages in particular, this behavior is not present, without this patch.
> > 
> > > 
> > > > Without this patch even if hugepages are in the non-movable zone, they move.
> > > 
> > > which is ok. This is the very same as with any other movable allocations.
> > 
> > So you can have movable pages in the non-movable kernel zone?
> 
> yes. Most configurations do not even have a movable zone unless
> it is explicitly configured.
> 
> > > > > > The implementation is a bit dirty so obviously I'm open to suggestions
> > > > > > for a better way to implement this behavior, or comments whether the whole
> > > > > > idea is fundamentally __wrong__.
> > > > > 
> > > > > To be honest I think this is just a wrong approach. hugepages_treat_as_movable
> > > > > is quite questionable to be honest because it breaks the basic semantic
> > > > > of the movable zone if the hugetlb pages are not really migratable which
> > > > > should be the only criterion. Hugetlb pages are no different from other
> > > > > migratable pages in that regards.
> > > > 
> > > > Shouldn't hugepages allocated to unmovable zone, by definition, not be able
> > > > to be migrated? With this patch, hugepages in the movable zone do move, but
> > > > hugepages in the non-movable zone don't. Or am I misunderstanding the semantics
> > > > completely?
> > > 
> > > yes. movable zone is only about a guarantee to move memory around.
> > > Movable allocations are still allowed to use kernel zones (aka
> > > non-movable). The main reason for the movable zone these days is memory
> > > hotplug which needs a semi-guarantee that the memory used can be
> > > migrated elsewhere to free up the offlined memory.
> > 
> > But isn't kernel-zone memory guaranteed not to migrate?
> 
> No.
> 
> > I agree that movable allocations are allowed to fallback to kernel zones.
> > i.e. This behavior is correct:
> > Page A is in ZONE_MOVABLE, page B is in kernel zone.
> > Page A gets soft-offlined, the contents are moved to page B.
> > 
> > This behavior is not correct:
> > Page C is in kernel zone, page D is also in kernel zone.
> > Page C gets soft offlined, contents of page C get moved to page D.
> 
> Why is this incorrect?
> 
> > With hugepages, there is no check for whereto the migration goes because
> > the pages are pre-allocated and simply dequeued from the hstate freelist.
> 
> true
> 
> > Thus hugepages will end up being unreserved and moved to a different
> > reserved hugepage, and the administrator has no control over this behavior,
> > even if they're kernel zone pages.
> 
> I really fail to see why kernel vs. movable zones play any role here.
> Zones should be mostly an implementation detail which userspace
> shouldn't really care about.

Ok, the whole zone approach is a bad idea. Do you think there's any
value in trying to make hugepages un-movable at all? Should the
hugepages_treat_as_movable sysctl die, with hugepages simply being
movable by default?

../Alex

> 
> -- 
> Michal Hocko
> SUSE Labs


* Re: [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages
  2017-10-03  5:42           ` Alexandru Moise
@ 2017-10-03  7:10             ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-10-03  7:10 UTC (permalink / raw)
  To: Alexandru Moise
  Cc: corbet, paulmck, akpm, tglx, mingo, cdall, mchehab, zohar,
	marc.zyngier, rientjes, hannes, mike.kravetz, n-horiguchi,
	aneesh.kumar, punit.agrawal, aarcange, gerald.schaefer, jglisse,
	kirill.shutemov, will.deacon, linux-doc, linux-kernel, linux-mm

On Tue 03-10-17 07:42:25, Alexandru Moise wrote:
> On Mon, Oct 02, 2017 at 06:15:00PM +0200, Michal Hocko wrote:
[...]
> > I really fail to see why kernel vs. movable zones play any role here.
> > Zones should be mostly an implementation detail which userspace
> > shouldn't really care about.
> 
> Ok, the whole zone approach is a bad idea. Do you think there's any
> value in trying to make hugepages un-movable at all?

I am not aware of any use case, to be honest.

> Should the hugepages_treat_as_movable sysctl die, with hugepages
> simply being movable by default?

I think that hugepages_treat_as_movable is just a historical relic from
the time when hugetlb pages were not movable and the main purpose of
the movable zone was different. Just to clarify, the original intention
of the zone was to prevent memory fragmentation, and as hugetlb pages
do not fragment memory because they are long lived and contiguous, it
was acceptable to let them use the zone. The purpose of the zone has
since shifted towards a migratability guarantee, but the knob has
stayed behind. I think we should just remove it.
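
To make the redundancy concrete, here is a toy user-space model (not
kernel code). It assumes the allocation class for a hugepage is chosen
by OR-ing the knob with "migration is supported", which matches the
behavior described in the patch changelog. All names in it
(toy_alloc_class, toy_hugepage_alloc_class, toy_knob_is_redundant) are
hypothetical.

```c
#include <stdbool.h>

/* Toy model only: stands in for the choice of GFP zone modifiers. */
enum toy_alloc_class {
	TOY_ALLOC_KERNEL_ZONES_ONLY,	/* never placed in the movable zone */
	TOY_ALLOC_MAY_USE_MOVABLE	/* movable zone allowed, with fallback */
};

/*
 * Assumed policy: a hugepage may use the movable zone if the knob is
 * set or if hugepage migration is supported.
 */
static enum toy_alloc_class
toy_hugepage_alloc_class(bool treat_as_movable, bool migration_supported)
{
	if (treat_as_movable || migration_supported)
		return TOY_ALLOC_MAY_USE_MOVABLE;
	return TOY_ALLOC_KERNEL_ZONES_ONLY;
}

/* The knob is redundant when flipping it cannot change the outcome. */
static bool toy_knob_is_redundant(bool migration_supported)
{
	return toy_hugepage_alloc_class(false, migration_supported) ==
	       toy_hugepage_alloc_class(true, migration_supported);
}
```

Under these assumptions the knob changes nothing once migration is
supported. Without migration support it only lets non-migratable pages
into the movable zone, which is what breaks the zone's guarantee.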
-- 
Michal Hocko
SUSE Labs


Thread overview: 8+ messages
2017-10-01 22:51 [PATCH] mm,hugetlb,migration: don't migrate kernelcore hugepages Alexandru Moise
2017-10-02 12:54 ` Michal Hocko
2017-10-02 14:06   ` Alexandru Moise
2017-10-02 14:27     ` Michal Hocko
2017-10-02 15:06       ` Alexandru Moise
2017-10-02 16:15         ` Michal Hocko
2017-10-03  5:42           ` Alexandru Moise
2017-10-03  7:10             ` Michal Hocko
