linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages
@ 2017-11-08 13:01 Vitaly Kuznetsov
  2017-11-08 14:25 ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Vitaly Kuznetsov @ 2017-11-08 13:01 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, Michal Hocko, Vlastimil Babka,
	Mel Gorman, YASUAKI ISHIMATSU, Hillf Danton, Johannes Weiner,
	K. Y. Srinivasan, Stephen Hemminger, Alex Ng

Hyper-V balloon driver needs to hotplug memory in smaller chunks and to
work around Linux's 128Mb alignment requirement, so it does a trick: partly
populated 128Mb blocks are added and then a custom online_page_callback
hook checks if the particular page is 'backed' during onlining; if it is
not backed, it is left in the Reserved state. When the host adds more pages
to the block we bring them online from the driver (see
hv_bring_pgs_online()/hv_page_online_one() in drivers/hv/hv_balloon.c).
Eventually the whole block becomes fully populated and we hotplug the next
128Mb. This has all been working for quite some time.
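[Editor's sketch, not part of the original patch: the partial-onlining scheme above can be modeled in a few lines of userspace C. The struct and helpers below are invented stand-ins for the real driver logic in hv_balloon.c, under the assumption of 4K pages and a 128Mb block.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model: a 128Mb hot-added block is 32768 4K pages. */
#define PAGES_PER_BLOCK 32768

struct page { bool reserved; bool online; };

/* Stand-in for the driver's "is this pfn backed by the host?" check. */
static bool page_backed(size_t pfn, size_t backed_pages)
{
	return pfn < backed_pages;
}

/*
 * Model of the custom online_page_callback: online backed pages,
 * leave unbacked ones Reserved (i.e. invisible to the allocator).
 */
static void online_block(struct page *blk, size_t backed_pages)
{
	for (size_t pfn = 0; pfn < PAGES_PER_BLOCK; pfn++) {
		if (page_backed(pfn, backed_pages)) {
			blk[pfn].reserved = false;
			blk[pfn].online = true;
		} else {
			blk[pfn].reserved = true;  /* left offline */
		}
	}
}
```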

What is not working is offlining of such partly populated blocks: the
check_pages_isolated_cb() callback will not pass with a single Reserved
page and we end up with -EBUSY. However, there's no reason to fail
offlining in this case: these pages are already offline, so we may just
skip them. Add the appropriate workaround to test_pages_isolated().
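[Editor's sketch, not part of the original patch: a userspace model of the isolation walk this patch modifies. The enum and function below are invented; the real loop lives in __test_page_isolated_in_pageblock().]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stub page states for the model. */
enum pstate { FREE_BUDDY, HWPOISON, RESERVED, IN_USE };

/*
 * Model of the isolation walk: returns true if every page in the
 * range is isolated (free in the buddy), optionally skipping
 * HWPoison pages (as today) and Reserved pages (as proposed here).
 */
static bool pages_isolated(const enum pstate *pages, size_t n,
			   bool skip_hwpoisoned, bool skip_reserved)
{
	size_t pfn = 0;

	while (pfn < n) {
		if (pages[pfn] == FREE_BUDDY)
			pfn++;			/* isolated free page */
		else if (skip_hwpoisoned && pages[pfn] == HWPOISON)
			pfn++;
		else if (skip_reserved && pages[pfn] == RESERVED)
			pfn++;			/* never-onlined page */
		else
			return false;		/* busy page => -EBUSY */
	}
	return true;
}
```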

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
RFC part:
- Other usages of Reserved pages making offlining blocks with them a no-go
  may exist.
- I'm not exactly sure that adding another parameter to
  test_pages_isolated() is a good idea, we may go with a single flag for
  both Reserved and HwPoisoned pages: we have just two call sites and they
  have opposite needs (true, true in one case and false, false in the
  other).
---
 include/linux/page-isolation.h |  2 +-
 mm/memory_hotplug.c            |  2 +-
 mm/page_alloc.c                |  8 +++++++-
 mm/page_isolation.c            | 11 ++++++++---
 4 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 05a04e603686..daba12a59574 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -61,7 +61,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
-			bool skip_hwpoisoned_pages);
+			bool skip_hwpoisoned_pages, bool skip_reserved_pages);
 
 struct page *alloc_migrate_target(struct page *page, unsigned long private,
 				int **resultp);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d4b5f29906b9..5b7d1482804f 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1467,7 +1467,7 @@ check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
 {
 	int ret;
 	long offlined = *(long *)data;
-	ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
+	ret = test_pages_isolated(start_pfn, start_pfn + nr_pages, true, true);
 	offlined = nr_pages;
 	if (!ret)
 		*(long *)data += offlined;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e4d3c5c57b..b475928c476c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7632,7 +7632,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	}
 
 	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, false)) {
+	if (test_pages_isolated(outer_start, end, false, false)) {
 		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
 			__func__, outer_start, end);
 		ret = -EBUSY;
@@ -7746,6 +7746,12 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn)
 			continue;
 		}
 
+		/* Some pages might never be online, skip them */
+		if (unlikely(PageReserved(page))) {
+			pfn++;
+			continue;
+		}
+
 		BUG_ON(page_count(page));
 		BUG_ON(!PageBuddy(page));
 		order = page_order(page);
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 44f213935bf6..fd9c18e00b92 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -233,7 +233,8 @@ int undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  */
 static unsigned long
 __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
-				  bool skip_hwpoisoned_pages)
+				  bool skip_hwpoisoned_pages,
+				  bool skip_reserved_pages)
 {
 	struct page *page;
 
@@ -253,6 +254,9 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 		else if (skip_hwpoisoned_pages && PageHWPoison(page))
 			/* A HWPoisoned page cannot be also PageBuddy */
 			pfn++;
+		else if (skip_reserved_pages && PageReserved(page))
+			/* Skipping Reserved pages */
+			pfn++;
 		else
 			break;
 	}
@@ -262,7 +266,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 
 /* Caller should ensure that requested range is in a single zone */
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
-			bool skip_hwpoisoned_pages)
+			bool skip_hwpoisoned_pages, bool skip_reserved_pages)
 {
 	unsigned long pfn, flags;
 	struct page *page;
@@ -285,7 +289,8 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	zone = page_zone(page);
 	spin_lock_irqsave(&zone->lock, flags);
 	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
-						skip_hwpoisoned_pages);
+						skip_hwpoisoned_pages,
+						skip_reserved_pages);
 	spin_unlock_irqrestore(&zone->lock, flags);
 
 	trace_test_pages_isolated(start_pfn, end_pfn, pfn);
-- 
2.13.6


* Re: [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages
  2017-11-08 13:01 [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages Vitaly Kuznetsov
@ 2017-11-08 14:25 ` Michal Hocko
  2017-11-08 15:39   ` Vitaly Kuznetsov
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-11-08 14:25 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Mel Gorman, YASUAKI ISHIMATSU, Hillf Danton, Johannes Weiner,
	K. Y. Srinivasan, Stephen Hemminger, Alex Ng

On Wed 08-11-17 14:01:55, Vitaly Kuznetsov wrote:
> Hyper-V balloon driver needs to hotplug memory in smaller chunks and to
> work around Linux's 128Mb alignment requirement, so it does a trick: partly
> populated 128Mb blocks are added and then a custom online_page_callback
> hook checks if the particular page is 'backed' during onlining; if it is
> not backed, it is left in the Reserved state. When the host adds more pages
> to the block we bring them online from the driver (see
> hv_bring_pgs_online()/hv_page_online_one() in drivers/hv/hv_balloon.c).
> Eventually the whole block becomes fully populated and we hotplug the next
> 128Mb. This has all been working for quite some time.

Why does Hyper-V need to work around the section size limit in the first
place? We are allocating the memmap for the whole section anyway, so it
won't save any memory. So the whole thing sounds rather dubious to me.

> What is not working is offlining of such partly populated blocks: the
> check_pages_isolated_cb() callback will not pass with a single Reserved
> page and we end up with -EBUSY. However, there's no reason to fail
> offlining in this case: these pages are already offline, so we may just
> skip them. Add the appropriate workaround to test_pages_isolated().

How do you recognize pages reserved by other users? You cannot simply
remove them; it would just blow up.

> [...]

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages
  2017-11-08 14:25 ` Michal Hocko
@ 2017-11-08 15:39   ` Vitaly Kuznetsov
  2017-11-08 15:57     ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Vitaly Kuznetsov @ 2017-11-08 15:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Mel Gorman, YASUAKI ISHIMATSU, Hillf Danton, Johannes Weiner,
	K. Y. Srinivasan, Stephen Hemminger, Alex Ng

Michal Hocko <mhocko@kernel.org> writes:

> On Wed 08-11-17 14:01:55, Vitaly Kuznetsov wrote:
>> Hyper-V balloon driver needs to hotplug memory in smaller chunks and to
>> work around Linux's 128Mb alignment requirement, so it does a trick: partly
>> populated 128Mb blocks are added and then a custom online_page_callback
>> hook checks if the particular page is 'backed' during onlining; if it is
>> not backed, it is left in the Reserved state. When the host adds more pages
>> to the block we bring them online from the driver (see
>> hv_bring_pgs_online()/hv_page_online_one() in drivers/hv/hv_balloon.c).
>> Eventually the whole block becomes fully populated and we hotplug the next
>> 128Mb. This has all been working for quite some time.
>
> Why does Hyper-V need to work around the section size limit in the first
> place? We are allocating the memmap for the whole section anyway, so it
> won't save any memory. So the whole thing sounds rather dubious to me.
>

Memory hotplug requirements in Windows are different: they have 2Mb
granularity, not 128Mb like we have in Linux on x86.

Imagine there's a request to add 32Mb of memory coming from the
Hyper-V host. What can we do? Either don't add anything at all and wait
till we're asked to add > 128Mb and then add a section, or use the
current approach.
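[Editor's sketch: the granularity mismatch described above can be made concrete with a little arithmetic. The constants are the ones discussed in this thread; the helper names are made up.]

```c
#include <assert.h>

#define HV_CHUNK_MB	2	/* Hyper-V hot-add granularity */
#define SECTION_MB	128	/* Linux x86 memory section size */

/*
 * Given a host request to add req_mb of memory, return how many whole
 * 128Mb sections must be hot-added to cover it. The remainder of the
 * last section stays unbacked (Reserved) until the host supplies more
 * pages.
 */
static int sections_needed(int req_mb)
{
	return (req_mb + SECTION_MB - 1) / SECTION_MB;
}

static int unbacked_mb(int req_mb)
{
	return sections_needed(req_mb) * SECTION_MB - req_mb;
}
```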

>> What is not working is offlining of such partly populated blocks: the
>> check_pages_isolated_cb() callback will not pass with a single Reserved
>> page and we end up with -EBUSY. However, there's no reason to fail
>> offlining in this case: these pages are already offline, so we may just
>> skip them. Add the appropriate workaround to test_pages_isolated().
>
> How do you recognize pages reserved by other users? You cannot simply
> remove them; it would just blow up.
>

I expected something like that, thus the RFC. Is there a way to detect
pages which were never onlined? E.g. it is Reserved and count == 0?
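[Editor's sketch of the heuristic suggested here. This is an assumption, not an existing kernel predicate: real Reserved pages can be held by many other users, which is exactly the objection raised above.]

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the struct page state relevant to the heuristic. */
struct page_state {
	bool reserved;	/* PageReserved() */
	int count;	/* page_count() */
};

/*
 * Hypothetical predicate: a page that was hot-added but never onlined
 * is Reserved and has a zero reference count, whereas a Reserved page
 * held by some other user (e.g. a driver) would normally be pinned
 * with count > 0.
 */
static bool page_never_onlined(const struct page_state *p)
{
	return p->reserved && p->count == 0;
}
```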

>> [...]

-- 
  Vitaly


* Re: [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages
  2017-11-08 15:39   ` Vitaly Kuznetsov
@ 2017-11-08 15:57     ` Michal Hocko
  2017-11-08 16:16       ` Vitaly Kuznetsov
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-11-08 15:57 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Mel Gorman, YASUAKI ISHIMATSU, Hillf Danton, Johannes Weiner,
	K. Y. Srinivasan, Stephen Hemminger, Alex Ng

On Wed 08-11-17 16:39:49, Vitaly Kuznetsov wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > On Wed 08-11-17 14:01:55, Vitaly Kuznetsov wrote:
> >> Hyper-V balloon driver needs to hotplug memory in smaller chunks and to
> >> work around Linux's 128Mb alignment requirement, so it does a trick: partly
> >> populated 128Mb blocks are added and then a custom online_page_callback
> >> hook checks if the particular page is 'backed' during onlining; if it is
> >> not backed, it is left in the Reserved state. When the host adds more pages
> >> to the block we bring them online from the driver (see
> >> hv_bring_pgs_online()/hv_page_online_one() in drivers/hv/hv_balloon.c).
> >> Eventually the whole block becomes fully populated and we hotplug the next
> >> 128Mb. This has all been working for quite some time.
> >
> > Why does Hyper-V need to work around the section size limit in the first
> > place? We are allocating the memmap for the whole section anyway, so it
> > won't save any memory. So the whole thing sounds rather dubious to me.
> >
> 
> Memory hotplug requirements in Windows are different: they have 2Mb
> granularity, not 128Mb like we have in Linux on x86.
> 
> Imagine there's a request to add 32Mb of memory coming from the
> Hyper-V host. What can we do? Either don't add anything at all and wait
> till we're asked to add > 128Mb and then add a section, or use the
> current approach.

Use a different approach than memory hotplug, e.g. memory ballooning.

> >> What is not working is offlining of such partly populated blocks: the
> >> check_pages_isolated_cb() callback will not pass with a single Reserved
> >> page and we end up with -EBUSY. However, there's no reason to fail
> >> offlining in this case: these pages are already offline, so we may just
> >> skip them. Add the appropriate workaround to test_pages_isolated().
> >
> > How do you recognize pages reserved by other users? You cannot simply
> > remove them; it would just blow up.
> >
> 
> I expected something like that, thus the RFC. Is there a way to detect
> pages which were never onlined? E.g. it is Reserved and count == 0?

That would be quite tricky. But I am not convinced that the whole thing
makes any sense at all. We are in fact always creating the full section
so onlining only a part of it sounds really dubious to me.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages
  2017-11-08 15:57     ` Michal Hocko
@ 2017-11-08 16:16       ` Vitaly Kuznetsov
  2017-11-09 13:16         ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Vitaly Kuznetsov @ 2017-11-08 16:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Mel Gorman, YASUAKI ISHIMATSU, Hillf Danton, Johannes Weiner,
	K. Y. Srinivasan, Stephen Hemminger, Alex Ng

Michal Hocko <mhocko@kernel.org> writes:

> On Wed 08-11-17 16:39:49, Vitaly Kuznetsov wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>> 
>> > On Wed 08-11-17 14:01:55, Vitaly Kuznetsov wrote:
>> >> Hyper-V balloon driver needs to hotplug memory in smaller chunks and to
>> >> work around Linux's 128Mb alignment requirement, so it does a trick: partly
>> >> populated 128Mb blocks are added and then a custom online_page_callback
>> >> hook checks if the particular page is 'backed' during onlining; if it is
>> >> not backed, it is left in the Reserved state. When the host adds more pages
>> >> to the block we bring them online from the driver (see
>> >> hv_bring_pgs_online()/hv_page_online_one() in drivers/hv/hv_balloon.c).
>> >> Eventually the whole block becomes fully populated and we hotplug the next
>> >> 128Mb. This has all been working for quite some time.
>> >
>> > Why does Hyper-V need to work around the section size limit in the first
>> > place? We are allocating the memmap for the whole section anyway, so it
>> > won't save any memory. So the whole thing sounds rather dubious to me.
>> >
>> 
>> Memory hotplug requirements in Windows are different: they have 2Mb
>> granularity, not 128Mb like we have in Linux on x86.
>> 
>> Imagine there's a request to add 32Mb of memory coming from the
>> Hyper-V host. What can we do? Either don't add anything at all and wait
>> till we're asked to add > 128Mb and then add a section, or use the
>> current approach.
>
> Use a different approach than memory hotplug, e.g. memory ballooning.
>

But how? When we boot we may have very little memory and later on we
hotplug a lot, so we may not even be able to balloon all possible memory
without running out of memory.

>> >> What is not working is offlining of such partly populated blocks: the
>> >> check_pages_isolated_cb() callback will not pass with a single Reserved
>> >> page and we end up with -EBUSY. However, there's no reason to fail
>> >> offlining in this case: these pages are already offline, so we may just
>> >> skip them. Add the appropriate workaround to test_pages_isolated().
>> >
>> > How do you recognize pages reserved by other users? You cannot simply
>> > remove them; it would just blow up.
>> >
>> 
>> I expected something like that, thus the RFC. Is there a way to detect
>> pages which were never onlined? E.g. it is Reserved and count == 0?
>
> That would be quite tricky. But I am not convinced that the whole thing
> makes any sense at all. We are in fact always creating the full section
> so onlining only a part of it sounds really dubious to me.

I understand the concern, but this is something which has worked for a
long time already, so unless we're able to suggest some better API for it
(e.g. a variable section size) we'll probably have to live with it.

The issue I'm trying to address with the patch is not of top priority:
there's no such thing as memory unplug in Hyper-V, so offlining sections
is not something users do every day.

-- 
  Vitaly


* Re: [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages
  2017-11-08 16:16       ` Vitaly Kuznetsov
@ 2017-11-09 13:16         ` Michal Hocko
  2017-11-09 13:30           ` Vitaly Kuznetsov
  0 siblings, 1 reply; 8+ messages in thread
From: Michal Hocko @ 2017-11-09 13:16 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Mel Gorman, YASUAKI ISHIMATSU, Hillf Danton, Johannes Weiner,
	K. Y. Srinivasan, Stephen Hemminger, Alex Ng

On Wed 08-11-17 17:16:19, Vitaly Kuznetsov wrote:
> Michal Hocko <mhocko@kernel.org> writes:
> 
> > On Wed 08-11-17 16:39:49, Vitaly Kuznetsov wrote:
> >> Michal Hocko <mhocko@kernel.org> writes:
> >> 
> >> > On Wed 08-11-17 14:01:55, Vitaly Kuznetsov wrote:
> >> >> Hyper-V balloon driver needs to hotplug memory in smaller chunks and to
> >> >> work around Linux's 128Mb alignment requirement, so it does a trick: partly
> >> >> populated 128Mb blocks are added and then a custom online_page_callback
> >> >> hook checks if the particular page is 'backed' during onlining; if it is
> >> >> not backed, it is left in the Reserved state. When the host adds more pages
> >> >> to the block we bring them online from the driver (see
> >> >> hv_bring_pgs_online()/hv_page_online_one() in drivers/hv/hv_balloon.c).
> >> >> Eventually the whole block becomes fully populated and we hotplug the next
> >> >> 128Mb. This has all been working for quite some time.
> >> >
> >> > Why does Hyper-V need to work around the section size limit in the first
> >> > place? We are allocating the memmap for the whole section anyway, so it
> >> > won't save any memory. So the whole thing sounds rather dubious to me.
> >> >
> >> 
> >> Memory hotplug requirements in Windows are different: they have 2Mb
> >> granularity, not 128Mb like we have in Linux on x86.
> >> 
> >> Imagine there's a request to add 32Mb of memory coming from the
> >> Hyper-V host. What can we do? Either don't add anything at all and wait
> >> till we're asked to add > 128Mb and then add a section, or use the
> >> current approach.
> >
> > Use a different approach than memory hotplug, e.g. memory ballooning.
> >
> 
> But how? When we boot we may have very little memory and later on we
> hotplug a lot, so we may not even be able to balloon all possible memory
> without running out of memory.

Just add more memory, make part of it unusable, and return it to the
host via standard ballooning means.

How realistic is it that the host gives only such a small amount of
memory, btw?
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages
  2017-11-09 13:16         ` Michal Hocko
@ 2017-11-09 13:30           ` Vitaly Kuznetsov
  2017-11-09 13:42             ` Michal Hocko
  0 siblings, 1 reply; 8+ messages in thread
From: Vitaly Kuznetsov @ 2017-11-09 13:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Mel Gorman, YASUAKI ISHIMATSU, Hillf Danton, Johannes Weiner,
	K. Y. Srinivasan, Stephen Hemminger, Alex Ng

Michal Hocko <mhocko@kernel.org> writes:

> On Wed 08-11-17 17:16:19, Vitaly Kuznetsov wrote:
>> Michal Hocko <mhocko@kernel.org> writes:
>> 
>> > On Wed 08-11-17 16:39:49, Vitaly Kuznetsov wrote:
>> >> Michal Hocko <mhocko@kernel.org> writes:
>> >> 
>> >> > On Wed 08-11-17 14:01:55, Vitaly Kuznetsov wrote:
>> >> >> Hyper-V balloon driver needs to hotplug memory in smaller chunks and to
>> >> >> work around Linux's 128Mb alignment requirement, so it does a trick: partly
>> >> >> populated 128Mb blocks are added and then a custom online_page_callback
>> >> >> hook checks if the particular page is 'backed' during onlining; if it is
>> >> >> not backed, it is left in the Reserved state. When the host adds more pages
>> >> >> to the block we bring them online from the driver (see
>> >> >> hv_bring_pgs_online()/hv_page_online_one() in drivers/hv/hv_balloon.c).
>> >> >> Eventually the whole block becomes fully populated and we hotplug the next
>> >> >> 128Mb. This has all been working for quite some time.
>> >> >
>> >> > Why does Hyper-V need to work around the section size limit in the first
>> >> > place? We are allocating the memmap for the whole section anyway, so it
>> >> > won't save any memory. So the whole thing sounds rather dubious to me.
>> >> >
>> >> 
>> >> Memory hotplug requirements in Windows are different: they have 2Mb
>> >> granularity, not 128Mb like we have in Linux on x86.
>> >> 
>> >> Imagine there's a request to add 32Mb of memory coming from the
>> >> Hyper-V host. What can we do? Either don't add anything at all and wait
>> >> till we're asked to add > 128Mb and then add a section, or use the
>> >> current approach.
>> >
>> > Use a different approach than memory hotplug, e.g. memory ballooning.
>> >
>> 
>> But how? When we boot we may have very little memory and later on we
>> hotplug a lot, so we may not even be able to balloon all possible memory
>> without running out of memory.
>
> Just add more memory, make part of it unusable, and return it to the
> host via standard ballooning means.

We don't have control over how much memory the host gives us, and we have
no way to return anything to the host.

>
> How realistic is it that the host gives only such a small amount of
> memory, btw?

It happens all the time: the Hyper-V host will gradually increase the
guest's memory when Dynamic Memory is enabled. Moreover, there's a manual
interface through which the host's admin can hotplug memory to guests
(starting with WS2016) with 2M granularity.

-- 
  Vitaly


* Re: [PATCH RFC] mm/memory_hotplug: make it possible to offline blocks with reserved pages
  2017-11-09 13:30           ` Vitaly Kuznetsov
@ 2017-11-09 13:42             ` Michal Hocko
  0 siblings, 0 replies; 8+ messages in thread
From: Michal Hocko @ 2017-11-09 13:42 UTC (permalink / raw)
  To: Vitaly Kuznetsov
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Mel Gorman, YASUAKI ISHIMATSU, Hillf Danton, Johannes Weiner,
	K. Y. Srinivasan, Stephen Hemminger, Alex Ng

On Thu 09-11-17 14:30:18, Vitaly Kuznetsov wrote:
> Michal Hocko <mhocko@kernel.org> writes:
[...]
> > How realistic is it that the host gives only such a small amount of
> > memory, btw?
> 
> It happens all the time: the Hyper-V host will gradually increase the
> guest's memory when Dynamic Memory is enabled. Moreover, there's a manual
> interface through which the host's admin can hotplug memory to guests
> (starting with WS2016) with 2M granularity.

Sigh, this sounds quite insane, but whatever. I am not sure we want to
make the generic hotplug code more complicated for this single use case.
So I suspect you might be better off implementing this feature on top of
hotplug. Just keep track of the partial sections and make the memory
which is not yet onlined reserved and unusable by the kernel. It sucks,
I know, but as long as there is no wider demand for sub-section memory
hotplug I would be rather reluctant to make the fragile code even more
complicated. Mem section granularity is hardcoded in way too many places,
I am afraid.
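[Editor's sketch of what "keep track of the partial sections" could look like on the driver side, modeled in userspace with a per-block bitmap. All names here are invented for illustration.]

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define PAGES_PER_BLOCK	32768	/* 128Mb block of 4K pages */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS	(PAGES_PER_BLOCK / BITS_PER_LONG)

/* One record per hot-added 128Mb block. */
struct partial_block {
	unsigned long backed[BITMAP_LONGS];	/* bit set => host-backed */
};

static void mark_backed(struct partial_block *b, unsigned long pfn)
{
	b->backed[pfn / BITS_PER_LONG] |= 1UL << (pfn % BITS_PER_LONG);
}

static bool is_backed(const struct partial_block *b, unsigned long pfn)
{
	return b->backed[pfn / BITS_PER_LONG] & (1UL << (pfn % BITS_PER_LONG));
}

/* The block could be handed to generic offlining once fully backed. */
static bool block_fully_backed(const struct partial_block *b)
{
	unsigned long i;

	for (i = 0; i < BITMAP_LONGS; i++)
		if (b->backed[i] != ~0UL)
			return false;
	return true;
}
```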
-- 
Michal Hocko
SUSE Labs

