linux-kernel.vger.kernel.org archive mirror
* [PATCH v2 0/1] Initialize struct pages for the full section
@ 2018-12-12 17:27 Mikhail Zaslonko
  2018-12-12 17:27 ` [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section Mikhail Zaslonko
  0 siblings, 1 reply; 14+ messages in thread
From: Mikhail Zaslonko @ 2018-12-12 17:27 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer, zaslonko

This patch refers to the older thread:
https://marc.info/?t=153658306400001&r=1&w=2

As suggested by Michal Hocko, instead of adjusting memory_hotplug paths,
I have changed memmap_init_zone() to initialize struct pages beyond the
zone end (if zone end is not aligned with the section boundary).
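
For reference, the alignment arithmetic behind this can be sketched in userspace (the 4 KiB page and 256 MiB sparse-section geometry of s390 is assumed for illustration; other architectures differ):

```c
#include <assert.h>

/* Assumed geometry for illustration: 4 KiB pages, 256 MiB sparse
 * sections (the s390 defaults). */
#define PAGE_SHIFT        12
#define SECTION_SIZE_BITS 28
#define PAGES_PER_SECTION (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))

/* How many struct pages of the last section lie beyond the zone end. */
static unsigned long section_pad_pages(unsigned long end_pfn)
{
	unsigned long rem = end_pfn % PAGES_PER_SECTION;

	return rem ? PAGES_PER_SECTION - rem : 0;
}
```

With mem=2050M the zone ends at pfn 524800, so 65024 struct pages of the last section are not covered by the regular zone initialization.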

Mikhail Zaslonko (1):
  mm, memory_hotplug: Initialize struct pages for the full memory
    section

 mm/page_alloc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-12 17:27 [PATCH v2 0/1] Initialize struct pages for the full section Mikhail Zaslonko
@ 2018-12-12 17:27 ` Mikhail Zaslonko
  2018-12-13  3:46   ` Wei Yang
                     ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Mikhail Zaslonko @ 2018-12-12 17:27 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer, zaslonko

If memory end is not aligned with the sparse memory section boundary, the
mapping of such a section is only partly initialized. This may lead to
VM_BUG_ON due to uninitialized struct page access from
is_mem_section_removable() or test_pages_in_a_zone() functions, triggered by
memory_hotplug sysfs handlers:

Here are the panic examples:
 CONFIG_DEBUG_VM=y
 CONFIG_DEBUG_VM_PGFLAGS=y

 kernel parameter mem=2050M
 --------------------------
 page:000003d082008000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
  [<00000000008f15c4>] show_valid_zones+0x5c/0x190
  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
  [<00000000003e4194>] seq_read+0x204/0x480
  [<00000000003b53ea>] __vfs_read+0x32/0x178
  [<00000000003b55b2>] vfs_read+0x82/0x138
  [<00000000003b5be2>] ksys_read+0x5a/0xb0
  [<0000000000b86ba0>] system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
  [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
 Kernel panic - not syncing: Fatal exception: panic_on_oops

 kernel parameter mem=3075M
 --------------------------
 page:000003d08300c000 is uninitialized and poisoned
 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 Call Trace:
 ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
  [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
  [<00000000003e4194>] seq_read+0x204/0x480
  [<00000000003b53ea>] __vfs_read+0x32/0x178
  [<00000000003b55b2>] vfs_read+0x82/0x138
  [<00000000003b5be2>] ksys_read+0x5a/0xb0
  [<0000000000b86ba0>] system_call+0xdc/0x2d8
 Last Breaking-Event-Address:
  [<000000000038596c>] is_mem_section_removable+0xb4/0x190
 Kernel panic - not syncing: Fatal exception: panic_on_oops

Fix the problem by initializing the last memory section of each zone
in memmap_init_zone() till the very end, even if it goes beyond the zone
end.
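The effect of the added tail loop can be simulated in userspace (a sketch only; a toy 8-page section stands in for PAGES_PER_SECTION, and a char array for the struct page array):

```c
#include <assert.h>
#include <string.h>

/* Toy geometry: 8-page sections instead of the real PAGES_PER_SECTION. */
#define TOY_PAGES_PER_SECTION 8UL

static void toy_memmap_init_zone(unsigned char *init, unsigned long start_pfn,
				 unsigned long end_pfn)
{
	unsigned long pfn;

	/* regular zone initialization, as memmap_init_zone() does today */
	for (pfn = start_pfn; pfn < end_pfn; pfn++)
		init[pfn] = 1;

	/* the patch's addition: run on to the next section boundary */
	while (end_pfn % TOY_PAGES_PER_SECTION)
		init[end_pfn++] = 1;
}
```

If the zone end is already section-aligned, the while loop does not execute and nothing beyond the zone is touched.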

Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: <stable@vger.kernel.org>
---
 mm/page_alloc.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ec9cc407216..e2afdb2dc2c5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
 			cond_resched();
 		}
 	}
+#ifdef CONFIG_SPARSEMEM
+	/*
+	 * If the zone does not span the rest of the section then
+	 * we should at least initialize those pages. Otherwise we
+	 * could blow up on a poisoned page in some paths which depend
+	 * on full sections being initialized (e.g. memory hotplug).
+	 */
+	while (end_pfn % PAGES_PER_SECTION) {
+		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
+		end_pfn++;
+	}
+#endif
 }
 
 #ifdef CONFIG_ZONE_DEVICE
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-12 17:27 ` [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section Mikhail Zaslonko
@ 2018-12-13  3:46   ` Wei Yang
       [not found]     ` <e4cebbae-3fcb-f03c-3d0e-a1a44ff0675a@linux.bm.com>
  2018-12-13 12:59   ` Michal Hocko
  2018-12-14 15:22   ` David Hildenbrand
  2 siblings, 1 reply; 14+ messages in thread
From: Wei Yang @ 2018-12-13  3:46 UTC (permalink / raw)
  To: Mikhail Zaslonko
  Cc: akpm, linux-kernel, linux-mm, mhocko, Pavel.Tatashin,
	schwidefsky, heiko.carstens, gerald.schaefer

On Wed, Dec 12, 2018 at 06:27:12PM +0100, Mikhail Zaslonko wrote:
>If memory end is not aligned with the sparse memory section boundary, the
>mapping of such a section is only partly initialized. This may lead to
>VM_BUG_ON due to uninitialized struct page access from
>is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>memory_hotplug sysfs handlers:
>
>Here are the the panic examples:
> CONFIG_DEBUG_VM=y
> CONFIG_DEBUG_VM_PGFLAGS=y
>
> kernel parameter mem=2050M
> --------------------------
> page:000003d082008000 is uninitialized and poisoned
> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> Call Trace:
> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>  [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>  [<00000000003e4194>] seq_read+0x204/0x480
>  [<00000000003b53ea>] __vfs_read+0x32/0x178
>  [<00000000003b55b2>] vfs_read+0x82/0x138
>  [<00000000003b5be2>] ksys_read+0x5a/0xb0
>  [<0000000000b86ba0>] system_call+0xdc/0x2d8
> Last Breaking-Event-Address:
>  [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
> Kernel panic - not syncing: Fatal exception: panic_on_oops
>
> kernel parameter mem=3075M
> --------------------------
> page:000003d08300c000 is uninitialized and poisoned
> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> Call Trace:
> ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
>  [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
>  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>  [<00000000003e4194>] seq_read+0x204/0x480
>  [<00000000003b53ea>] __vfs_read+0x32/0x178
>  [<00000000003b55b2>] vfs_read+0x82/0x138
>  [<00000000003b5be2>] ksys_read+0x5a/0xb0
>  [<0000000000b86ba0>] system_call+0xdc/0x2d8
> Last Breaking-Event-Address:
>  [<000000000038596c>] is_mem_section_removable+0xb4/0x190
> Kernel panic - not syncing: Fatal exception: panic_on_oops
>
>Fix the problem by initializing the last memory section of each zone
>in memmap_init_zone() till the very end, even if it goes beyond the zone
>end.
>
>Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
>Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>Cc: <stable@vger.kernel.org>
>---
> mm/page_alloc.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 2ec9cc407216..e2afdb2dc2c5 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> 			cond_resched();
> 		}
> 	}
>+#ifdef CONFIG_SPARSEMEM
>+	/*
>+	 * If the zone does not span the rest of the section then
>+	 * we should at least initialize those pages. Otherwise we
>+	 * could blow up on a poisoned page in some paths which depend
>+	 * on full sections being initialized (e.g. memory hotplug).
>+	 */
>+	while (end_pfn % PAGES_PER_SECTION) {
>+		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
>+		end_pfn++;
>+	}
>+#endif
> }

What will happen if the end_pfn is PAGES_PER_SECTION aligned, but there
is an invalid pfn within the section? For example, when the page is
skipped in the for loop of memmap_init_zone()?
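For what it's worth, the case being asked about can be sketched in userspace (toy 8-page sections; one pfn treated as invalid to mimic a skip in the main loop):

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGES_PER_SECTION 8UL

/*
 * Mimics memmap_init_zone() with one invalid pfn inside the zone: the
 * main loop skips it, and the section-tail loop only runs past end_pfn,
 * so the skipped page's struct page stays uninitialized even when
 * end_pfn itself is section-aligned.
 */
static void toy_memmap_init_hole(unsigned char *init, unsigned long end_pfn,
				 unsigned long invalid_pfn)
{
	unsigned long pfn;

	for (pfn = 0; pfn < end_pfn; pfn++) {
		if (pfn == invalid_pfn)
			continue;		/* !pfn_valid(): skipped */
		init[pfn] = 1;
	}
	while (end_pfn % TOY_PAGES_PER_SECTION)
		init[end_pfn++] = 1;
}
```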

> 
> #ifdef CONFIG_ZONE_DEVICE
>-- 
>2.16.4

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-12 17:27 ` [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section Mikhail Zaslonko
  2018-12-13  3:46   ` Wei Yang
@ 2018-12-13 12:59   ` Michal Hocko
  2018-12-14 15:22   ` David Hildenbrand
  2 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2018-12-13 12:59 UTC (permalink / raw)
  To: Mikhail Zaslonko
  Cc: akpm, linux-kernel, linux-mm, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer

On Wed 12-12-18 18:27:12, Mikhail Zaslonko wrote:
> If memory end is not aligned with the sparse memory section boundary, the
> mapping of such a section is only partly initialized. This may lead to
> VM_BUG_ON due to uninitialized struct page access from
> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
> memory_hotplug sysfs handlers:
> 
> Here are the the panic examples:
>  CONFIG_DEBUG_VM=y
>  CONFIG_DEBUG_VM_PGFLAGS=y
> 
>  kernel parameter mem=2050M
>  --------------------------
>  page:000003d082008000 is uninitialized and poisoned
>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>  Call Trace:
>  ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>   [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>   [<00000000003e4194>] seq_read+0x204/0x480
>   [<00000000003b53ea>] __vfs_read+0x32/0x178
>   [<00000000003b55b2>] vfs_read+0x82/0x138
>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
>  Last Breaking-Event-Address:
>   [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>  Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
>  kernel parameter mem=3075M
>  --------------------------
>  page:000003d08300c000 is uninitialized and poisoned
>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>  Call Trace:
>  ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
>   [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>   [<00000000003e4194>] seq_read+0x204/0x480
>   [<00000000003b53ea>] __vfs_read+0x32/0x178
>   [<00000000003b55b2>] vfs_read+0x82/0x138
>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
>  Last Breaking-Event-Address:
>   [<000000000038596c>] is_mem_section_removable+0xb4/0x190
>  Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> Fix the problem by initializing the last memory section of each zone
> in memmap_init_zone() till the very end, even if it goes beyond the zone
> end.
> 
> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Cc: <stable@vger.kernel.org>

This has always been a problem AFAIU. It just went unnoticed because we
zeroed memmaps during allocation before f7f99100d8d9 ("mm: stop
zeroing memory during allocation in vmemmap"), and so the above test
would simply skip these ranges as belonging to zone 0 or provide
garbage.
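As an aside, the PagePoisoned() test that blows up here amounts to the page flags still holding the all-ones poison left in the unzeroed memmap (a userspace sketch of the idea, not the kernel's actual definitions):

```c
#include <assert.h>
#include <string.h>

/* Minimal stand-in for struct page; only the flags word matters here. */
struct toy_page {
	unsigned long flags;
};

#define TOY_PAGE_POISON_PATTERN (~0UL)	/* memmap bytes left as 0xff */

static int toy_page_poisoned(const struct toy_page *page)
{
	return page->flags == TOY_PAGE_POISON_PATTERN;
}
```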

So I guess we do care for post f7f99100d8d9 kernels mostly and therefore
Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap")

Other than that looks good to me.
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  mm/page_alloc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2ec9cc407216..e2afdb2dc2c5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  			cond_resched();
>  		}
>  	}
> +#ifdef CONFIG_SPARSEMEM
> +	/*
> +	 * If the zone does not span the rest of the section then
> +	 * we should at least initialize those pages. Otherwise we
> +	 * could blow up on a poisoned page in some paths which depend
> +	 * on full sections being initialized (e.g. memory hotplug).
> +	 */
> +	while (end_pfn % PAGES_PER_SECTION) {
> +		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
> +		end_pfn++;
> +	}
> +#endif
>  }
>  
>  #ifdef CONFIG_ZONE_DEVICE
> -- 
> 2.16.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
       [not found]     ` <e4cebbae-3fcb-f03c-3d0e-a1a44ff0675a@linux.bm.com>
@ 2018-12-13 15:12       ` Wei Yang
       [not found]         ` <674c53e2-e4b3-f21f-4613-b149acef7e53@linux.bm.com>
  0 siblings, 1 reply; 14+ messages in thread
From: Wei Yang @ 2018-12-13 15:12 UTC (permalink / raw)
  To: Zaslonko Mikhail
  Cc: Wei Yang, Mikhail Zaslonko, akpm, linux-kernel, linux-mm, mhocko,
	Pavel.Tatashin, schwidefsky, heiko.carstens, gerald.schaefer

On Thu, Dec 13, 2018 at 01:37:16PM +0100, Zaslonko Mikhail wrote:
>On 13.12.2018 04:46, Wei Yang wrote:
>> On Wed, Dec 12, 2018 at 06:27:12PM +0100, Mikhail Zaslonko wrote:
>>> If memory end is not aligned with the sparse memory section boundary, the
>>> mapping of such a section is only partly initialized. This may lead to
>>> VM_BUG_ON due to uninitialized struct page access from
>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>>> memory_hotplug sysfs handlers:
>>>
>>> Here are the the panic examples:
>>> CONFIG_DEBUG_VM=y
>>> CONFIG_DEBUG_VM_PGFLAGS=y
>>>
>>> kernel parameter mem=2050M
>>> --------------------------
>>> page:000003d082008000 is uninitialized and poisoned
>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>> Call Trace:
>>> ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>>>  [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>>>  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>  [<00000000003e4194>] seq_read+0x204/0x480
>>>  [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>  [<00000000003b55b2>] vfs_read+0x82/0x138
>>>  [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>  [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>> Last Breaking-Event-Address:
>>>  [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>
>>> kernel parameter mem=3075M
>>> --------------------------
>>> page:000003d08300c000 is uninitialized and poisoned
>>> page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>> Call Trace:
>>> ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
>>>  [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
>>>  [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>  [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>  [<00000000003e4194>] seq_read+0x204/0x480
>>>  [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>  [<00000000003b55b2>] vfs_read+0x82/0x138
>>>  [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>  [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>> Last Breaking-Event-Address:
>>>  [<000000000038596c>] is_mem_section_removable+0xb4/0x190
>>> Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>
>>> Fix the problem by initializing the last memory section of each zone
>>> in memmap_init_zone() till the very end, even if it goes beyond the zone
>>> end.
>>>
>>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
>>> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>> mm/page_alloc.c | 12 ++++++++++++
>>> 1 file changed, 12 insertions(+)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 2ec9cc407216..e2afdb2dc2c5 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>> 			cond_resched();
>>> 		}
>>> 	}
>>> +#ifdef CONFIG_SPARSEMEM
>>> +	/*
>>> +	 * If the zone does not span the rest of the section then
>>> +	 * we should at least initialize those pages. Otherwise we
>>> +	 * could blow up on a poisoned page in some paths which depend
>>> +	 * on full sections being initialized (e.g. memory hotplug).
>>> +	 */
>>> +	while (end_pfn % PAGES_PER_SECTION) {
>>> +		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
>>> +		end_pfn++;
>>> +	}
>>> +#endif
>>> }
>> 
>> What will happen if the end_pfn is PAGES_PER_SECTION aligned, but there
>> is invalid pfn? For example, the page is skipped in the for loop of
>> memmap_init_zone()?
>> 
>
>If the end_pfn is PAGES_PER_SECTION aligned, we do not do any extra 
>initialization.
>If the page is skipped in the loop, it will remain uninitialized for 
>the reason (e.g. a memory hole). The behavior has not been changed here.
>

I may not have described my question clearly.

If the page is skipped, then it is uninitialized. In that case, would this page
trigger the call trace when reading the removable sysfs file?

>>>
>>> #ifdef CONFIG_ZONE_DEVICE
>>> -- 
>>> 2.16.4
>> 

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
       [not found]         ` <674c53e2-e4b3-f21f-4613-b149acef7e53@linux.bm.com>
@ 2018-12-14 10:19           ` Michal Hocko
  2018-12-15  0:26             ` Wei Yang
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-12-14 10:19 UTC (permalink / raw)
  To: Zaslonko Mikhail
  Cc: Wei Yang, Mikhail Zaslonko, akpm, linux-kernel, linux-mm,
	Pavel.Tatashin, schwidefsky, heiko.carstens, gerald.schaefer

[Your From address seems to have a typo (linux.bm.com) - fixed]

On Fri 14-12-18 10:33:55, Zaslonko Mikhail wrote:
[...]
> Yes, it might still trigger PF_POISONED_CHECK if the first page 
> of the pageblock is left uninitialized (poisoned).
> But in order to cover these exceptional cases we would need to 
> adjust memory_hotplug sysfs handler functions with similar 
> checks (as in the for loop of memmap_init_zone()). And I guess 
> that is what we were trying to avoid (adding special cases to 
> memory_hotplug paths).

is_mem_section_removable should test pfn_valid_within at least.
But that would require some care because next_active_pageblock expects
aligned pages. Ble, this code is just horrible. I would just remove it
altogether. I strongly suspect that nobody is using it for anything
reasonable anyway. The only reliable way to check whether a block is
removable is to remove it. Everything else is just racy.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-12 17:27 ` [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section Mikhail Zaslonko
  2018-12-13  3:46   ` Wei Yang
  2018-12-13 12:59   ` Michal Hocko
@ 2018-12-14 15:22   ` David Hildenbrand
  2018-12-14 15:49     ` David Hildenbrand
  2 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2018-12-14 15:22 UTC (permalink / raw)
  To: Mikhail Zaslonko, akpm
  Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer

On 12.12.18 18:27, Mikhail Zaslonko wrote:
> If memory end is not aligned with the sparse memory section boundary, the
> mapping of such a section is only partly initialized. This may lead to
> VM_BUG_ON due to uninitialized struct page access from
> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
> memory_hotplug sysfs handlers:
> 
> Here are the the panic examples:
>  CONFIG_DEBUG_VM=y
>  CONFIG_DEBUG_VM_PGFLAGS=y
> 
>  kernel parameter mem=2050M
>  --------------------------
>  page:000003d082008000 is uninitialized and poisoned
>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>  Call Trace:
>  ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>   [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>   [<00000000003e4194>] seq_read+0x204/0x480
>   [<00000000003b53ea>] __vfs_read+0x32/0x178
>   [<00000000003b55b2>] vfs_read+0x82/0x138
>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
>  Last Breaking-Event-Address:
>   [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>  Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
>  kernel parameter mem=3075M
>  --------------------------
>  page:000003d08300c000 is uninitialized and poisoned
>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>  Call Trace:
>  ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
>   [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>   [<00000000003e4194>] seq_read+0x204/0x480
>   [<00000000003b53ea>] __vfs_read+0x32/0x178
>   [<00000000003b55b2>] vfs_read+0x82/0x138
>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
>  Last Breaking-Event-Address:
>   [<000000000038596c>] is_mem_section_removable+0xb4/0x190
>  Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> Fix the problem by initializing the last memory section of each zone
> in memmap_init_zone() till the very end, even if it goes beyond the zone
> end.
> 
> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Cc: <stable@vger.kernel.org>
> ---
>  mm/page_alloc.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2ec9cc407216..e2afdb2dc2c5 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>  			cond_resched();
>  		}
>  	}
> +#ifdef CONFIG_SPARSEMEM
> +	/*
> +	 * If the zone does not span the rest of the section then
> +	 * we should at least initialize those pages. Otherwise we
> +	 * could blow up on a poisoned page in some paths which depend
> +	 * on full sections being initialized (e.g. memory hotplug).
> +	 */
> +	while (end_pfn % PAGES_PER_SECTION) {
> +		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
> +		end_pfn++;

This page will not be marked as PG_reserved - although it is a physical
memory gap. Do we care?

> +	}
> +#endif
>  }
>  
>  #ifdef CONFIG_ZONE_DEVICE
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-14 15:22   ` David Hildenbrand
@ 2018-12-14 15:49     ` David Hildenbrand
  2018-12-14 19:23       ` Gerald Schaefer
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2018-12-14 15:49 UTC (permalink / raw)
  To: Mikhail Zaslonko, akpm
  Cc: linux-kernel, linux-mm, mhocko, Pavel.Tatashin, schwidefsky,
	heiko.carstens, gerald.schaefer

On 14.12.18 16:22, David Hildenbrand wrote:
> On 12.12.18 18:27, Mikhail Zaslonko wrote:
>> If memory end is not aligned with the sparse memory section boundary, the
>> mapping of such a section is only partly initialized. This may lead to
>> VM_BUG_ON due to uninitialized struct page access from
>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>> memory_hotplug sysfs handlers:
>>
>> Here are the the panic examples:
>>  CONFIG_DEBUG_VM=y
>>  CONFIG_DEBUG_VM_PGFLAGS=y
>>
>>  kernel parameter mem=2050M
>>  --------------------------
>>  page:000003d082008000 is uninitialized and poisoned
>>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>  Call Trace:
>>  ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>>   [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>   [<00000000003e4194>] seq_read+0x204/0x480
>>   [<00000000003b53ea>] __vfs_read+0x32/0x178
>>   [<00000000003b55b2>] vfs_read+0x82/0x138
>>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>  Last Breaking-Event-Address:
>>   [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>>  Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>>  kernel parameter mem=3075M
>>  --------------------------
>>  page:000003d08300c000 is uninitialized and poisoned
>>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>  Call Trace:
>>  ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
>>   [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
>>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>   [<00000000003e4194>] seq_read+0x204/0x480
>>   [<00000000003b53ea>] __vfs_read+0x32/0x178
>>   [<00000000003b55b2>] vfs_read+0x82/0x138
>>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>  Last Breaking-Event-Address:
>>   [<000000000038596c>] is_mem_section_removable+0xb4/0x190
>>  Kernel panic - not syncing: Fatal exception: panic_on_oops
>>
>> Fix the problem by initializing the last memory section of each zone
>> in memmap_init_zone() till the very end, even if it goes beyond the zone
>> end.
>>
>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
>> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>>  mm/page_alloc.c | 12 ++++++++++++
>>  1 file changed, 12 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 2ec9cc407216..e2afdb2dc2c5 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>  			cond_resched();
>>  		}
>>  	}
>> +#ifdef CONFIG_SPARSEMEM
>> +	/*
>> +	 * If the zone does not span the rest of the section then
>> +	 * we should at least initialize those pages. Otherwise we
>> +	 * could blow up on a poisoned page in some paths which depend
>> +	 * on full sections being initialized (e.g. memory hotplug).
>> +	 */
>> +	while (end_pfn % PAGES_PER_SECTION) {
>> +		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
>> +		end_pfn++;
> 
> This page will not be marked as PG_reserved - although it is a physical
> memory gap. Do we care?
> 

Hm, or do we even have any idea what this is (e.g. could it also be
something not a gap)?

For physical memory gaps within a section, architectures usually exclude
that memory from getting passed to e.g. the page allocator by
memblock_reserve().

Before handing all free pages to the page allocator, all such reserved
memblocks will be marked reserved.

But this here seems to be different. We don't have a previous
memblock_reserve(), because otherwise these pages would have properly
been initialized already when marking them reserved.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-14 15:49     ` David Hildenbrand
@ 2018-12-14 19:23       ` Gerald Schaefer
  2018-12-17  9:38         ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Gerald Schaefer @ 2018-12-14 19:23 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Mikhail Zaslonko, akpm, linux-kernel, linux-mm, mhocko,
	Pavel.Tatashin, schwidefsky, heiko.carstens

On Fri, 14 Dec 2018 16:49:14 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 14.12.18 16:22, David Hildenbrand wrote:
> > On 12.12.18 18:27, Mikhail Zaslonko wrote:  
> >> If memory end is not aligned with the sparse memory section boundary, the
> >> mapping of such a section is only partly initialized. This may lead to
> >> VM_BUG_ON due to uninitialized struct page access from
> >> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
> >> memory_hotplug sysfs handlers:
> >>
> >> Here are the the panic examples:
> >>  CONFIG_DEBUG_VM=y
> >>  CONFIG_DEBUG_VM_PGFLAGS=y
> >>
> >>  kernel parameter mem=2050M
> >>  --------------------------
> >>  page:000003d082008000 is uninitialized and poisoned
> >>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> >>  Call Trace:
> >>  ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
> >>   [<00000000008f15c4>] show_valid_zones+0x5c/0x190
> >>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
> >>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
> >>   [<00000000003e4194>] seq_read+0x204/0x480
> >>   [<00000000003b53ea>] __vfs_read+0x32/0x178
> >>   [<00000000003b55b2>] vfs_read+0x82/0x138
> >>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
> >>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
> >>  Last Breaking-Event-Address:
> >>   [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
> >>  Kernel panic - not syncing: Fatal exception: panic_on_oops
> >>
> >>  kernel parameter mem=3075M
> >>  --------------------------
> >>  page:000003d08300c000 is uninitialized and poisoned
> >>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> >>  Call Trace:
> >>  ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
> >>   [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
> >>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
> >>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
> >>   [<00000000003e4194>] seq_read+0x204/0x480
> >>   [<00000000003b53ea>] __vfs_read+0x32/0x178
> >>   [<00000000003b55b2>] vfs_read+0x82/0x138
> >>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
> >>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
> >>  Last Breaking-Event-Address:
> >>   [<000000000038596c>] is_mem_section_removable+0xb4/0x190
> >>  Kernel panic - not syncing: Fatal exception: panic_on_oops
> >>
> >> Fix the problem by initializing the last memory section of each zone
> >> in memmap_init_zone() till the very end, even if it goes beyond the zone
> >> end.
> >>
> >> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
> >> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> >> Cc: <stable@vger.kernel.org>
> >> ---
> >>  mm/page_alloc.c | 12 ++++++++++++
> >>  1 file changed, 12 insertions(+)
> >>
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 2ec9cc407216..e2afdb2dc2c5 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> >>  			cond_resched();
> >>  		}
> >>  	}
> >> +#ifdef CONFIG_SPARSEMEM
> >> +	/*
> >> +	 * If the zone does not span the rest of the section then
> >> +	 * we should at least initialize those pages. Otherwise we
> >> +	 * could blow up on a poisoned page in some paths which depend
> >> +	 * on full sections being initialized (e.g. memory hotplug).
> >> +	 */
> >> +	while (end_pfn % PAGES_PER_SECTION) {
> >> +		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
> >> +		end_pfn++;  
> > 
> > This page will not be marked as PG_reserved - although it is a physical
> > memory gap. Do we care?
> >   
> 
> Hm, or do we even have any idea what this is (e.g. could it also be
> something not a gap)?

In the "mem=" restriction scenario it would be a gap, and probably fall
into the PG_reserved categorization from your recent patch:
 * - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Trying
 *   to read/write these pages might end badly. Don't touch!

Not sure if it could be something else. In theory, if it is possible to have
a scenario where memory zones are not section-aligned, then this
end_pfn % PAGES_PER_SECTION part could be part of another zone. But then it
should not matter if the pages get pre-initialized here, with or w/o
PG_reserved, because they should later be properly initialized in their zone.

So marking them as PG_reserved sounds right, especially in the light of your
current PG_reserved clean-up.

> 
> For physical memory gaps within a section, architectures usually exclude
> that memory from getting passed to e.g. the page allocator by
> memblock_reserve().
> 
> Before handing all free pages to the page allocator, all such reserved
> memblocks will be marked reserved.
> 
> But this here seems to be different. We don't have a previous
> memblock_reserve(), because otherwise these pages would have properly
> been initialized already when marking them reserved.

Not sure how memblock_reserve() and struct page initialization are
related, but at least on s390 there is a memblock_reserve() on the range
in question in setup_arch() -> reserve_memory_end(). However, in this
"mem=" scenario, the range is also removed later with memblock_remove()
in setup_memory_end(), because it is beyond memory_end.


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-14 10:19           ` Michal Hocko
@ 2018-12-15  0:26             ` Wei Yang
  0 siblings, 0 replies; 14+ messages in thread
From: Wei Yang @ 2018-12-15  0:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Zaslonko Mikhail, Wei Yang, akpm, linux-kernel, linux-mm,
	Pavel.Tatashin, schwidefsky, heiko.carstens, gerald.schaefer

On Fri, Dec 14, 2018 at 11:19:59AM +0100, Michal Hocko wrote:
>[Your From address seems to have a typo (linux.bm.com) - fixed]
>
>On Fri 14-12-18 10:33:55, Zaslonko Mikhail wrote:
>[...]
>> Yes, it might still trigger PF_POISONED_CHECK if the first page 
>> of the pageblock is left uninitialized (poisoned).
>> But in order to cover these exceptional cases we would need to 
>> adjust memory_hotplug sysfs handler functions with similar 
>> checks (as in the for loop of memmap_init_zone()). And I guess 
>> that is what we were trying to avoid (adding special cases to 
>> memory_hotplug paths).
>
>is_mem_section_removable should test pfn_valid_within at least.
>But that would require some care because next_active_pageblock expects
>aligned pages. Ble, this code is just horrible. I would just remove it
>altogether. I strongly suspect that nobody is using it for anything
>reasonable anyway. The only reliable way to check whether a block is
>removable is to remove it. Everything else is just racy.
>

Sounds reasonable.

The result returned from the removable sysfs attribute is transient. If no
user relies on it, removing it is the better way.

>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-14 19:23       ` Gerald Schaefer
@ 2018-12-17  9:38         ` David Hildenbrand
  2018-12-17 12:28           ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2018-12-17  9:38 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Mikhail Zaslonko, akpm, linux-kernel, linux-mm, mhocko,
	Pavel.Tatashin, schwidefsky, heiko.carstens

On 14.12.18 20:23, Gerald Schaefer wrote:
> On Fri, 14 Dec 2018 16:49:14 +0100
> David Hildenbrand <david@redhat.com> wrote:
> 
>> On 14.12.18 16:22, David Hildenbrand wrote:
>>> On 12.12.18 18:27, Mikhail Zaslonko wrote:  
>>>> If memory end is not aligned with the sparse memory section boundary, the
>>>> mapping of such a section is only partly initialized. This may lead to
>>>> VM_BUG_ON due to uninitialized struct page access from
>>>> is_mem_section_removable() or test_pages_in_a_zone() function triggered by
>>>> memory_hotplug sysfs handlers:
>>>>
>>>> Here are the panic examples:
>>>>  CONFIG_DEBUG_VM=y
>>>>  CONFIG_DEBUG_VM_PGFLAGS=y
>>>>
>>>>  kernel parameter mem=2050M
>>>>  --------------------------
>>>>  page:000003d082008000 is uninitialized and poisoned
>>>>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>>>  Call Trace:
>>>>  ([<0000000000385b26>] test_pages_in_a_zone+0xde/0x160)
>>>>   [<00000000008f15c4>] show_valid_zones+0x5c/0x190
>>>>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>>   [<00000000003e4194>] seq_read+0x204/0x480
>>>>   [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>>   [<00000000003b55b2>] vfs_read+0x82/0x138
>>>>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>>>  Last Breaking-Event-Address:
>>>>   [<0000000000385b26>] test_pages_in_a_zone+0xde/0x160
>>>>  Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>>
>>>>  kernel parameter mem=3075M
>>>>  --------------------------
>>>>  page:000003d08300c000 is uninitialized and poisoned
>>>>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>>>>  Call Trace:
>>>>  ([<000000000038596c>] is_mem_section_removable+0xb4/0x190)
>>>>   [<00000000008f12fa>] show_mem_removable+0x9a/0xd8
>>>>   [<00000000008cf9c4>] dev_attr_show+0x34/0x70
>>>>   [<0000000000463ad0>] sysfs_kf_seq_show+0xc8/0x148
>>>>   [<00000000003e4194>] seq_read+0x204/0x480
>>>>   [<00000000003b53ea>] __vfs_read+0x32/0x178
>>>>   [<00000000003b55b2>] vfs_read+0x82/0x138
>>>>   [<00000000003b5be2>] ksys_read+0x5a/0xb0
>>>>   [<0000000000b86ba0>] system_call+0xdc/0x2d8
>>>>  Last Breaking-Event-Address:
>>>>   [<000000000038596c>] is_mem_section_removable+0xb4/0x190
>>>>  Kernel panic - not syncing: Fatal exception: panic_on_oops
>>>>
>>>> Fix the problem by initializing the last memory section of each zone
>>>> in memmap_init_zone() till the very end, even if it goes beyond the zone
>>>> end.
>>>>
>>>> Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
>>>> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
>>>> Cc: <stable@vger.kernel.org>
>>>> ---
>>>>  mm/page_alloc.c | 12 ++++++++++++
>>>>  1 file changed, 12 insertions(+)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 2ec9cc407216..e2afdb2dc2c5 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -5542,6 +5542,18 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
>>>>  			cond_resched();
>>>>  		}
>>>>  	}
>>>> +#ifdef CONFIG_SPARSEMEM
>>>> +	/*
>>>> +	 * If the zone does not span the rest of the section then
>>>> +	 * we should at least initialize those pages. Otherwise we
>>>> +	 * could blow up on a poisoned page in some paths which depend
>>>> +	 * on full sections being initialized (e.g. memory hotplug).
>>>> +	 */
>>>> +	while (end_pfn % PAGES_PER_SECTION) {
>>>> +		__init_single_page(pfn_to_page(end_pfn), end_pfn, zone, nid);
>>>> +		end_pfn++;  
>>>
>>> This page will not be marked as PG_reserved - although it is a physical
>>> memory gap. Do we care?
>>>   
>>
>> Hm, or do we even have any idea what this is (e.g. could it also be
>> something not a gap)?
> 
> In the "mem=" restriction scenario it would be a gap, and probably fall
> into the PG_reserved categorization from your recent patch:
>  * - Pages falling into physical memory gaps - not IORESOURCE_SYSRAM. Trying
>  *   to read/write these pages might end badly. Don't touch!
> 
> Not sure if it could be something else. In theory, if it is possible to have
> a scenario where memory zones are not section-aligned, then this
> end_pfn % PAGES_PER_SECTION part could be part of another zone. But then it
> should not matter if the pages get pre-initialized here, with or w/o
> PG_reserved, because they should later be properly initialized in their zone.
> 
> So marking them as PG_reserved sounds right, especially in the light of your
> current PG_reserved clean-up.
> 
>>
>> For physical memory gaps within a section, architectures usually exclude
>> that memory from getting passed to e.g. the page allocator by
>> memblock_reserve().
>>
>> Before handing all free pages to the page allocator, all such reserved
>> memblocks will be marked reserved.
>>
>> But this here seems to be different. We don't have a previous
>> memblock_reserve(), because otherwise these pages would have properly
>> been initialized already when marking them reserved.
> 
> Not sure how memblock_reserve() and struct page initialization are
> related, but at least on s390 there is a memblock_reserve() on the range
> in question in setup_arch() -> reserve_memory_end(). However, in this
> "mem=" scenario, the range is also removed later with memblock_remove()
> in setup_memory_end(), because it is beyond memory_end.
> 

I am wondering if we should fix this on the memblock level instead.
Something like, before handing memory over to the page allocator, add
memory as reserved up to the last section boundary. Or even when setting
the physical memory limit (mem= scenario).

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-17  9:38         ` David Hildenbrand
@ 2018-12-17 12:28           ` Michal Hocko
  2018-12-17 13:29             ` David Hildenbrand
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2018-12-17 12:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Gerald Schaefer, Mikhail Zaslonko, akpm, linux-kernel, linux-mm,
	Pavel.Tatashin, schwidefsky, heiko.carstens

On Mon 17-12-18 10:38:32, David Hildenbrand wrote:
[...]
> I am wondering if we should fix this on the memblock level instead.
> Something like, before handing memory over to the page allocator, add
> memory as reserved up to the last section boundary. Or even when setting
> the physical memory limit (mem= scenario).

Memory initialization is spread over several places and that makes it
really hard to grasp and maintain. I do not really see why we should
make memblock even more special. We do initialize the section worth of
memory here so it sounds like a proper place to quirk for incomplete
sections.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-17 12:28           ` Michal Hocko
@ 2018-12-17 13:29             ` David Hildenbrand
  2018-12-17 13:35               ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: David Hildenbrand @ 2018-12-17 13:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Gerald Schaefer, Mikhail Zaslonko, akpm, linux-kernel, linux-mm,
	Pavel.Tatashin, schwidefsky, heiko.carstens

On 17.12.18 13:28, Michal Hocko wrote:
> On Mon 17-12-18 10:38:32, David Hildenbrand wrote:
> [...]
>> I am wondering if we should fix this on the memblock level instead.
>> Something like, before handing memory over to the page allocator, add
>> memory as reserved up to the last section boundary. Or even when setting
>> the physical memory limit (mem= scenario).
> 
> Memory initialization is spread over several places and that makes it
> really hard to grasp and maintain. I do not really see why we should
> make memblock even more special. We do initialize the section worth of
> memory here so it sounds like a proper place to quirk for incomplete
> sections.
> 

True as well. The reason I am asking is that memblock usually takes
care of physical memory holes.

-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section
  2018-12-17 13:29             ` David Hildenbrand
@ 2018-12-17 13:35               ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2018-12-17 13:35 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Gerald Schaefer, Mikhail Zaslonko, akpm, linux-kernel, linux-mm,
	Pavel.Tatashin, schwidefsky, heiko.carstens

On Mon 17-12-18 14:29:04, David Hildenbrand wrote:
> On 17.12.18 13:28, Michal Hocko wrote:
> > On Mon 17-12-18 10:38:32, David Hildenbrand wrote:
> > [...]
> >> I am wondering if we should fix this on the memblock level instead.
> >> Something like, before handing memory over to the page allocator, add
> >> memory as reserved up to the last section boundary. Or even when setting
> >> the physical memory limit (mem= scenario).
> > 
> > Memory initialization is spread over several places and that makes it
> > really hard to grasp and maintain. I do not really see why we should
> > make memblock even more special. We do initialize the section worth of
> > memory here so it sounds like a proper place to quirk for incomplete
> > sections.
> > 
> 
> True as well. The reason I am asking is that memblock usually takes
> care of physical memory holes.

Yes and no. It only reflects existing memory ranges (so yes it skips
over holes) and then it provides an API that platform/arch code can
abuse to cut holes into existing ranges.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2018-12-17 13:35 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-12 17:27 [PATCH v2 0/1] Initialize struct pages for the full section Mikhail Zaslonko
2018-12-12 17:27 ` [PATCH v2 1/1] mm, memory_hotplug: Initialize struct pages for the full memory section Mikhail Zaslonko
2018-12-13  3:46   ` Wei Yang
     [not found]     ` <e4cebbae-3fcb-f03c-3d0e-a1a44ff0675a@linux.bm.com>
2018-12-13 15:12       ` Wei Yang
     [not found]         ` <674c53e2-e4b3-f21f-4613-b149acef7e53@linux.bm.com>
2018-12-14 10:19           ` Michal Hocko
2018-12-15  0:26             ` Wei Yang
2018-12-13 12:59   ` Michal Hocko
2018-12-14 15:22   ` David Hildenbrand
2018-12-14 15:49     ` David Hildenbrand
2018-12-14 19:23       ` Gerald Schaefer
2018-12-17  9:38         ` David Hildenbrand
2018-12-17 12:28           ` Michal Hocko
2018-12-17 13:29             ` David Hildenbrand
2018-12-17 13:35               ` Michal Hocko
