* [PATCH v1 0/2] mm: Memory offlining + page isolation cleanups @ 2019-10-21 14:19 David Hildenbrand 2019-10-21 14:19 ` [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining David Hildenbrand 2019-10-21 14:19 ` [PATCH v1 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE David Hildenbrand 0 siblings, 2 replies; 13+ messages in thread From: David Hildenbrand @ 2019-10-21 14:19 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Alexander Duyck, Andrew Morton, Anshuman Khandual, Dan Williams, Mel Gorman, Michal Hocko, Mike Rapoport, Mike Rapoport, Oscar Salvador, Pavel Tatashin, Pavel Tatashin, Pingfan Liu, Qian Cai, Vlastimil Babka, Wei Yang Two cleanups that popped up while working on (and discussing) virtio-mem: https://lkml.org/lkml/2019/9/19/463 Tested with DIMMs on x86. David Hildenbrand (2): mm/page_alloc.c: Don't set pages PageReserved() when offlining mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE include/linux/page-isolation.h | 4 ++-- mm/memory_hotplug.c | 8 +++++--- mm/page_alloc.c | 9 +++------ mm/page_isolation.c | 12 ++++++------ 4 files changed, 16 insertions(+), 17 deletions(-) -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-21 14:19 [PATCH v1 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand @ 2019-10-21 14:19 ` David Hildenbrand 2019-10-21 14:43 ` Michal Hocko 2019-10-21 14:19 ` [PATCH v1 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE David Hildenbrand 1 sibling, 1 reply; 13+ messages in thread From: David Hildenbrand @ 2019-10-21 14:19 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Andrew Morton, Michal Hocko, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin We call __offline_isolated_pages() from __offline_pages() after all pages were isolated and are either free (PageBuddy()) or PageHWPoison. Nothing can stop us from offlining memory at this point. In __offline_isolated_pages() we first set all affected memory sections offline (offline_mem_sections(pfn, end_pfn)), to mark the memmap as invalid (pfn_to_online_page() will no longer succeed), and then walk over all pages to pull the free pages from the free lists (to the isolated free lists, to be precise). Note that re-onlining a memory block will result in the whole memmap getting reinitialized, overwriting any old state. We already poison the memmap when offlining is complete to find any access to stale/uninitialized memmaps. So, setting the pages PageReserved() is not helpful. The memmap is marked offline and all pageblocks are isolated. Once the memory is offline, the memmap is stale either way. This looks like a leftover from ancient times when we initialized the memmap when adding memory and not when onlining it (the pages were set PageReserved so re-onlining would work as expected). 
Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/page_alloc.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ed8884dc0c47..bf6b21f02154 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8667,7 +8667,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) { struct page *page; struct zone *zone; - unsigned int order, i; + unsigned int order; unsigned long pfn; unsigned long flags; unsigned long offlined_pages = 0; @@ -8695,7 +8695,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) */ if (unlikely(!PageBuddy(page) && PageHWPoison(page))) { pfn++; - SetPageReserved(page); offlined_pages++; continue; } @@ -8709,8 +8708,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) pfn, 1 << order, end_pfn); #endif del_page_from_free_area(page, &zone->free_area[order]); - for (i = 0; i < (1 << order); i++) - SetPageReserved((page+i)); pfn += (1 << order); } spin_unlock_irqrestore(&zone->lock, flags); -- 2.21.0 ^ permalink raw reply related [flat|nested] 13+ messages in thread
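[Annotation, not part of the original message] The walk that remains in __offline_isolated_pages() after this patch can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct toy_page`, `toy_offline_walk()` and `toy_offline_example()` are hypothetical names, the structure is not the kernel's `struct page`, and counting a free block as `1 << order` offlined pages is assumed from the surrounding kernel code rather than shown in the diff above. The point is that the loop only advances and counts; it no longer touches any per-page flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; not the kernel's layout. */
struct toy_page {
	bool buddy;         /* models PageBuddy(): head of a free block */
	bool hwpoison;      /* models PageHWPoison() */
	unsigned int order; /* meaningful only when buddy is set */
};

/*
 * Walk [start, end) in the style of the patched loop: a poisoned page
 * that is not in the buddy advances by one PFN, a free block advances
 * by 1 << order PFNs. Returns the number of pages accounted as
 * offlined. No page state is modified (the map is const).
 */
static unsigned long toy_offline_walk(const struct toy_page *map,
				      unsigned long start, unsigned long end)
{
	unsigned long pfn = start;
	unsigned long offlined = 0;

	while (pfn < end) {
		const struct toy_page *page = &map[pfn];

		if (!page->buddy && page->hwpoison) {
			pfn++;
			offlined++;
			continue;
		}
		/* Models the kernel's BUG_ON(!PageBuddy(page)). */
		assert(page->buddy);
		offlined += 1UL << page->order;
		pfn += 1UL << page->order;
	}
	return offlined;
}

/* Example range: order-2 block, two poisoned pages, order-1 block. */
static unsigned long toy_offline_example(void)
{
	const struct toy_page map[8] = {
		[0] = { .buddy = true, .order = 2 },
		[4] = { .hwpoison = true },
		[5] = { .hwpoison = true },
		[6] = { .buddy = true, .order = 1 },
	};

	return toy_offline_walk(map, 0, 8);
}
```

In this model the eight-page range is fully accounted for (4 + 1 + 1 + 2 pages), which mirrors the commit message's claim that every page reaching this point is either free or hwpoisoned.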
* Re: [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-21 14:19 ` [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining David Hildenbrand @ 2019-10-21 14:43 ` Michal Hocko 2019-10-21 15:39 ` David Hildenbrand 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2019-10-21 14:43 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin On Mon 21-10-19 16:19:25, David Hildenbrand wrote: > We call __offline_isolated_pages() from __offline_pages() after all > pages were isolated and are either free (PageBuddy()) or PageHWPoison. > Nothing can stop us from offlining memory at this point. > > In __offline_isolated_pages() we first set all affected memory sections > offline (offline_mem_sections(pfn, end_pfn)), to mark the memmap as > invalid (pfn_to_online_page() will no longer succeed), and then walk over > all pages to pull the free pages from the free lists (to the isolated > free lists, to be precise). > > Note that re-onlining a memory block will result in the whole memmap > getting reinitialized, overwriting any old state. We already poision the > memmap when offlining is complete to find any access to > stale/uninitialized memmaps. > > So, setting the pages PageReserved() is not helpful. The memap is marked > offline and all pageblocks are isolated. As soon as offline, the memmap > is stale either way. > > This looks like a leftover from ancient times where we initialized the > memmap when adding memory and not when onlining it (the pages were set > PageReserved so re-onling would work as expected). 
> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Michal Hocko <mhocko@suse.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: Oscar Salvador <osalvador@suse.de> > Cc: Mel Gorman <mgorman@techsingularity.net> > Cc: Mike Rapoport <rppt@linux.ibm.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Wei Yang <richard.weiyang@gmail.com> > Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> > Cc: Anshuman Khandual <anshuman.khandual@arm.com> > Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> > Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> We still set PageReserved before onlining pages and that one should be good to go as well (memmap_init_zone). Thanks! There is a comment above offline_isolated_pages_cb that should be removed as well. > --- > mm/page_alloc.c | 5 +---- > 1 file changed, 1 insertion(+), 4 deletions(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index ed8884dc0c47..bf6b21f02154 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -8667,7 +8667,7 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) > { > struct page *page; > struct zone *zone; > - unsigned int order, i; > + unsigned int order; > unsigned long pfn; > unsigned long flags; > unsigned long offlined_pages = 0; > @@ -8695,7 +8695,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) > */ > if (unlikely(!PageBuddy(page) && PageHWPoison(page))) { > pfn++; > - SetPageReserved(page); > offlined_pages++; > continue; > } > @@ -8709,8 +8708,6 @@ __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn) > pfn, 1 << order, end_pfn); > #endif > del_page_from_free_area(page, &zone->free_area[order]); > - for (i = 0; i < (1 << order); i++) > - SetPageReserved((page+i)); > pfn += (1 << order); > } > spin_unlock_irqrestore(&zone->lock, flags); > -- > 2.21.0 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-21 14:43 ` Michal Hocko @ 2019-10-21 15:39 ` David Hildenbrand 2019-10-21 15:47 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: David Hildenbrand @ 2019-10-21 15:39 UTC (permalink / raw) To: Michal Hocko Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin On 21.10.19 16:43, Michal Hocko wrote: > On Mon 21-10-19 16:19:25, David Hildenbrand wrote: >> We call __offline_isolated_pages() from __offline_pages() after all >> pages were isolated and are either free (PageBuddy()) or PageHWPoison. >> Nothing can stop us from offlining memory at this point. >> >> In __offline_isolated_pages() we first set all affected memory sections >> offline (offline_mem_sections(pfn, end_pfn)), to mark the memmap as >> invalid (pfn_to_online_page() will no longer succeed), and then walk over >> all pages to pull the free pages from the free lists (to the isolated >> free lists, to be precise). >> >> Note that re-onlining a memory block will result in the whole memmap >> getting reinitialized, overwriting any old state. We already poision the >> memmap when offlining is complete to find any access to >> stale/uninitialized memmaps. >> >> So, setting the pages PageReserved() is not helpful. The memap is marked >> offline and all pageblocks are isolated. As soon as offline, the memmap >> is stale either way. >> >> This looks like a leftover from ancient times where we initialized the >> memmap when adding memory and not when onlining it (the pages were set >> PageReserved so re-onling would work as expected). 
>> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Cc: Michal Hocko <mhocko@suse.com> >> Cc: Vlastimil Babka <vbabka@suse.cz> >> Cc: Oscar Salvador <osalvador@suse.de> >> Cc: Mel Gorman <mgorman@techsingularity.net> >> Cc: Mike Rapoport <rppt@linux.ibm.com> >> Cc: Dan Williams <dan.j.williams@intel.com> >> Cc: Wei Yang <richard.weiyang@gmail.com> >> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> >> Cc: Anshuman Khandual <anshuman.khandual@arm.com> >> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> >> Signed-off-by: David Hildenbrand <david@redhat.com> > > Acked-by: Michal Hocko <mhocko@suse.com> > > We still set PageReserved before onlining pages and that one should be > good to go as well (memmap_init_zone). > Thanks! memmap_init_zone() is called when onlining memory. There, set all pages to reserved right now (on context == MEMMAP_HOTPLUG). We clear PG_reserved when onlining a page to the buddy (e.g., generic_online_page). If we would online a memory block with holes, we would want to keep all such pages (!pfn_valid()) set to reserved. Also, there might be other side effects. So it might not be that easy to remove. A cleanup that I have on my list is to disallow offlining memory blocks with holes. This implies that we will never online memory blocks with holes. This allows for some cleanups in the onlining/offlining code. For example, it would allow to get rid of this PG_reserved initialization. I don't think that we have to support offlining memory blocks with holes. This can only be bootmem (never hotplugged memory), where the chance for this to work is in my opinion already not too good. What's your opinion on this? > > There is a comment above offline_isolated_pages_cb that should be > removed as well. Right, I'll convert that comment "Mark all sections offline and remove all free pages from the buddy." Thanks! -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-21 15:39 ` David Hildenbrand @ 2019-10-21 15:47 ` Michal Hocko 2019-10-21 15:54 ` David Hildenbrand 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2019-10-21 15:47 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin On Mon 21-10-19 17:39:36, David Hildenbrand wrote: > On 21.10.19 16:43, Michal Hocko wrote: [...] > > We still set PageReserved before onlining pages and that one should be > > good to go as well (memmap_init_zone). > > Thanks! > > memmap_init_zone() is called when onlining memory. There, set all pages to > reserved right now (on context == MEMMAP_HOTPLUG). We clear PG_reserved when > onlining a page to the buddy (e.g., generic_online_page). If we would online > a memory block with holes, we would want to keep all such pages > (!pfn_valid()) set to reserved. Also, there might be other side effects. Isn't it sufficient to have those pages in a poisoned state? They are not onlined so their state is basically undefined anyway. I do not see how PageReserved makes this any better. Also is the hole inside a hotplugable memory something we really have to care about. Has anybody actually seen a platform to require that? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-21 15:47 ` Michal Hocko @ 2019-10-21 15:54 ` David Hildenbrand 2019-10-22 8:20 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: David Hildenbrand @ 2019-10-21 15:54 UTC (permalink / raw) To: Michal Hocko Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin On 21.10.19 17:47, Michal Hocko wrote: > On Mon 21-10-19 17:39:36, David Hildenbrand wrote: >> On 21.10.19 16:43, Michal Hocko wrote: > [...] >>> We still set PageReserved before onlining pages and that one should be >>> good to go as well (memmap_init_zone). >>> Thanks! >> >> memmap_init_zone() is called when onlining memory. There, set all pages to >> reserved right now (on context == MEMMAP_HOTPLUG). We clear PG_reserved when >> onlining a page to the buddy (e.g., generic_online_page). If we would online >> a memory block with holes, we would want to keep all such pages >> (!pfn_valid()) set to reserved. Also, there might be other side effects. > > Isn't it sufficient to have those pages in a poisoned state? They are > not onlined so their state is basically undefined anyway. I do not see > how PageReserved makes this any better. It is what people have been using for a long time. Memory hole -> PG_reserved. The memmap is valid, but people want to tell "this here is crap, don't look at it". > > Also is the hole inside a hotplugable memory something we really have to > care about. Has anybody actually seen a platform to require that? That's what I was asking. I can see "support" for this was added basically right from the beginning. I'd say we rip that out and cleanup/simplify. I am not aware of a platform that requires this. Especially, memory holes on DIMMs (detected during boot) seem like an unlikely thing. 
-- Thanks, David / dhildenb
* Re: [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-21 15:54 ` David Hildenbrand @ 2019-10-22 8:20 ` Michal Hocko 2019-10-22 8:23 ` David Hildenbrand 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2019-10-22 8:20 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin On Mon 21-10-19 17:54:35, David Hildenbrand wrote: > On 21.10.19 17:47, Michal Hocko wrote: > > On Mon 21-10-19 17:39:36, David Hildenbrand wrote: > > > On 21.10.19 16:43, Michal Hocko wrote: > > [...] > > > > We still set PageReserved before onlining pages and that one should be > > > > good to go as well (memmap_init_zone). > > > > Thanks! > > > > > > memmap_init_zone() is called when onlining memory. There, set all pages to > > > reserved right now (on context == MEMMAP_HOTPLUG). We clear PG_reserved when > > > onlining a page to the buddy (e.g., generic_online_page). If we would online > > > a memory block with holes, we would want to keep all such pages > > > (!pfn_valid()) set to reserved. Also, there might be other side effects. > > > > Isn't it sufficient to have those pages in a poisoned state? They are > > not onlined so their state is basically undefined anyway. I do not see > > how PageReserved makes this any better. > > It is what people have been using for a long time. Memory hole -> > PG_reserved. The memmap is valid, but people want to tell "this here is > crap, don't look at it". The page is poisoned, right? If yes then setting the reserved bit doesn't make any sense. > > Also is the hole inside a hotplugable memory something we really have to > > care about. Has anybody actually seen a platform to require that? > > That's what I was asking. I can see "support" for this was added basically > right from the beginning. I'd say we rip that out and cleanup/simplify. 
I am > not aware of a platform that requires this. Especially, memory holes on > DIMMs (detected during boot) seem like an unlikely thing. The thing is that the hotplug development shows ad-hoc decisions throughout the code. Even worse, it is hard to guess whether some kludges are the result of careful design or of an ad-hoc trial-and-error approach on setups that were never in production. Building on top of that by preserving hacks is not going to improve the situation. So I am perfectly fine with focusing on making the most straightforward setups work reliably, even when there is a risk of breaking some odd setups. We can fix them up later, but we would at least have a specific example and could document it. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-22 8:20 ` Michal Hocko @ 2019-10-22 8:23 ` David Hildenbrand 2019-10-22 8:58 ` Michal Hocko 0 siblings, 1 reply; 13+ messages in thread From: David Hildenbrand @ 2019-10-22 8:23 UTC (permalink / raw) To: Michal Hocko Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin On 22.10.19 10:20, Michal Hocko wrote: > On Mon 21-10-19 17:54:35, David Hildenbrand wrote: >> On 21.10.19 17:47, Michal Hocko wrote: >>> On Mon 21-10-19 17:39:36, David Hildenbrand wrote: >>>> On 21.10.19 16:43, Michal Hocko wrote: >>> [...] >>>>> We still set PageReserved before onlining pages and that one should be >>>>> good to go as well (memmap_init_zone). >>>>> Thanks! >>>> >>>> memmap_init_zone() is called when onlining memory. There, set all pages to >>>> reserved right now (on context == MEMMAP_HOTPLUG). We clear PG_reserved when >>>> onlining a page to the buddy (e.g., generic_online_page). If we would online >>>> a memory block with holes, we would want to keep all such pages >>>> (!pfn_valid()) set to reserved. Also, there might be other side effects. >>> >>> Isn't it sufficient to have those pages in a poisoned state? They are >>> not onlined so their state is basically undefined anyway. I do not see >>> how PageReserved makes this any better. >> >> It is what people have been using for a long time. Memory hole -> >> PG_reserved. The memmap is valid, but people want to tell "this here is >> crap, don't look at it". > > The page is poisoned, right? If yes then setting the reserved bit > doesn't make any sense. No it's not poisoned AFAIK. It should be initialized - and I remember that PG_reserved on memory holes is relevant to detect MMIO pages. (e.g., looking at KVM code ...) > >>> Also is the hole inside a hotplugable memory something we really have to >>> care about. 
Has anybody actually seen a platform to require that? >> >> That's what I was asking. I can see "support" for this was added basically >> right from the beginning. I'd say we rip that out and cleanup/simplify. I am >> not aware of a platform that requires this. Especially, memory holes on >> DIMMs (detected during boot) seem like an unlikely thing. > > The thing is that the hotplug development shows ad-hoc decisions > throughout the code. It is even worse that it is hard to guess whether > some hludges are a result of a careful design or ad-hoc trial and > failure approach on setups that never were production. Building on top > of that be preserving hacks is not going to improve the situation. So I > am perfectly fine to focus on making the most straightforward setups > work reliably. Even when there is a risk of breaking some odd setups. We > can fix them up later but we would have at least a specific example and > document it. > Alright, I'll prepare a simple patch that rejects offlining memory with memory holes. We can apply that and see if anybody screams out loud. If not, we can clean up that crap. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-22 8:23 ` David Hildenbrand @ 2019-10-22 8:58 ` Michal Hocko 2019-10-22 9:03 ` David Hildenbrand 0 siblings, 1 reply; 13+ messages in thread From: Michal Hocko @ 2019-10-22 8:58 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin On Tue 22-10-19 10:23:37, David Hildenbrand wrote: > On 22.10.19 10:20, Michal Hocko wrote: > > On Mon 21-10-19 17:54:35, David Hildenbrand wrote: > > > On 21.10.19 17:47, Michal Hocko wrote: > > > > On Mon 21-10-19 17:39:36, David Hildenbrand wrote: > > > > > On 21.10.19 16:43, Michal Hocko wrote: > > > > [...] > > > > > > We still set PageReserved before onlining pages and that one should be > > > > > > good to go as well (memmap_init_zone). > > > > > > Thanks! > > > > > > > > > > memmap_init_zone() is called when onlining memory. There, set all pages to > > > > > reserved right now (on context == MEMMAP_HOTPLUG). We clear PG_reserved when > > > > > onlining a page to the buddy (e.g., generic_online_page). If we would online > > > > > a memory block with holes, we would want to keep all such pages > > > > > (!pfn_valid()) set to reserved. Also, there might be other side effects. > > > > > > > > Isn't it sufficient to have those pages in a poisoned state? They are > > > > not onlined so their state is basically undefined anyway. I do not see > > > > how PageReserved makes this any better. > > > > > > It is what people have been using for a long time. Memory hole -> > > > PG_reserved. The memmap is valid, but people want to tell "this here is > > > crap, don't look at it". > > > > The page is poisoned, right? If yes then setting the reserved bit > > doesn't make any sense. > > No it's not poisoned AFAIK. It should be initialized Dohh, it seems I still keep confusing myself. 
You are right the page is initialized at this stage. A potential hole in RAM or ZONE_DEVICE memory will just not hit the page allocator. Sorry about the noise. > - and I remember that PG_reserved on memory holes is relevant to > detect MMIO pages. (e.g., looking at KVM code ...) I can see kvm_is_reserved_pfn() which checks both pfn_valid and PageReserved. How does this help to detect memory holes though? Any driver might be setting the page reserved. > > > > Also is the hole inside a hotplugable memory something we really have to > > > > care about. Has anybody actually seen a platform to require that? > > > > > > That's what I was asking. I can see "support" for this was added basically > > > right from the beginning. I'd say we rip that out and cleanup/simplify. I am > > > not aware of a platform that requires this. Especially, memory holes on > > > DIMMs (detected during boot) seem like an unlikely thing. > > > > The thing is that the hotplug development shows ad-hoc decisions > > throughout the code. It is even worse that it is hard to guess whether > > some hludges are a result of a careful design or ad-hoc trial and > > failure approach on setups that never were production. Building on top > > of that be preserving hacks is not going to improve the situation. So I > > am perfectly fine to focus on making the most straightforward setups > > work reliably. Even when there is a risk of breaking some odd setups. We > > can fix them up later but we would have at least a specific example and > > document it. > > > > Alright, I'll prepare a simple patch that rejects offlining memory with Is offlining an interesting path? I would expect onlining to be much more interesting one. > memory holes. We can apply that and see if anybody screams out loud. If not, > we can clean up that crap. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining 2019-10-22 8:58 ` Michal Hocko @ 2019-10-22 9:03 ` David Hildenbrand 0 siblings, 0 replies; 13+ messages in thread From: David Hildenbrand @ 2019-10-22 9:03 UTC (permalink / raw) To: Michal Hocko Cc: linux-kernel, linux-mm, Andrew Morton, Vlastimil Babka, Oscar Salvador, Mel Gorman, Mike Rapoport, Dan Williams, Wei Yang, Alexander Duyck, Anshuman Khandual, Pavel Tatashin >> - and I remember that PG_reserved on memory holes is relevant to >> detect MMIO pages. (e.g., looking at KVM code ...) > > I can see kvm_is_reserved_pfn() which checks both pfn_valid and > PageReserved. How does this help to detect memory holes though? > Any driver might be setting the page reserved. See my other mail. This is mostly to not touch MMIO pages and ZONE_DEVICE pages ... well and /dev/mem mapped pages. > >>>>> Also is the hole inside a hotplugable memory something we really have to >>>>> care about. Has anybody actually seen a platform to require that? >>>> >>>> That's what I was asking. I can see "support" for this was added basically >>>> right from the beginning. I'd say we rip that out and cleanup/simplify. I am >>>> not aware of a platform that requires this. Especially, memory holes on >>>> DIMMs (detected during boot) seem like an unlikely thing. >>> >>> The thing is that the hotplug development shows ad-hoc decisions >>> throughout the code. It is even worse that it is hard to guess whether >>> some hludges are a result of a careful design or ad-hoc trial and >>> failure approach on setups that never were production. Building on top >>> of that be preserving hacks is not going to improve the situation. So I >>> am perfectly fine to focus on making the most straightforward setups >>> work reliably. Even when there is a risk of breaking some odd setups. We >>> can fix them up later but we would have at least a specific example and >>> document it. 
>>> >> >> Alright, I'll prepare a simple patch that rejects offlining memory with > > Is offlining an interesting path? I would expect onlining to be much > more interesting one. If you can't offline memory with holes, you can also not online memory with holes AFAIKS :) Bootmem is online, and memory you can hotplug (initially offline) cannot have any holes. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH v1 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE 2019-10-21 14:19 [PATCH v1 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand 2019-10-21 14:19 ` [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining David Hildenbrand @ 2019-10-21 14:19 ` David Hildenbrand 2019-10-21 15:02 ` Michal Hocko 1 sibling, 1 reply; 13+ messages in thread From: David Hildenbrand @ 2019-10-21 14:19 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, Michal Hocko, Oscar Salvador, Andrew Morton, Anshuman Khandual, Pingfan Liu, Qian Cai, Pavel Tatashin, Dan Williams, Vlastimil Babka, Mel Gorman, Mike Rapoport, Alexander Duyck We have two types of users of page isolation: 1. Memory offlining: Offline memory so it can be unplugged. Memory won't be touched. 2. Memory allocation: Allocate memory (e.g., alloc_contig_range()) to become the owner of the memory and make use of it. For example, when offlining memory, we can ignore (skip over) PageHWPoison() pages, as the memory won't get used, so offlining may proceed. In contrast, we don't want to allow allocating such memory. Let's generalize the approach so we can special-case other types of pages we want to skip over when offlining memory. While at it, also pass the same flags to test_pages_isolated(). 
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Suggested-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/page-isolation.h |  4 ++--
 mm/memory_hotplug.c            |  8 +++++---
 mm/page_alloc.c                |  4 ++--
 mm/page_isolation.c            | 12 ++++++------
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 1099c2fee20f..6861df759fad 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -30,7 +30,7 @@ static inline bool is_migrate_isolate(int migratetype)
 }
 #endif

-#define SKIP_HWPOISON	0x1
+#define MEMORY_OFFLINE	0x1
 #define REPORT_FAILURE	0x2

 bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
@@ -58,7 +58,7 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
  */
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
-			bool skip_hwpoisoned_pages);
+			int isol_flags);

 struct page *alloc_migrate_target(struct page *page, unsigned long private);

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5e6b2a312362..aa8abbd0d2e9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1149,7 +1149,8 @@ static bool is_pageblock_removable_nolock(unsigned long pfn)
 	if (!zone_spans_pfn(zone, pfn))
 		return false;

-	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE, SKIP_HWPOISON);
+	return !has_unmovable_pages(zone, page, 0, MIGRATE_MOVABLE,
+				    MEMORY_OFFLINE);
 }

 /* Checks if this range of memory is likely to be hot-removable. */
@@ -1366,7 +1367,8 @@ static int
 check_pages_isolated_cb(unsigned long start_pfn, unsigned long nr_pages,
 			void *data)
 {
-	return test_pages_isolated(start_pfn, start_pfn + nr_pages, true);
+	return test_pages_isolated(start_pfn, start_pfn + nr_pages,
+				   MEMORY_OFFLINE);
 }

 static int __init cmdline_parse_movable_node(char *p)
@@ -1477,7 +1479,7 @@ static int __ref __offline_pages(unsigned long start_pfn,
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
 				       MIGRATE_MOVABLE,
-				       SKIP_HWPOISON | REPORT_FAILURE);
+				       MEMORY_OFFLINE | REPORT_FAILURE);
 	if (ret < 0) {
 		reason = "failure to isolate range";
 		goto failed_removal;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf6b21f02154..b44712c7fdd7 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8270,7 +8270,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
 		 * The HWPoisoned page may be not in buddy system, and
 		 * page_count() is not 0.
 		 */
-		if ((flags & SKIP_HWPOISON) && PageHWPoison(page))
+		if (flags & MEMORY_OFFLINE && PageHWPoison(page))
 			continue;

 		if (__PageMovable(page))
@@ -8486,7 +8486,7 @@ int alloc_contig_range(unsigned long start, unsigned long end,
 	}

 	/* Make sure the range is really isolated. */
-	if (test_pages_isolated(outer_start, end, false)) {
+	if (test_pages_isolated(outer_start, end, 0)) {
 		pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
 			__func__, outer_start, end);
 		ret = -EBUSY;
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 89c19c0feadb..82b80aeb8a71 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -168,7 +168,8 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * @migratetype:	Migrate type to set in error recovery.
  * @flags:		The following flags are allowed (they can be combined in
  *			a bit mask)
- *			SKIP_HWPOISON - ignore hwpoison pages
+ *			MEMORY_OFFLINE - isolate to offline (!allocate) memory
+ *					 e.g., skip over PageHWPoison() pages
  *			REPORT_FAILURE - report details about the failure to
  *			isolate the range
  *
@@ -257,7 +258,7 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
  */
 static unsigned long
 __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
-				  bool skip_hwpoisoned_pages)
+				  int flags)
 {
 	struct page *page;

@@ -274,7 +275,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
 			 * simple way to verify that as VM_BUG_ON(), though.
 			 */
 			pfn += 1 << page_order(page);
-		else if (skip_hwpoisoned_pages && PageHWPoison(page))
+		else if (flags & MEMORY_OFFLINE && PageHWPoison(page))
 			/* A HWPoisoned page cannot be also PageBuddy */
 			pfn++;
 		else
@@ -286,7 +287,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,

 /* Caller should ensure that requested range is in a single zone */
 int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
-			bool skip_hwpoisoned_pages)
+			int isol_flags)
 {
 	unsigned long pfn, flags;
 	struct page *page;
@@ -308,8 +309,7 @@ int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
 	/* Check all pages are free or marked as ISOLATED */
 	zone = page_zone(page);
 	spin_lock_irqsave(&zone->lock, flags);
-	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn,
-						skip_hwpoisoned_pages);
+	pfn = __test_page_isolated_in_pageblock(start_pfn, end_pfn, isol_flags);
 	spin_unlock_irqrestore(&zone->lock, flags);

 	trace_test_pages_isolated(start_pfn, end_pfn, pfn);
--
2.21.0
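
The effect of passing an intent flag instead of a boolean can be illustrated outside the kernel. The following is only a minimal userspace sketch: `struct toy_page` and `toy_range_isolated()` are hypothetical stand-ins invented for this illustration (they are not kernel code), and the sketch returns a plain bool rather than the kernel's error codes. Only the two flag values are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as defined by the patch in include/linux/page-isolation.h. */
#define MEMORY_OFFLINE	0x1
#define REPORT_FAILURE	0x2

/* Hypothetical stand-in for struct page; NOT the kernel structure. */
struct toy_page {
	bool buddy;	/* page is free (would be PageBuddy()) */
	bool hwpoison;	/* page is poisoned (would be PageHWPoison()) */
};

/*
 * Toy model of the isolation test: returns true if every page in the
 * array is acceptable for what the caller intends to do with the range.
 */
static bool toy_range_isolated(const struct toy_page *pages, size_t n,
			       int isol_flags)
{
	assert(pages != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (pages[i].buddy)
			continue;
		/* Only a caller that offlines memory may skip poisoned pages. */
		if ((isol_flags & MEMORY_OFFLINE) && pages[i].hwpoison)
			continue;
		return false;
	}
	return true;
}
```

In this model, a range containing one poisoned page passes when called with MEMORY_OFFLINE (the memory will not be used again) and fails when called with 0, which corresponds to the allocation case such as alloc_contig_range().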
* Re: [PATCH v1 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE
From: Michal Hocko @ 2019-10-21 15:02 UTC
To: David Hildenbrand
Cc: linux-kernel, linux-mm, Oscar Salvador, Andrew Morton, Anshuman Khandual, Pingfan Liu, Qian Cai, Pavel Tatashin, Dan Williams, Vlastimil Babka, Mel Gorman, Mike Rapoport, Alexander Duyck

On Mon 21-10-19 16:19:26, David Hildenbrand wrote:
> We have two types of users of page isolation:
> 1. Memory offlining: Offline memory so it can be unplugged. Memory won't
>    be touched.
> 2. Memory allocation: Allocate memory (e.g., alloc_contig_range()) to
>    become the owner of the memory and make use of it.
>
> For example, in case we want to offline memory, we can ignore (skip over)
> PageHWPoison() pages, as the memory won't get used. We can allow to
> offline memory. In contrast, we don't want to allow to allocate such
> memory.
>
> Let's generalize the approach so we can special case other types of
> pages we want to skip over in case we offline memory. While at it, also
> pass the same flags to test_pages_isolated().
>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Oscar Salvador <osalvador@suse.de>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Pingfan Liu <kernelfans@gmail.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Yes, a highlevel flag makes more sense than requesting specific types of
pages to skip over.

Acked-by: Michal Hocko <mhocko@suse.com>

Please make the code easier to follow ...

> ---
>  include/linux/page-isolation.h |  4 ++--
>  mm/memory_hotplug.c            |  8 +++++---
>  mm/page_alloc.c                |  4 ++--
>  mm/page_isolation.c            | 12 ++++++------
>  4 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bf6b21f02154..b44712c7fdd7 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -8270,7 +8270,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>  		 * The HWPoisoned page may be not in buddy system, and
>  		 * page_count() is not 0.
>  		 */
> -		if ((flags & SKIP_HWPOISON) && PageHWPoison(page))
> +		if (flags & MEMORY_OFFLINE && PageHWPoison(page))
>  			continue;
>
>  		if (__PageMovable(page))
[...]
> @@ -257,7 +258,7 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>   */
>  static unsigned long
>  __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
> -				  bool skip_hwpoisoned_pages)
> +				  int flags)
>  {
>  	struct page *page;
>
> @@ -274,7 +275,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>  			 * simple way to verify that as VM_BUG_ON(), though.
>  			 */
>  			pfn += 1 << page_order(page);
> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
> +		else if (flags & MEMORY_OFFLINE && PageHWPoison(page))
>  			/* A HWPoisoned page cannot be also PageBuddy */
>  			pfn++;
>  		else

.. and use parentheses for the flag check.
--
Michal Hocko
SUSE Labs
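
For context on the parentheses request: in C, bitwise `&` binds more tightly than logical `&&`, so both spellings of the check should evaluate identically and the parentheses serve readability only. The sketch below is a small userspace illustration of that equivalence; the two helper functions are hypothetical names made up for this example, and only the flag values come from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values as defined by the patch in include/linux/page-isolation.h. */
#define MEMORY_OFFLINE	0x1
#define REPORT_FAILURE	0x2

/* Hypothetical helper: the check as written in v1 of the patch. */
static bool skip_unparenthesized(int flags, bool hwpoison)
{
	/* Parsed as (flags & MEMORY_OFFLINE) && hwpoison. */
	return flags & MEMORY_OFFLINE && hwpoison;
}

/* Hypothetical helper: the check with the requested parentheses. */
static bool skip_parenthesized(int flags, bool hwpoison)
{
	return (flags & MEMORY_OFFLINE) && hwpoison;
}
```

Both helpers return the same result for every combination of the two flags, so the requested change does not alter behaviour; it only spares the reader from recalling the precedence table.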
* Re: [PATCH v1 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE
From: David Hildenbrand @ 2019-10-21 15:04 UTC
To: Michal Hocko
Cc: linux-kernel, linux-mm, Oscar Salvador, Andrew Morton, Anshuman Khandual, Pingfan Liu, Qian Cai, Pavel Tatashin, Dan Williams, Vlastimil Babka, Mel Gorman, Mike Rapoport, Alexander Duyck

On 21.10.19 17:02, Michal Hocko wrote:
> On Mon 21-10-19 16:19:26, David Hildenbrand wrote:
>> We have two types of users of page isolation:
>> 1. Memory offlining: Offline memory so it can be unplugged. Memory won't
>>    be touched.
>> 2. Memory allocation: Allocate memory (e.g., alloc_contig_range()) to
>>    become the owner of the memory and make use of it.
>>
>> For example, in case we want to offline memory, we can ignore (skip over)
>> PageHWPoison() pages, as the memory won't get used. We can allow to
>> offline memory. In contrast, we don't want to allow to allocate such
>> memory.
>>
>> Let's generalize the approach so we can special case other types of
>> pages we want to skip over in case we offline memory. While at it, also
>> pass the same flags to test_pages_isolated().
>>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Oscar Salvador <osalvador@suse.de>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Pingfan Liu <kernelfans@gmail.com>
>> Cc: Qian Cai <cai@lca.pw>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
>> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> Suggested-by: Michal Hocko <mhocko@suse.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Yes, a highlevel flag makes more sense than requesting specific types of
> pages to skip over.
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> Please make the code easier to follow ...
>> ---
>>  include/linux/page-isolation.h |  4 ++--
>>  mm/memory_hotplug.c            |  8 +++++---
>>  mm/page_alloc.c                |  4 ++--
>>  mm/page_isolation.c            | 12 ++++++------
>>  4 files changed, 15 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index bf6b21f02154..b44712c7fdd7 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -8270,7 +8270,7 @@ bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
>>  		 * The HWPoisoned page may be not in buddy system, and
>>  		 * page_count() is not 0.
>>  		 */
>> -		if ((flags & SKIP_HWPOISON) && PageHWPoison(page))
>> +		if (flags & MEMORY_OFFLINE && PageHWPoison(page))
>>  			continue;
>>
>>  		if (__PageMovable(page))
> [...]
>> @@ -257,7 +258,7 @@ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>   */
>>  static unsigned long
>>  __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>> -				  bool skip_hwpoisoned_pages)
>> +				  int flags)
>>  {
>>  	struct page *page;
>>
>> @@ -274,7 +275,7 @@ __test_page_isolated_in_pageblock(unsigned long pfn, unsigned long end_pfn,
>>  			 * simple way to verify that as VM_BUG_ON(), though.
>>  			 */
>>  			pfn += 1 << page_order(page);
>> -		else if (skip_hwpoisoned_pages && PageHWPoison(page))
>> +		else if (flags & MEMORY_OFFLINE && PageHWPoison(page))
>>  			/* A HWPoisoned page cannot be also PageBuddy */
>>  			pfn++;
>>  		else
>
> .. and use parentheses for the flag check.
>

Can do if you prefer :) Thanks! I'll resend both patches.

--
Thanks,

David / dhildenb
Thread overview: 13+ messages (end of thread, newest message 2019-10-22 9:03 UTC)
2019-10-21 14:19 [PATCH v1 0/2] mm: Memory offlining + page isolation cleanups David Hildenbrand
2019-10-21 14:19 ` [PATCH v1 1/2] mm/page_alloc.c: Don't set pages PageReserved() when offlining David Hildenbrand
2019-10-21 14:43 ` Michal Hocko
2019-10-21 15:39 ` David Hildenbrand
2019-10-21 15:47 ` Michal Hocko
2019-10-21 15:54 ` David Hildenbrand
2019-10-22 8:20 ` Michal Hocko
2019-10-22 8:23 ` David Hildenbrand
2019-10-22 8:58 ` Michal Hocko
2019-10-22 9:03 ` David Hildenbrand
2019-10-21 14:19 ` [PATCH v1 2/2] mm/page_isolation.c: Convert SKIP_HWPOISON to MEMORY_OFFLINE David Hildenbrand
2019-10-21 15:02 ` Michal Hocko
2019-10-21 15:04 ` David Hildenbrand