* [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-01 12:46 UTC, Pavel Tatashin
  To: linux-kernel, akpm, mhocko, linux-mm, pasha.tatashin

There is a race during page offline that can lead to an infinite loop:
a page never ends up on a buddy list and __offline_pages() keeps
retrying infinitely or until a termination signal is received.

Thread#1 - a new process:

load_elf_binary
   begin_new_exec
     exec_mmap
       mmput
         exit_mmap
           tlb_finish_mmu
             tlb_flush_mmu
               release_pages
                 free_unref_page_list
                   free_unref_page_prepare
                     set_pcppage_migratetype(page, migratetype);
                     // cache a migratetype below MIGRATE_PCPTYPES
                     // in page->index

Thread#2 - hot-removes memory:

__offline_pages
  start_isolate_page_range
    set_migratetype_isolate
      set_pageblock_migratetype(page, MIGRATE_ISOLATE);
      // set the pageblock's migratetype to MIGRATE_ISOLATE
    drain_all_pages(zone);
    // drain per-cpu page lists to the buddy allocator

Thread#1 - continue:

                   free_unref_page_commit
                     migratetype = get_pcppage_migratetype(page);
                     // read the stale migratetype cached in page->index
                     list_add(&page->lru, &pcp->lists[migratetype]);
                     // add the page to the already-drained pcp list

Thread#2 never drains the pcp lists again, and therefore gets stuck in
the loop.

The fix is to try to drain the per-cpu lists again after
check_pages_isolated_cb() fails.

Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: stable@vger.kernel.org
---
 mm/memory_hotplug.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e9d5ab5d3ca0..d6d54922bfce 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1575,6 +1575,15 @@ static int __ref __offline_pages(unsigned long start_pfn,
 		/* check again */
 		ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn,
 					    NULL, check_pages_isolated_cb);
+		/*
+		 * per-cpu pages are drained in start_isolate_page_range, but if
+		 * there are still pages that are not free, make sure that we
+		 * drain again, because when we isolated range we might
+		 * have raced with another thread that was adding pages to
+		 * pcp list.
+		 */
+		if (ret)
+			drain_all_pages(zone);
 	} while (ret);

 	/* Ok, all of our target is isolated.
-- 
2.25.1
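[Editorial note: the race and the retry fix above can be sketched as a toy
user-space model. This is plain Python, not kernel code; `Zone`,
`offline_pages`, and every other name below are illustrative inventions,
not kernel APIs.]

```python
# Toy model of the bug: a page freed to a per-cpu (pcp) list *after* the
# one-shot drain in start_isolate_page_range() never reaches the buddy
# list, so the offline loop can spin forever. The fix is modeled by
# draining again whenever the isolation check fails.
from collections import deque


class Zone:
    def __init__(self):
        self.pcp = deque()   # per-cpu free list (illustrative)
        self.buddy = []      # buddy allocator free list (illustrative)

    def drain_all_pages(self):
        # move everything from the pcp list to the buddy allocator
        while self.pcp:
            self.buddy.append(self.pcp.popleft())


def offline_pages(zone, retry_drain, max_loops=5):
    """Return the loop iteration that succeeds, or None if we 'hang'."""
    # Thread#2: isolate the range and drain once.
    zone.drain_all_pages()
    # Thread#1 (racing): free_unref_page_commit() adds a page to the
    # already-drained pcp list using the stale cached migratetype.
    zone.pcp.append("page")
    for i in range(1, max_loops + 1):
        # check_pages_isolated_cb(): every page must be on a buddy list
        if not zone.pcp:
            return i
        if retry_drain:
            # the fix: drain again after the isolation check fails
            zone.drain_all_pages()
    return None  # pcp list is never drained again: stuck in the loop
```

Without the extra drain the loop never observes an empty pcp list; with
the fix, a single retry is enough in this model.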
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-01 18:37 UTC, David Rientjes
  To: Pavel Tatashin; Cc: linux-kernel, akpm, mhocko, linux-mm

On Tue, 1 Sep 2020, Pavel Tatashin wrote:

> There is a race during page offline that can lead to infinite loop:
> a page never ends up on a buddy list and __offline_pages() keeps
> retrying infinitely or until a termination signal is received.
[...]
> The fix is to try to drain per-cpu lists again after
> check_pages_isolated_cb() fails.
>
> Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: stable@vger.kernel.org

Acked-by: David Rientjes <rientjes@google.com>
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 14:01 UTC, Michal Hocko
  To: Pavel Tatashin; Cc: linux-kernel, akpm, linux-mm, Vlastimil Babka, Mel Gorman

[Cc Mel and Vlastimil - I am still rummaging]

On Tue 01-09-20 08:46:15, Pavel Tatashin wrote:
> There is a race during page offline that can lead to infinite loop:
> a page never ends up on a buddy list and __offline_pages() keeps
> retrying infinitely or until a termination signal is received.
[...]
> The fix is to try to drain per-cpu lists again after
> check_pages_isolated_cb() fails.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 14:10 UTC, Michal Hocko
  To: Pavel Tatashin; Cc: linux-kernel, akpm, linux-mm, Vlastimil Babka, Mel Gorman

On Wed 02-09-20 16:01:17, Michal Hocko wrote:
> [Cc Mel and Vlastimil - I am still rummaging]
>
> On Tue 01-09-20 08:46:15, Pavel Tatashin wrote:
> > There is a race during page offline that can lead to infinite loop:
> > a page never ends up on a buddy list and __offline_pages() keeps
> > retrying infinitely or until a termination signal is received.
[...]
> > The fix is to try to drain per-cpu lists again after
> > check_pages_isolated_cb() fails.

Still trying to wrap my head around this, but I think this is not a
proper fix. It should be the page isolation that makes sure no races
with the page freeing path are possible.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 14:31 UTC, Pavel Tatashin
  To: Michal Hocko; Cc: LKML, Andrew Morton, linux-mm, Vlastimil Babka, Mel Gorman

> > > The fix is to try to drain per-cpu lists again after
> > > check_pages_isolated_cb() fails.
>
> Still trying to wrap my head around this but I think this is not a
> proper fix. It should be the page isolation to make sure no races are
> possible with the page freeing path.

As Bharata B Rao found in another thread, the problem was introduced by
this change:

  c52e75935f8d: mm: remove extra drain pages on pcp list

So the drain used to be retried every time via lru_add_drain_all(),
which I think is excessive, as we start a thread per cpu to try to
drain and catch a rare race condition. With the proposed change we
drain again only when we find such a condition. Fixing it in
start_isolate_page_range() would mean we must somehow synchronize it
with release_pages(), which adds cost to the runtime code instead of
to the hot-remove code.

Pasha
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 14:49 UTC, Vlastimil Babka
  To: Pavel Tatashin, Michal Hocko; Cc: LKML, Andrew Morton, linux-mm, Mel Gorman

On 9/2/20 4:31 PM, Pavel Tatashin wrote:
>> Still trying to wrap my head around this but I think this is not a
>> proper fix. It should be the page isolation to make sure no races are
>> possible with the page freeing path.
>
> As Bharata B Rao found in another thread, the problem was introduced
> by this change:
> c52e75935f8d: mm: remove extra drain pages on pcp list
>
> So, the drain used to be tried every time with lru_add_drain_all();
[...]
> Fixing it in start_isolate_page_range means that we must somehow
> synchronize it with the release_pages() which adds costs to runtime
> code, instead of to hot-remove code.

Agreed. Isolation was always racy wrt freeing to pcplists, and it was
simply acceptable to do some extra drains if needed. Removing that race
would be indeed acceptable only if it didn't affect alloc/free
fastpaths.
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 14:08 UTC, Michal Hocko
  To: Pavel Tatashin; Cc: linux-kernel, akpm, linux-mm

On Tue 01-09-20 08:46:15, Pavel Tatashin wrote:
> There is a race during page offline that can lead to infinite loop:
> a page never ends up on a buddy list and __offline_pages() keeps
> retrying infinitely or until a termination signal is received.
[...]
> Thread#2 - hot-removes memory
>  __offline_pages
>    start_isolate_page_range
>      set_migratetype_isolate
>        set_pageblock_migratetype(page, MIGRATE_ISOLATE);
>      drain_all_pages(zone);
>      // drain per-cpu page lists to buddy allocator.

It is not really clear to me how we could have passed
has_unmovable_pages at this stage when the page is not PageBuddy. Is
this because you are using Movable Zones?

> Thread#1 - continue
>    free_unref_page_commit
>      migratetype = get_pcppage_migratetype(page);
>      // get old migration type
>      list_add(&page->lru, &pcp->lists[migratetype]);
>      // add new page to already drained pcp list
>
> Thread#2
> Never drains pcp again, and therefore gets stuck in the loop.
>
> The fix is to try to drain per-cpu lists again after
> check_pages_isolated_cb() fails.

But this means that the page is not isolated and so it could be reused
for something else. No?

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 14:26 UTC, Pavel Tatashin
  To: Michal Hocko; Cc: LKML, Andrew Morton, linux-mm

On Wed, Sep 2, 2020 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
>
> It is not really clear to me how we could have passed
> has_unmovable_pages at this stage when the page is not PageBuddy. Is
> this because you are using Movable Zones?

Yes, we hot-remove memory from the movable zone.

> But this means that the page is not isolated and so it could be reused
> for something else. No?

The page is in a movable zone, has zero references, and the pageblock
is isolated (i.e. set_pageblock_migratetype(page, MIGRATE_ISOLATE) is
set). The page should be offlinable, but it is lost on a pcp list, as
that list is never drained again after the first failure to migrate
all pages in the range.
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 14:55 UTC, Vlastimil Babka
  To: Pavel Tatashin, Michal Hocko; Cc: LKML, Andrew Morton, linux-mm

On 9/2/20 4:26 PM, Pavel Tatashin wrote:
> On Wed, Sep 2, 2020 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote:
>>
>> But this means that the page is not isolated and so it could be reused
>> for something else. No?
>
> The page is in a movable zone, has zero references, and the section is
> isolated (i.e. set_pageblock_migratetype(page, MIGRATE_ISOLATE);) is
> set. The page should be offlinable, but it is lost in a pcp list as
> that list is never drained again after the first failure to migrate
> all pages in the range.

Yeah. To answer Michal's "it could be reused for something else" - yes,
somebody could allocate it from the pcplist before we do the extra
drain. But then it becomes "visible again" and the loop in
__offline_pages() should catch it by scan_movable_pages() -
do_migrate_range(). And this time the pageblock is already marked as
isolated, so the page (freed by migration) won't end up on the pcplist
again.
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 15:13 UTC, Michal Hocko
  To: Vlastimil Babka; Cc: Pavel Tatashin, LKML, Andrew Morton, linux-mm

On Wed 02-09-20 16:55:05, Vlastimil Babka wrote:
[...]
> Yeah. To answer Michal's "it could be reused for something else" - yes,
> somebody could allocate it from the pcplist before we do the extra
> drain. But then it becomes "visible again" and the loop in
> __offline_pages() should catch it by scan_movable_pages() -
> do_migrate_range(). And this time the pageblock is already marked as
> isolated, so the page (freed by migration) won't end up on the pcplist
> again.

So the pageblock is marked MIGRATE_ISOLATE, but the allocation itself
could be used for non-migrateable objects. Or does anything prevent
that from happening? We really do depend on isolation to not allow
reuse when offlining.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 15:40 UTC, Pavel Tatashin
  To: Michal Hocko; Cc: Vlastimil Babka, LKML, Andrew Morton, linux-mm

> So the page block is marked MIGRATE_ISOLATE but the allocation itself
> could be used for non migrateable objects. Or does anything prevent
> that from happening?

Vlastimil is right: we could allocate from the pcplist if someone
requests an allocation; nothing, as far as I can tell, prevents that,
and we will immediately migrate that page in do_migrate_range().

> We really do depend on isolation to not allow reuse when offlining.

Once a page is isolated it is not re-used, but here this page is not
isolated because of the race between adding to the pcp list and
isolation. Draining a second time on failure fixes the race.

Pasha
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-02 17:51 UTC, Vlastimil Babka
  To: Michal Hocko; Cc: Pavel Tatashin, LKML, Andrew Morton, linux-mm

On 9/2/20 5:13 PM, Michal Hocko wrote:
[...]
> So the page block is marked MIGRATE_ISOLATE but the allocation itself
> could be used for non migrateable objects. Or does anything prevent
> that from happening?

In a movable zone, the allocation should not be used for non-migrateable
objects. E.g. if the zone was not ZONE_MOVABLE, the offlining could fail
regardless of this race (analogically for migrating away from CMA
pageblocks).

> We really do depend on isolation to not allow reuse when offlining.

This is not really different than if the page on the pcplist was
allocated just a moment before the offlining, thus isolation, started.
We ultimately rely on being able to migrate any allocated pages away
during the isolation. This "freeing to pcplists" race doesn't
fundamentally change anything in this regard. We just have to guarantee
that pages on pcplists will eventually be flushed, to make forward
progress, and there was a bug in this aspect.
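[Editorial note: the forward-progress argument above can be illustrated
with a second toy model. Again plain Python, not kernel code, and all
names are illustrative: even if the stray page is allocated from the pcp
list before the extra drain, migration frees it again, and a free on an
isolated pageblock goes straight to the buddy list rather than back to
the pcp list.]

```python
# Toy model: a racing allocation from the pcp list does not break
# offlining, because the next pass of the offline loop migrates the
# page away, and freeing on a MIGRATE_ISOLATE pageblock bypasses pcp.
class Range:
    def __init__(self):
        self.pcp = ["page"]   # stray page left behind by the race
        self.allocated = []
        self.buddy = []
        self.isolated = True  # pageblock already MIGRATE_ISOLATE

    def alloc_from_pcp(self):
        # somebody grabs the page from the pcplist before the extra drain
        self.allocated.append(self.pcp.pop())

    def migrate_and_free(self):
        # do_migrate_range() moves the data away and frees the page;
        # the free on an isolated block goes to the buddy list
        page = self.allocated.pop()
        (self.buddy if self.isolated else self.pcp).append(page)


r = Range()
r.alloc_from_pcp()    # raced allocation, page "visible again"
r.migrate_and_free()  # caught by scan_movable_pages()/do_migrate_range()
# forward progress: the page is on the buddy list, offlining can finish
```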
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline
  2020-09-03 6:38 UTC, Michal Hocko
  To: Vlastimil Babka; Cc: Pavel Tatashin, LKML, Andrew Morton, linux-mm

On Wed 02-09-20 19:51:45, Vlastimil Babka wrote:
[...]
> This is not really different than if the page on pcplist was allocated
> just a moment before the offlining, thus isolation started. We
> ultimately rely on being able to migrate any allocated pages away
> during the isolation. This "freeing to pcplists" race doesn't
> fundamentally change anything in this regard. We just have to
> guarantee that pages on pcplists will be eventually flushed, to make
> forward progress, and there was a bug in this aspect.

You are right. I managed to confuse myself yesterday. The race is
impossible for !ZONE_MOVABLE because we do a PageBuddy check there. And
in the movable zone we are not losing the migrateability property.

Pavel, I think this will be useful information to add to the changelog.
We should also document this in the code to prevent further confusion.
I would suggest something like the following:

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 242c03121d73..56d4892bceb8 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages)
  * pageblocks we may have modified and return -EBUSY to caller. This
  * prevents two threads from simultaneously working on overlapping ranges.
  *
+ * Please note that there is no strong synchronization with the page allocator
+ * either. Pages might be freed while their page blocks are marked ISOLATED.
+ * In some cases pages might still end up on pcp lists and that would allow
+ * for their allocation even when they are in fact isolated already. Depending
+ * on how strong of a guarantee the caller needs, drain_all_pages might be
+ * needed (e.g. __offline_pages will need to call it after the check for an
+ * isolated range for a next retry).
+ *
  * Return: the number of isolated pageblocks on success and -EBUSY if any part
  * of range cannot be isolated.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-03 6:38 ` Michal Hocko @ 2020-09-03 18:20 ` David Hildenbrand 2020-09-03 18:23 ` Pavel Tatashin 0 siblings, 1 reply; 23+ messages in thread From: David Hildenbrand @ 2020-09-03 18:20 UTC (permalink / raw) To: Michal Hocko, Vlastimil Babka Cc: Pavel Tatashin, LKML, Andrew Morton, linux-mm On 03.09.20 08:38, Michal Hocko wrote: > On Wed 02-09-20 19:51:45, Vlastimil Babka wrote: >> On 9/2/20 5:13 PM, Michal Hocko wrote: >>> On Wed 02-09-20 16:55:05, Vlastimil Babka wrote: >>>> On 9/2/20 4:26 PM, Pavel Tatashin wrote: >>>>> On Wed, Sep 2, 2020 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote: >>>>>> >>>>>>> >>>>>>> Thread#1 - continue >>>>>>> free_unref_page_commit >>>>>>> migratetype = get_pcppage_migratetype(page); >>>>>>> // get old migration type >>>>>>> list_add(&page->lru, &pcp->lists[migratetype]); >>>>>>> // add new page to already drained pcp list >>>>>>> >>>>>>> Thread#2 >>>>>>> Never drains pcp again, and therefore gets stuck in the loop. >>>>>>> >>>>>>> The fix is to try to drain per-cpu lists again after >>>>>>> check_pages_isolated_cb() fails. >>>>>> >>>>>> But this means that the page is not isolated and so it could be reused >>>>>> for something else. No? >>>>> >>>>> The page is in a movable zone, has zero references, and the section is >>>>> isolated (i.e. set_pageblock_migratetype(page, MIGRATE_ISOLATE);) is >>>>> set. The page should be offlinable, but it is lost in a pcp list as >>>>> that list is never drained again after the first failure to migrate >>>>> all pages in the range. >>>> >>>> Yeah. To answer Michal's "it could be reused for something else" - yes, somebody >>>> could allocate it from the pcplist before we do the extra drain. But then it >>>> becomes "visible again" and the loop in __offline_pages() should catch it by >>>> scan_movable_pages() - do_migrate_range(). 
And this time the pageblock is >>>> already marked as isolated, so the page (freed by migration) won't end up on the >>>> pcplist again. >>> >>> So the page block is marked MIGRATE_ISOLATE but the allocation itself >>> could be used for non migrateable objects. Or does anything prevent that >>> from happening? >> >> In a movable zone, the allocation should not be used for non migrateable >> objects. E.g. if the zone was not ZONE_MOVABLE, the offlining could fail >> regardless of this race (analogically for migrating away from CMA pageblocks). >> >>> We really do depend on isolation to not allow reuse when offlining. >> >> This is not really different than if the page on pcplist was allocated just a >> moment before the offlining, thus isolation started. We ultimately rely on being >> able to migrate any allocated pages away during the isolation. This "freeing to >> pcplists" race doesn't fundamentally change anything in this regard. We just >> have to guarantee that pages on pcplists will be eventually flushed, to make >> forward progress, and there was a bug in this aspect. > > You are right. I managed to confuse myself yesterday. The race is > impossible for !ZONE_MOVABLE because we do PageBuddy check there. And on > the movable zone we are not losing the migrateability property. > > Pavel I think this will be a useful information to add to the changelog. > We should also document this in the code to prevent from further > confusion. I would suggest something like the following: > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > index 242c03121d73..56d4892bceb8 100644 > --- a/mm/page_isolation.c > +++ b/mm/page_isolation.c > @@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) > * pageblocks we may have modified and return -EBUSY to caller. This > * prevents two threads from simultaneously working on overlapping ranges. > * > + * Please note that there is no strong synchronization with the page allocator > + * either. 
Pages might be freed while their page blocks are marked ISOLATED. > + * In some cases pages might still end up on pcp lists and that would allow > + * for their allocation even when they are in fact isolated already. Depending on > + * how strong of a guarantee the caller needs drain_all_pages might be needed > + * (e.g. __offline_pages will need to call it after check for isolated range for > + * a next retry). > + * As expressed in reply to v2, I dislike this hack. There is strong synchronization, just PCP is special. Allocating from MIGRATE_ISOLATE is just plain ugly. Can't we temporarily disable PCP (while some pageblock in the zone is isolated, which we know e.g., due to the counter), so no new pages get put into PCP lists after draining, and re-enable after no pageblocks are isolated again? We keep draining the PCP, so it doesn't seem to be of a lot of use during that period, no? It's a performance hit already. Then, we would only need exactly one drain. And we would only have to check on the free path whether PCP is temporarily disabled. -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-03 18:20 ` David Hildenbrand @ 2020-09-03 18:23 ` Pavel Tatashin 2020-09-03 18:31 ` David Hildenbrand 2020-09-04 6:32 ` Vlastimil Babka 0 siblings, 2 replies; 23+ messages in thread From: Pavel Tatashin @ 2020-09-03 18:23 UTC (permalink / raw) To: David Hildenbrand Cc: Michal Hocko, Vlastimil Babka, LKML, Andrew Morton, linux-mm On Thu, Sep 3, 2020 at 2:20 PM David Hildenbrand <david@redhat.com> wrote: > > On 03.09.20 08:38, Michal Hocko wrote: > > On Wed 02-09-20 19:51:45, Vlastimil Babka wrote: > >> On 9/2/20 5:13 PM, Michal Hocko wrote: > >>> On Wed 02-09-20 16:55:05, Vlastimil Babka wrote: > >>>> On 9/2/20 4:26 PM, Pavel Tatashin wrote: > >>>>> On Wed, Sep 2, 2020 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote: > >>>>>> > >>>>>>> > >>>>>>> Thread#1 - continue > >>>>>>> free_unref_page_commit > >>>>>>> migratetype = get_pcppage_migratetype(page); > >>>>>>> // get old migration type > >>>>>>> list_add(&page->lru, &pcp->lists[migratetype]); > >>>>>>> // add new page to already drained pcp list > >>>>>>> > >>>>>>> Thread#2 > >>>>>>> Never drains pcp again, and therefore gets stuck in the loop. > >>>>>>> > >>>>>>> The fix is to try to drain per-cpu lists again after > >>>>>>> check_pages_isolated_cb() fails. > >>>>>> > >>>>>> But this means that the page is not isolated and so it could be reused > >>>>>> for something else. No? > >>>>> > >>>>> The page is in a movable zone, has zero references, and the section is > >>>>> isolated (i.e. set_pageblock_migratetype(page, MIGRATE_ISOLATE);) is > >>>>> set. The page should be offlinable, but it is lost in a pcp list as > >>>>> that list is never drained again after the first failure to migrate > >>>>> all pages in the range. > >>>> > >>>> Yeah. To answer Michal's "it could be reused for something else" - yes, somebody > >>>> could allocate it from the pcplist before we do the extra drain. 
But then it > >>>> becomes "visible again" and the loop in __offline_pages() should catch it by > >>>> scan_movable_pages() - do_migrate_range(). And this time the pageblock is > >>>> already marked as isolated, so the page (freed by migration) won't end up on the > >>>> pcplist again. > >>> > >>> So the page block is marked MIGRATE_ISOLATE but the allocation itself > >>> could be used for non migrateable objects. Or does anything prevent that > >>> from happening? > >> > >> In a movable zone, the allocation should not be used for non migrateable > >> objects. E.g. if the zone was not ZONE_MOVABLE, the offlining could fail > >> regardless of this race (analogically for migrating away from CMA pageblocks). > >> > >>> We really do depend on isolation to not allow reuse when offlining. > >> > >> This is not really different than if the page on pcplist was allocated just a > >> moment before the offlining, thus isolation started. We ultimately rely on being > >> able to migrate any allocated pages away during the isolation. This "freeing to > >> pcplists" race doesn't fundamentally change anything in this regard. We just > >> have to guarantee that pages on pcplists will be eventually flushed, to make > >> forward progress, and there was a bug in this aspect. > > > > You are right. I managed to confuse myself yesterday. The race is > > impossible for !ZONE_MOVABLE because we do PageBuddy check there. And on > > the movable zone we are not losing the migrateability property. > > > > Pavel I think this will be a useful information to add to the changelog. > > We should also document this in the code to prevent from further > > confusion. 
I would suggest something like the following: > > > > diff --git a/mm/page_isolation.c b/mm/page_isolation.c > > index 242c03121d73..56d4892bceb8 100644 > > --- a/mm/page_isolation.c > > +++ b/mm/page_isolation.c > > @@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) > > * pageblocks we may have modified and return -EBUSY to caller. This > > * prevents two threads from simultaneously working on overlapping ranges. > > * > > + * Please note that there is no strong synchronization with the page allocator > > + * either. Pages might be freed while their page blocks are marked ISOLATED. > > + * In some cases pages might still end up on pcp lists and that would allow > > + * for their allocation even when they are in fact isolated already. Depending on > > + * how strong of a guarantee the caller needs drain_all_pages might be needed > > + * (e.g. __offline_pages will need to call it after check for isolated range for > > + * a next retry). > > + * > > As expressed in reply to v2, I dislike this hack. There is strong > synchronization, just PCP is special. Allocating from MIGRATE_ISOLATE is > just plain ugly. > > Can't we temporarily disable PCP (while some pageblock in the zone is > isolated, which we know e.g., due to the counter), so no new pages get > put into PCP lists after draining, and re-enable after no pageblocks are > isolated again? We keep draining the PCP, so it doesn't seem to be of a > lot of use during that period, no? It's a performance hit already. > > Then, we would only need exactly one drain. And we would only have to > check on the free path whether PCP is temporarily disabled. Hm, we could use a static branches to disable it, that would keep release code just as fast, but I am worried it will make code even uglier. Let's see what others in this thread think about this idea. Thank you, Pasha ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-03 18:23 ` Pavel Tatashin @ 2020-09-03 18:31 ` David Hildenbrand 2020-09-04 7:02 ` Michal Hocko 2020-09-04 6:32 ` Vlastimil Babka 1 sibling, 1 reply; 23+ messages in thread From: David Hildenbrand @ 2020-09-03 18:31 UTC (permalink / raw) To: Pavel Tatashin Cc: Michal Hocko, Vlastimil Babka, LKML, Andrew Morton, linux-mm On 03.09.20 20:23, Pavel Tatashin wrote: > On Thu, Sep 3, 2020 at 2:20 PM David Hildenbrand <david@redhat.com> wrote: >> >> On 03.09.20 08:38, Michal Hocko wrote: >>> On Wed 02-09-20 19:51:45, Vlastimil Babka wrote: >>>> On 9/2/20 5:13 PM, Michal Hocko wrote: >>>>> On Wed 02-09-20 16:55:05, Vlastimil Babka wrote: >>>>>> On 9/2/20 4:26 PM, Pavel Tatashin wrote: >>>>>>> On Wed, Sep 2, 2020 at 10:08 AM Michal Hocko <mhocko@suse.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> Thread#1 - continue >>>>>>>>> free_unref_page_commit >>>>>>>>> migratetype = get_pcppage_migratetype(page); >>>>>>>>> // get old migration type >>>>>>>>> list_add(&page->lru, &pcp->lists[migratetype]); >>>>>>>>> // add new page to already drained pcp list >>>>>>>>> >>>>>>>>> Thread#2 >>>>>>>>> Never drains pcp again, and therefore gets stuck in the loop. >>>>>>>>> >>>>>>>>> The fix is to try to drain per-cpu lists again after >>>>>>>>> check_pages_isolated_cb() fails. >>>>>>>> >>>>>>>> But this means that the page is not isolated and so it could be reused >>>>>>>> for something else. No? >>>>>>> >>>>>>> The page is in a movable zone, has zero references, and the section is >>>>>>> isolated (i.e. set_pageblock_migratetype(page, MIGRATE_ISOLATE);) is >>>>>>> set. The page should be offlinable, but it is lost in a pcp list as >>>>>>> that list is never drained again after the first failure to migrate >>>>>>> all pages in the range. >>>>>> >>>>>> Yeah. To answer Michal's "it could be reused for something else" - yes, somebody >>>>>> could allocate it from the pcplist before we do the extra drain. 
But then it >>>>>> becomes "visible again" and the loop in __offline_pages() should catch it by >>>>>> scan_movable_pages() - do_migrate_range(). And this time the pageblock is >>>>>> already marked as isolated, so the page (freed by migration) won't end up on the >>>>>> pcplist again. >>>>> >>>>> So the page block is marked MIGRATE_ISOLATE but the allocation itself >>>>> could be used for non migrateable objects. Or does anything prevent that >>>>> from happening? >>>> >>>> In a movable zone, the allocation should not be used for non migrateable >>>> objects. E.g. if the zone was not ZONE_MOVABLE, the offlining could fail >>>> regardless of this race (analogically for migrating away from CMA pageblocks). >>>> >>>>> We really do depend on isolation to not allow reuse when offlining. >>>> >>>> This is not really different than if the page on pcplist was allocated just a >>>> moment before the offlining, thus isolation started. We ultimately rely on being >>>> able to migrate any allocated pages away during the isolation. This "freeing to >>>> pcplists" race doesn't fundamentally change anything in this regard. We just >>>> have to guarantee that pages on pcplists will be eventually flushed, to make >>>> forward progress, and there was a bug in this aspect. >>> >>> You are right. I managed to confuse myself yesterday. The race is >>> impossible for !ZONE_MOVABLE because we do PageBuddy check there. And on >>> the movable zone we are not losing the migrateability property. >>> >>> Pavel I think this will be a useful information to add to the changelog. >>> We should also document this in the code to prevent from further >>> confusion. 
I would suggest something like the following: >>> >>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c >>> index 242c03121d73..56d4892bceb8 100644 >>> --- a/mm/page_isolation.c >>> +++ b/mm/page_isolation.c >>> @@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) >>> * pageblocks we may have modified and return -EBUSY to caller. This >>> * prevents two threads from simultaneously working on overlapping ranges. >>> * >>> + * Please note that there is no strong synchronization with the page allocator >>> + * either. Pages might be freed while their page blocks are marked ISOLATED. >>> + * In some cases pages might still end up on pcp lists and that would allow >>> + * for their allocation even when they are in fact isolated already. Depending on >>> + * how strong of a guarantee the caller needs drain_all_pages might be needed >>> + * (e.g. __offline_pages will need to call it after check for isolated range for >>> + * a next retry). >>> + * >> >> As expressed in reply to v2, I dislike this hack. There is strong >> synchronization, just PCP is special. Allocating from MIGRATE_ISOLATE is >> just plain ugly. >> >> Can't we temporarily disable PCP (while some pageblock in the zone is >> isolated, which we know e.g., due to the counter), so no new pages get >> put into PCP lists after draining, and re-enable after no pageblocks are >> isolated again? We keep draining the PCP, so it doesn't seem to be of a >> lot of use during that period, no? It's a performance hit already. >> >> Then, we would only need exactly one drain. And we would only have to >> check on the free path whether PCP is temporarily disabled. > > Hm, we could use a static branches to disable it, that would keep > release code just as fast, but I am worried it will make code even > uglier. Let's see what others in this thread think about this idea. 
It would at least stop this "allocate from MIOGRATE_ISOLATE" behavior and the "drain when you feel like it, and drain more frequently to work around broken PCP code" handling. Anyhow, I'll be offline until next Tuesday, cheers! -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-03 18:31 ` David Hildenbrand @ 2020-09-04 7:02 ` Michal Hocko 2020-09-04 14:25 ` Pavel Tatashin 0 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2020-09-04 7:02 UTC (permalink / raw) To: David Hildenbrand Cc: Pavel Tatashin, Vlastimil Babka, LKML, Andrew Morton, linux-mm On Thu 03-09-20 20:31:04, David Hildenbrand wrote: > On 03.09.20 20:23, Pavel Tatashin wrote: > > On Thu, Sep 3, 2020 at 2:20 PM David Hildenbrand <david@redhat.com> wrote: > >> > >> On 03.09.20 08:38, Michal Hocko wrote: [...] > >>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c > >>> index 242c03121d73..56d4892bceb8 100644 > >>> --- a/mm/page_isolation.c > >>> +++ b/mm/page_isolation.c > >>> @@ -170,6 +170,14 @@ __first_valid_page(unsigned long pfn, unsigned long nr_pages) > >>> * pageblocks we may have modified and return -EBUSY to caller. This > >>> * prevents two threads from simultaneously working on overlapping ranges. > >>> * > >>> + * Please note that there is no strong synchronization with the page allocator > >>> + * either. Pages might be freed while their page blocks are marked ISOLATED. > >>> + * In some cases pages might still end up on pcp lists and that would allow > >>> + * for their allocation even when they are in fact isolated already. Depending on > >>> + * how strong of a guarantee the caller needs drain_all_pages might be needed > >>> + * (e.g. __offline_pages will need to call it after check for isolated range for > >>> + * a next retry). > >>> + * > >> > >> As expressed in reply to v2, I dislike this hack. There is strong > >> synchronization, just PCP is special. Allocating from MIGRATE_ISOLATE is > >> just plain ugly. Completely agreed! I am not happy about that either. But I believe this hack is the easiest way forward for stable trees and as an immediate fix. We can build on top of that of course. 
> >> Can't we temporarily disable PCP (while some pageblock in the zone is > >> isolated, which we know e.g., due to the counter), so no new pages get > >> put into PCP lists after draining, and re-enable after no pageblocks are > >> isolated again? We keep draining the PCP, so it doesn't seem to be of a > >> lot of use during that period, no? It's a performance hit already. This is a good point. > >> Then, we would only need exactly one drain. And we would only have to > >> check on the free path whether PCP is temporarily disabled. > > > > Hm, we could use a static branches to disable it, that would keep > > release code just as fast, but I am worried it will make code even > > uglier. Let's see what others in this thread think about this idea. I know that static branches are a very effective way to enable/disable features but I have no experience in how they perform for a very shortlived use. Maybe that is just fine for a single place which needs to be patched. This would be especially a problem if the static branch is to be enabled from start_isolate_page_range because that includes all cma allocator users. Another alternative would be to enable/disable static branch only from users who really care but this is quite tricky because how do you tell you need or not? It seems that alloc_contig_range would be just fine with a weaker semantic because it would "only" to a spurious failure. Memory hotplug on the other hand really needs to have a point where nobody interferes with the offlined memory so it could ask for a stronger semantic. Yet another option would be to make draining stronger and actually guarantee there are no in-flight pages to be freed to the pcp list. One way would be to tweak pcp->high and implement a strong barrier (IPI?) to sync with all CPUs. Quite expensive, especially when there are many draining requests (read cma users because hotplug doesn't really matter much as it happens seldom). So no nice&cheap solution I can think of... 
-- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-04 7:02 ` Michal Hocko @ 2020-09-04 14:25 ` Pavel Tatashin 2020-09-07 7:26 ` Michal Hocko 0 siblings, 1 reply; 23+ messages in thread From: Pavel Tatashin @ 2020-09-04 14:25 UTC (permalink / raw) To: Michal Hocko Cc: David Hildenbrand, Vlastimil Babka, LKML, Andrew Morton, linux-mm > Another alternative would be to enable/disable static branch only from > users who really care but this is quite tricky because how do you tell > you need or not? It seems that alloc_contig_range would be just fine > with a weaker semantic because it would "only" to a spurious failure. > Memory hotplug on the other hand really needs to have a point where > nobody interferes with the offlined memory so it could ask for a > stronger semantic. > > Yet another option would be to make draining stronger and actually > guarantee there are no in-flight pages to be freed to the pcp list. > One way would be to tweak pcp->high and implement a strong barrier > (IPI?) to sync with all CPUs. Quite expensive, especially when there are > many draining requests (read cma users because hotplug doesn't really > matter much as it happens seldom). > > So no nice&cheap solution I can think of... I think start_isolate_page_range() should not be doing page draining at all. It should isolate ranges, meaning set appropriate flags, but draining should be performed by the users when appropriate: next to lru_add_drain_all() calls both in CMA and hotplug. Currently, the way start_isolate_page_range() drains pages is very inefficient. It calls drain_all_pages() for every valid page block, which is a slow call as it starts a thread per cpu, and waits for those threads to finish before returning. 
We could optimize by moving the drain_all_pages() calls from set_migratetype_isolate() to start_isolate_page_range() and call it once for every different zone, but both current users of this interface guarantee that all pfns [start_pfn, end_pfn] are within the same zone, and I think we should keep it this way, so again the extra traversal is going to be pure overhead. This way we will have on average only a single drain per hot-remove (instead of one per block), and draining is also performed symmetrically in one place. Faster hot-remove and cma alloc, and no race, imo win-win. Pasha ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-04 14:25 ` Pavel Tatashin @ 2020-09-07 7:26 ` Michal Hocko 0 siblings, 0 replies; 23+ messages in thread From: Michal Hocko @ 2020-09-07 7:26 UTC (permalink / raw) To: Pavel Tatashin Cc: David Hildenbrand, Vlastimil Babka, LKML, Andrew Morton, linux-mm On Fri 04-09-20 10:25:02, Pavel Tatashin wrote: > > Another alternative would be to enable/disable static branch only from > > users who really care but this is quite tricky because how do you tell > > you need or not? It seems that alloc_contig_range would be just fine > > with a weaker semantic because it would "only" to a spurious failure. > > Memory hotplug on the other hand really needs to have a point where > > nobody interferes with the offlined memory so it could ask for a > > stronger semantic. > > > > Yet another option would be to make draining stronger and actually > > guarantee there are no in-flight pages to be freed to the pcp list. > > One way would be to tweak pcp->high and implement a strong barrier > > (IPI?) to sync with all CPUs. Quite expensive, especially when there are > > many draining requests (read cma users because hotplug doesn't really > > matter much as it happens seldom). > > > > So no nice&cheap solution I can think of... > > I think start_isolate_page_range() should not be doing page draining > at all. It should isolate ranges, meaning set appropriate flags, but > draining should be performed by the users when appropriate: next to > lru_add_drain_all() calls both in CMA and hotplug. I disagree. The pcp draining is an implementation detail and we shouldn't bother callers to be aware of it. > Currently, the way start_isolate_page_range() drains pages is very > inefficient. It calls drain_all_pages() for every valid page block, > which is a slow call as it starts a thread per cpu, and waits for > those threads to finish before returning. This is an implementation detail. 
> We could optimize by moving the drain_all_pages() calls from > set_migratetype_isolate() to start_isolate_page_range() and call it > once for every different zone, but both current users of this > interface guarantee that all pfns [start_pfn, end_pfn] are within the > same zone, and I think we should keep it this way, so again the extra > traversal is going to be overhead overhead. Again this just leads to tricky code. Just look at how easy it was to break this by removing something that looked clearly a duplicate call. It is true that memory isolation usage is limited to only few usecasaes but I would strongly prefer to make the semantic clear so that we do not repeat this regressions. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-03 18:23 ` Pavel Tatashin 2020-09-03 18:31 ` David Hildenbrand @ 2020-09-04 6:32 ` Vlastimil Babka 1 sibling, 0 replies; 23+ messages in thread From: Vlastimil Babka @ 2020-09-04 6:32 UTC (permalink / raw) To: Pavel Tatashin, David Hildenbrand Cc: Michal Hocko, LKML, Andrew Morton, linux-mm On 9/3/20 8:23 PM, Pavel Tatashin wrote: >> >> As expressed in reply to v2, I dislike this hack. There is strong >> synchronization, just PCP is special. Allocating from MIGRATE_ISOLATE is >> just plain ugly. >> >> Can't we temporarily disable PCP (while some pageblock in the zone is >> isolated, which we know e.g., due to the counter), so no new pages get >> put into PCP lists after draining, and re-enable after no pageblocks are >> isolated again? We keep draining the PCP, so it doesn't seem to be of a >> lot of use during that period, no? It's a performance hit already. >> >> Then, we would only need exactly one drain. And we would only have to >> check on the free path whether PCP is temporarily disabled. > > Hm, we could use a static branches to disable it, that would keep > release code just as fast, but I am worried it will make code even > uglier. Let's see what others in this thread think about this idea. Maybe we could just set pcp->high = 0 or something, make sure the pcplist user only reads this value while irqs are disabled. Then the IPI in drain_all_pages() should guarantee there's nobody racing freeing to pcplist. But careful to not introduce bugs like the one fixed with [1]. And not sure if this guarantee survives when RT comes and replaces the disabled irq's with local_lock or something. [1] https://lore.kernel.org/linux-mm/1597150703-19003-1-git-send-email-charante@codeaurora.org/ > Thank you, > Pasha > ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-01 12:46 [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline Pavel Tatashin ` (2 preceding siblings ...) 2020-09-02 14:08 ` Michal Hocko @ 2020-09-03 7:07 ` Michal Hocko 2020-09-03 13:43 ` Pavel Tatashin 2020-09-03 13:50 ` Vlastimil Babka 4 siblings, 1 reply; 23+ messages in thread From: Michal Hocko @ 2020-09-03 7:07 UTC (permalink / raw) To: Pavel Tatashin; +Cc: linux-kernel, akpm, linux-mm On Tue 01-09-20 08:46:15, Pavel Tatashin wrote: [...] > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index e9d5ab5d3ca0..d6d54922bfce 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -1575,6 +1575,15 @@ static int __ref __offline_pages(unsigned long start_pfn, > /* check again */ > ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, > NULL, check_pages_isolated_cb); > + /* > + * per-cpu pages are drained in start_isolate_page_range, but if > + * there are still pages that are not free, make sure that we > + * drain again, because when we isolated range we might > + * have raced with another thread that was adding pages to > + * pcp list. I would also add * Forward progress should be still guaranteed because * pages on the pcp list can only belong to MOVABLE_ZONE * because has_unmovable_pages explicitly checks for * PageBuddy on freed pages on other zones. > + */ > + if (ret) > + drain_all_pages(zone); > } while (ret); > > /* Ok, all of our target is isolated. > -- > 2.25.1 > -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-03 7:07 ` Michal Hocko @ 2020-09-03 13:43 ` Pavel Tatashin 0 siblings, 0 replies; 23+ messages in thread From: Pavel Tatashin @ 2020-09-03 13:43 UTC (permalink / raw) To: Michal Hocko; +Cc: LKML, Andrew Morton, linux-mm Thanks Michal, I will add your comments. Pasha On Thu, Sep 3, 2020 at 3:07 AM Michal Hocko <mhocko@suse.com> wrote: > > On Tue 01-09-20 08:46:15, Pavel Tatashin wrote: > [...] > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > > index e9d5ab5d3ca0..d6d54922bfce 100644 > > --- a/mm/memory_hotplug.c > > +++ b/mm/memory_hotplug.c > > @@ -1575,6 +1575,15 @@ static int __ref __offline_pages(unsigned long start_pfn, > > /* check again */ > > ret = walk_system_ram_range(start_pfn, end_pfn - start_pfn, > > NULL, check_pages_isolated_cb); > > + /* > > + * per-cpu pages are drained in start_isolate_page_range, but if > > + * there are still pages that are not free, make sure that we > > + * drain again, because when we isolated range we might > > + * have raced with another thread that was adding pages to > > + * pcp list. > > I would also add > * Forward progress should be still guaranteed because > * pages on the pcp list can only belong to MOVABLE_ZONE > * because has_unmovable_pages explicitly checks for > * PageBuddy on freed pages on other zones. > > + */ > > + if (ret) > > + drain_all_pages(zone); > > } while (ret); > > > > /* Ok, all of our target is isolated. > > -- > > 2.25.1 > > > > -- > Michal Hocko > SUSE Labs ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline 2020-09-01 12:46 [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline Pavel Tatashin ` (3 preceding siblings ...) 2020-09-03 7:07 ` Michal Hocko @ 2020-09-03 13:50 ` Vlastimil Babka 4 siblings, 0 replies; 23+ messages in thread From: Vlastimil Babka @ 2020-09-03 13:50 UTC (permalink / raw) To: Pavel Tatashin, linux-kernel, akpm, mhocko, linux-mm On 9/1/20 2:46 PM, Pavel Tatashin wrote: > There is a race during page offline that can lead to infinite loop: > a page never ends up on a buddy list and __offline_pages() keeps > retrying infinitely or until a termination signal is received. > > Thread#1 - a new process: > > load_elf_binary > begin_new_exec > exec_mmap > mmput > exit_mmap > tlb_finish_mmu > tlb_flush_mmu > release_pages > free_unref_page_list > free_unref_page_prepare > set_pcppage_migratetype(page, migratetype); > // Set page->index migration type below MIGRATE_PCPTYPES > > Thread#2 - hot-removes memory > __offline_pages > start_isolate_page_range > set_migratetype_isolate > set_pageblock_migratetype(page, MIGRATE_ISOLATE); > Set migration type to MIGRATE_ISOLATE-> set > drain_all_pages(zone); > // drain per-cpu page lists to buddy allocator. > > Thread#1 - continue > free_unref_page_commit > migratetype = get_pcppage_migratetype(page); > // get old migration type > list_add(&page->lru, &pcp->lists[migratetype]); > // add new page to already drained pcp list > > Thread#2 > Never drains pcp again, and therefore gets stuck in the loop. > > The fix is to try to drain per-cpu lists again after > check_pages_isolated_cb() fails. > > Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com> > Cc: stable@vger.kernel.org Fixes: ? Acked-by: Vlastimil Babka <vbabka@suse.cz> Thanks. ^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2020-09-07 7:26 UTC | newest] Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-09-01 12:46 [PATCH] mm/memory_hotplug: drain per-cpu pages again during memory offline Pavel Tatashin 2020-09-01 18:37 ` David Rientjes 2020-09-02 14:01 ` Michal Hocko 2020-09-02 14:10 ` Michal Hocko 2020-09-02 14:31 ` Pavel Tatashin 2020-09-02 14:49 ` Vlastimil Babka 2020-09-02 14:08 ` Michal Hocko 2020-09-02 14:26 ` Pavel Tatashin 2020-09-02 14:55 ` Vlastimil Babka 2020-09-02 15:13 ` Michal Hocko 2020-09-02 15:40 ` Pavel Tatashin 2020-09-02 17:51 ` Vlastimil Babka 2020-09-03 6:38 ` Michal Hocko 2020-09-03 18:20 ` David Hildenbrand 2020-09-03 18:23 ` Pavel Tatashin 2020-09-03 18:31 ` David Hildenbrand 2020-09-04 7:02 ` Michal Hocko 2020-09-04 14:25 ` Pavel Tatashin 2020-09-07 7:26 ` Michal Hocko 2020-09-04 6:32 ` Vlastimil Babka 2020-09-03 7:07 ` Michal Hocko 2020-09-03 13:43 ` Pavel Tatashin 2020-09-03 13:50 ` Vlastimil Babka