* [PATCH 0/3] hugetlb: fix potential ref counting races @ 2021-07-10 0:24 Mike Kravetz 2021-07-10 0:24 ` [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code Mike Kravetz ` (3 more replies) 0 siblings, 4 replies; 10+ messages in thread From: Mike Kravetz @ 2021-07-10 0:24 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Muchun Song, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton, Mike Kravetz When Muchun Song brought up a potential issue with hugetlb ref counting[1], I started looking closer at the code. hugetlbfs is the only code with it's own specialized compound page destructor and taking special action when ref counts drop to zero. Potential races happen in this unique handling of ref counts. The following patches address these races when creating and destroying hugetlb pages. These potential races have likely existed since the creation of hugetlbfs. They certainly have been around for more than 10 years. However, I am unaware of anyone actually hitting these races. It is VERY unlikely than anyone will actually hit these races, but they do exist. I could not think of an easy (or difficult) way to force these races. Therefore, testing consisted of adding code to randomly increase ref counts in strategic places. In this way, I was able to exercise all the race handling code paths. [1] https://lore.kernel.org/linux-mm/CAMZfGtVMn3daKrJwZMaVOGOaJU+B4dS--x_oPmGQMD=c=QNGEg@mail.gmail.com/ Mike Kravetz (3): hugetlb: simplify prep_compound_gigantic_page ref count racing code hugetlb: drop ref count earlier after page allocation hugetlb: before freeing hugetlb page set dtor to appropriate value mm/hugetlb.c | 137 ++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 104 insertions(+), 33 deletions(-) -- 2.31.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code 2021-07-10 0:24 [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz @ 2021-07-10 0:24 ` Mike Kravetz 2021-07-13 6:31 ` [External] " Muchun Song 2021-07-10 0:24 ` [PATCH 2/3] hugetlb: drop ref count earlier after page allocation Mike Kravetz ` (2 subsequent siblings) 3 siblings, 1 reply; 10+ messages in thread From: Mike Kravetz @ 2021-07-10 0:24 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Muchun Song, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton, Mike Kravetz Code in prep_compound_gigantic_page waits for a rcu grace period if it notices a temporarily inflated ref count on a tail page. This was due to the identified potential race with speculative page cache references which could only last for a rcu grace period. This is overly complicated as this situation is VERY unlikely to ever happen. Instead, just quickly return an error. Also, only print a warning in prep_compound_gigantic_page instead of multiple callers. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- mm/hugetlb.c | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 924553aa8f78..e59ebba63da7 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1657,16 +1657,12 @@ static bool prep_compound_gigantic_page(struct page *page, unsigned int order) * cache adding could take a ref on a 'to be' tail page. * We need to respect any increased ref count, and only set * the ref count to zero if count is currently 1. If count - * is not 1, we call synchronize_rcu in the hope that a rcu - * grace period will cause ref count to drop and then retry. - * If count is still inflated on retry we return an error and - * must discard the pages. + * is not 1, we return an error and caller must discard the + * pages. */ if (!page_ref_freeze(p, 1)) { - pr_info("HugeTLB unexpected inflated ref count on freshly allocated page\n"); - synchronize_rcu(); - if (!page_ref_freeze(p, 1)) - goto out_error; + pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); + goto out_error; } set_page_count(p, 0); set_compound_head(p, page); @@ -1830,7 +1826,6 @@ static struct page *alloc_fresh_huge_page(struct hstate *h, retry = true; goto retry; } - pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); return NULL; } } @@ -2828,8 +2823,8 @@ static void __init gather_bootmem_prealloc(void) prep_new_huge_page(h, page, page_to_nid(page)); put_page(page); /* add to the hugepage allocator */ } else { + /* VERY unlikely inflated ref count on a tail page */ free_gigantic_page(page, huge_page_order(h)); - pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); } /* -- 2.31.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [External] [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code 2021-07-10 0:24 ` [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code Mike Kravetz @ 2021-07-13 6:31 ` Muchun Song 0 siblings, 0 replies; 10+ messages in thread From: Muchun Song @ 2021-07-13 6:31 UTC (permalink / raw) To: Mike Kravetz Cc: Linux Memory Management List, LKML, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton On Sat, Jul 10, 2021 at 8:25 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > Code in prep_compound_gigantic_page waits for a rcu grace period if it > notices a temporarily inflated ref count on a tail page. This was due > to the identified potential race with speculative page cache references > which could only last for a rcu grace period. This is overly complicated > as this situation is VERY unlikely to ever happen. Instead, just quickly > return an error. Right. The race is very very small. IMHO, that does not complicate the code is the right thing to do. > > Also, only print a warning in prep_compound_gigantic_page instead of > multiple callers. > > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > --- > mm/hugetlb.c | 15 +++++---------- > 1 file changed, 5 insertions(+), 10 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 924553aa8f78..e59ebba63da7 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1657,16 +1657,12 @@ static bool prep_compound_gigantic_page(struct page *page, unsigned int order) > * cache adding could take a ref on a 'to be' tail page. > * We need to respect any increased ref count, and only set > * the ref count to zero if count is currently 1. If count > - * is not 1, we call synchronize_rcu in the hope that a rcu > - * grace period will cause ref count to drop and then retry. > - * If count is still inflated on retry we return an error and > - * must discard the pages. > + * is not 1, we return an error and caller must discard the > + * pages. Shall we add more details about why we discard the pages? Thanks. > */ > if (!page_ref_freeze(p, 1)) { > - pr_info("HugeTLB unexpected inflated ref count on freshly allocated page\n"); > - synchronize_rcu(); > - if (!page_ref_freeze(p, 1)) > - goto out_error; > + pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); > + goto out_error; > } > set_page_count(p, 0); > set_compound_head(p, page); > @@ -1830,7 +1826,6 @@ static struct page *alloc_fresh_huge_page(struct hstate *h, > retry = true; > goto retry; > } > - pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); > return NULL; > } > } > @@ -2828,8 +2823,8 @@ static void __init gather_bootmem_prealloc(void) > prep_new_huge_page(h, page, page_to_nid(page)); > put_page(page); /* add to the hugepage allocator */ > } else { > + /* VERY unlikely inflated ref count on a tail page */ > free_gigantic_page(page, huge_page_order(h)); > - pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n"); > } > > /* > -- > 2.31.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] hugetlb: drop ref count earlier after page allocation 2021-07-10 0:24 [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz 2021-07-10 0:24 ` [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code Mike Kravetz @ 2021-07-10 0:24 ` Mike Kravetz 2021-07-10 0:24 ` [PATCH 3/3] hugetlb: before freeing hugetlb page set dtor to appropriate value Mike Kravetz 2021-07-27 23:41 ` [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz 3 siblings, 0 replies; 10+ messages in thread From: Mike Kravetz @ 2021-07-10 0:24 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Muchun Song, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton, Mike Kravetz When discussing the possibility of inflated page ref counts, Muuchun Song pointed out this potential issue [1]. It is true that any code could potentially take a reference on a compound page after allocation and before it is converted to and put into use as a hugetlb page. Specifically, this could be done by any users of get_page_unless_zero. There are three areas of concern within hugetlb code. 1) When adding pages to the pool. In this case, new pages are allocated added to the pool by calling put_page to invoke the hugetlb destructor (free_huge_page). If there is an inflated ref count on the page, it will not be immediately added to the free list. It will only be added to the free list when the temporary ref count is dropped. This is deemed acceptable and will not be addressed. 2) A page is allocated for immediate use normally as a surplus page or migration target. In this case, the user of the page will also hold a reference. There is no issue as this is just like normal page ref counting. 3) A page is allocated and MUST be added to the free list to satisfy a reservation. One such example is gather_surplus_pages as pointed out by Muchun in [1]. More specifically, this case covers callers of enqueue_huge_page where the page reference count must be zero. This patch covers this third case. Three routines call enqueue_huge_page when the page reference count could potentially be inflated. They are: gather_surplus_pages, alloc_and_dissolve_huge_page and add_hugetlb_page. add_hugetlb_page is called on error paths when a huge page can not be freed due to the inability to allocate vmemmap pages. In this case, the temporairly inflated ref count is not an issue. When the ref is dropped the appropriate action will be taken. Instead of VM_BUG_ON if the ref count does not drop to zero, simply return. In gather_surplus_pages and alloc_and_dissolve_huge_page the caller expects a page (or pages) to be put on the free lists. In this case we must ensure there are no temporary ref counts. We do this by calling put_page_testzero() earlier and not using pages without a zero ref count. The temporary page flag (HPageTemporary) is used in such cases so that as soon as the inflated ref count is dropped the page will be freed. [1] https://lore.kernel.org/linux-mm/CAMZfGtVMn3daKrJwZMaVOGOaJU+B4dS--x_oPmGQMD=c=QNGEg@mail.gmail.com/ Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- mm/hugetlb.c | 100 +++++++++++++++++++++++++++++++++++++++------------ 1 file changed, 78 insertions(+), 22 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index e59ebba63da7..3132c7395743 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1072,6 +1072,8 @@ static void enqueue_huge_page(struct hstate *h, struct page *page) int nid = page_to_nid(page); lockdep_assert_held(&hugetlb_lock); + VM_BUG_ON_PAGE(page_count(page), page); + list_move(&page->lru, &h->hugepage_freelists[nid]); h->free_huge_pages++; h->free_huge_pages_node[nid]++; @@ -1399,11 +1401,20 @@ static void add_hugetlb_page(struct hstate *h, struct page *page, SetHPageVmemmapOptimized(page); /* - * This page is now managed by the hugetlb allocator and has - * no users -- drop the last reference. + * This page is about to be managed by the hugetlb allocator and + * should have no users. Drop our reference, and check for others + * just in case. */ zeroed = put_page_testzero(page); - VM_BUG_ON_PAGE(!zeroed, page); + if (!zeroed) + /* + * It is VERY unlikely soneone else has taken a ref on + * the page. In this case, we simply return as the + * hugetlb destructor (free_huge_page) will be called + * when this other ref is dropped. + */ + return; + arch_clear_hugepage_flags(page); enqueue_huge_page(h, page); } @@ -2015,9 +2026,10 @@ int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn) * Allocates a fresh surplus page from the page allocator. */ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, - int nid, nodemask_t *nmask) + int nid, nodemask_t *nmask, bool zero_ref) { struct page *page = NULL; + bool retry = false; if (hstate_is_gigantic(h)) return NULL; @@ -2027,6 +2039,7 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, goto out_unlock; spin_unlock_irq(&hugetlb_lock); +retry: page = alloc_fresh_huge_page(h, gfp_mask, nid, nmask, NULL); if (!page) return NULL; @@ -2044,11 +2057,35 @@ static struct page *alloc_surplus_huge_page(struct hstate *h, gfp_t gfp_mask, spin_unlock_irq(&hugetlb_lock); put_page(page); return NULL; - } else { - h->surplus_huge_pages++; - h->surplus_huge_pages_node[page_to_nid(page)]++; } + if (zero_ref) { + /* + * Caller requires a page with zero ref count. + * We will drop ref count here. If someone else is holding + * a ref, the page will be freed when they drop it. Abuse + * temporary page flag to accomplish this. + */ + SetHPageTemporary(page); + if (!put_page_testzero(page)) { + /* + * Unexpected inflated ref count on freshly allocated + * huge. Retry once. + */ + pr_info("HugeTLB unexpected inflated ref count on freshly allocated page\n"); + spin_unlock_irq(&hugetlb_lock); + if (retry) + return NULL; + + retry = true; + goto retry; + } + ClearHPageTemporary(page); + } + + h->surplus_huge_pages++; + h->surplus_huge_pages_node[page_to_nid(page)]++; + out_unlock: spin_unlock_irq(&hugetlb_lock); @@ -2090,7 +2127,7 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h, nodemask_t *nodemask; nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask); - page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask); + page = alloc_surplus_huge_page(h, gfp_mask, nid, nodemask, false); mpol_cond_put(mpol); return page; @@ -2162,7 +2199,7 @@ static int gather_surplus_pages(struct hstate *h, long delta) spin_unlock_irq(&hugetlb_lock); for (i = 0; i < needed; i++) { page = alloc_surplus_huge_page(h, htlb_alloc_mask(h), - NUMA_NO_NODE, NULL); + NUMA_NO_NODE, NULL, true); if (!page) { alloc_ok = false; break; @@ -2203,24 +2240,20 @@ static int gather_surplus_pages(struct hstate *h, long delta) /* Free the needed pages to the hugetlb pool */ list_for_each_entry_safe(page, tmp, &surplus_list, lru) { - int zeroed; - if ((--needed) < 0) break; - /* - * This page is now managed by the hugetlb allocator and has - * no users -- drop the buddy allocator's reference. - */ - zeroed = put_page_testzero(page); - VM_BUG_ON_PAGE(!zeroed, page); + /* Add the page to the hugetlb allocator */ enqueue_huge_page(h, page); } free: spin_unlock_irq(&hugetlb_lock); - /* Free unnecessary surplus pages to the buddy allocator */ + /* + * Free unnecessary surplus pages to the buddy allocator. + * Pages have no ref count, call free_huge_page directly. + */ list_for_each_entry_safe(page, tmp, &surplus_list, lru) - put_page(page); + free_huge_page(page); spin_lock_irq(&hugetlb_lock); return ret; @@ -2529,6 +2562,7 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page, { gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE; int nid = page_to_nid(old_page); + bool alloc_retry = false; struct page *new_page; int ret = 0; @@ -2539,9 +2573,30 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page, * the pool. This simplifies and let us do most of the processing * under the lock. */ +alloc_retry: new_page = alloc_buddy_huge_page(h, gfp_mask, nid, NULL, NULL); if (!new_page) return -ENOMEM; + /* + * If all goes well, this page will be directly added to the free + * list in the pool. For this the ref count needs to be zero. + * Attempt to drop now, and retry once if needed. It is VERY + * unlikely there is another ref on the page. + * + * If someone else has a reference to the page, it will be freed + * when they drop their ref. Abuse temporary page flag to accomplish + * this. Retry once if there is an inflated ref count. + */ + SetHPageTemporary(new_page); + if (!put_page_testzero(new_page)) { + if (alloc_retry) + return -EBUSY; + + alloc_retry = true; + goto alloc_retry; + } + ClearHPageTemporary(new_page); + __prep_new_huge_page(h, new_page); retry: @@ -2581,11 +2636,10 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page, remove_hugetlb_page(h, old_page, false); /* - * Reference count trick is needed because allocator gives us - * referenced page but the pool requires pages with 0 refcount. + * Ref count on new page is already zero as it was dropped + * earlier. It can be directly added to the pool free list. */ __prep_account_new_huge_page(h, nid); - page_ref_dec(new_page); enqueue_huge_page(h, new_page); /* @@ -2599,6 +2653,8 @@ static int alloc_and_dissolve_huge_page(struct hstate *h, struct page *old_page, free_new: spin_unlock_irq(&hugetlb_lock); + /* Page has a zero ref count, but needs a ref to be freed */ + set_page_refcounted(new_page); update_and_free_page(h, new_page, false); return ret; -- 2.31.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] hugetlb: before freeing hugetlb page set dtor to appropriate value 2021-07-10 0:24 [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz 2021-07-10 0:24 ` [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code Mike Kravetz 2021-07-10 0:24 ` [PATCH 2/3] hugetlb: drop ref count earlier after page allocation Mike Kravetz @ 2021-07-10 0:24 ` Mike Kravetz 2021-07-14 10:57 ` [External] " Muchun Song 2021-07-27 23:41 ` [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz 3 siblings, 1 reply; 10+ messages in thread From: Mike Kravetz @ 2021-07-10 0:24 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Muchun Song, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton, Mike Kravetz When removing a hugetlb page from the pool the ref count is set to one (as the free page has no ref count) and compound page destructor is set to NULL_COMPOUND_DTOR. Since a subsequent call to free the hugetlb page will call __free_pages for non-gigantic pages and free_gigantic_page for gigantic pages the destructor is not used. However, consider the following race with code taking a speculative reference on the page: Thread 0 Thread 1 -------- -------- remove_hugetlb_page set_page_refcounted(page); set_compound_page_dtor(page, NULL_COMPOUND_DTOR); get_page_unless_zero(page) __update_and_free_page __free_pages(page, huge_page_order(h)); /* Note that __free_pages() will simply drop the reference to the page. */ put_page(page) __put_compound_page() destroy_compound_page NULL_COMPOUND_DTOR BUG: kernel NULL pointer dereference, address: 0000000000000000 To address this race, set the dtor to the normal compound page dtor for non-gigantic pages. The dtor for gigantic pages does not matter as gigantic pages are changed from a compound page to 'just a group of pages' before freeing. Hence, the destructor is not used. Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> --- mm/hugetlb.c | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3132c7395743..fa8ec2072949 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1370,8 +1370,28 @@ static void remove_hugetlb_page(struct hstate *h, struct page *page, h->surplus_huge_pages_node[nid]--; } + /* + * Very subtle + * + * For non-gigantic pages set the destructor to the normal compound + * page dtor. This is needed in case someone takes an additional + * temporary ref to the page, and freeing is delayed until they drop + * their reference. + * + * For gigantic pages set the destructor to the null dtor. This + * destructor will never be called. Before freeing the gigantic + * page destroy_compound_gigantic_page will turn the compound page + * into a simple group of pages. After this the destructor does not + * apply. + * + * This handles the case where more than one ref is held when and + * after update_and_free_page is called. + */ set_page_refcounted(page); - set_compound_page_dtor(page, NULL_COMPOUND_DTOR); + if (hstate_is_gigantic(h)) + set_compound_page_dtor(page, NULL_COMPOUND_DTOR); + else + set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); h->nr_huge_pages--; h->nr_huge_pages_node[nid]--; -- 2.31.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [External] [PATCH 3/3] hugetlb: before freeing hugetlb page set dtor to appropriate value 2021-07-10 0:24 ` [PATCH 3/3] hugetlb: before freeing hugetlb page set dtor to appropriate value Mike Kravetz @ 2021-07-14 10:57 ` Muchun Song 2021-07-14 17:39 ` Mike Kravetz 0 siblings, 1 reply; 10+ messages in thread From: Muchun Song @ 2021-07-14 10:57 UTC (permalink / raw) To: Mike Kravetz Cc: Linux Memory Management List, LKML, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton On Sat, Jul 10, 2021 at 8:25 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > When removing a hugetlb page from the pool the ref count is set to > one (as the free page has no ref count) and compound page destructor > is set to NULL_COMPOUND_DTOR. Since a subsequent call to free the > hugetlb page will call __free_pages for non-gigantic pages and > free_gigantic_page for gigantic pages the destructor is not used. > > However, consider the following race with code taking a speculative > reference on the page: > > Thread 0 Thread 1 > -------- -------- > remove_hugetlb_page > set_page_refcounted(page); > set_compound_page_dtor(page, > NULL_COMPOUND_DTOR); > get_page_unless_zero(page) > __update_and_free_page > __free_pages(page, > huge_page_order(h)); > > /* Note that __free_pages() will simply drop > the reference to the page. */ > > put_page(page) > __put_compound_page() > destroy_compound_page > NULL_COMPOUND_DTOR > BUG: kernel NULL pointer > dereference, address: > 0000000000000000 > > To address this race, set the dtor to the normal compound page dtor > for non-gigantic pages. The dtor for gigantic pages does not matter > as gigantic pages are changed from a compound page to 'just a group of > pages' before freeing. Hence, the destructor is not used. > > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> > --- > mm/hugetlb.c | 22 +++++++++++++++++++++- > 1 file changed, 21 insertions(+), 1 deletion(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 3132c7395743..fa8ec2072949 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -1370,8 +1370,28 @@ static void remove_hugetlb_page(struct hstate *h, struct page *page, > h->surplus_huge_pages_node[nid]--; > } > > + /* > + * Very subtle > + * > + * For non-gigantic pages set the destructor to the normal compound > + * page dtor. This is needed in case someone takes an additional > + * temporary ref to the page, and freeing is delayed until they drop > + * their reference. > + * > + * For gigantic pages set the destructor to the null dtor. This > + * destructor will never be called. Before freeing the gigantic > + * page destroy_compound_gigantic_page will turn the compound page > + * into a simple group of pages. After this the destructor does not > + * apply. > + * > + * This handles the case where more than one ref is held when and > + * after update_and_free_page is called. > + */ > set_page_refcounted(page); > - set_compound_page_dtor(page, NULL_COMPOUND_DTOR); > + if (hstate_is_gigantic(h)) > + set_compound_page_dtor(page, NULL_COMPOUND_DTOR); > + else > + set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); Hi Mike, The race is really subtle. But we also should remove the WARN from free_contig_range, right? Because the refcount of the head page of the gigantic page can be greater than one, but free_contig_range has the following warning. WARN(count != 0, "%lu pages are still in use!\n", count); Thanks. > > h->nr_huge_pages--; > h->nr_huge_pages_node[nid]--; > -- > 2.31.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] [PATCH 3/3] hugetlb: before freeing hugetlb page set dtor to appropriate value 2021-07-14 10:57 ` [External] " Muchun Song @ 2021-07-14 17:39 ` Mike Kravetz 2021-07-15 3:14 ` Muchun Song 0 siblings, 1 reply; 10+ messages in thread From: Mike Kravetz @ 2021-07-14 17:39 UTC (permalink / raw) To: Muchun Song Cc: Linux Memory Management List, LKML, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton On 7/14/21 3:57 AM, Muchun Song wrote: > On Sat, Jul 10, 2021 at 8:25 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: >> + /* >> + * Very subtle >> + * >> + * For non-gigantic pages set the destructor to the normal compound >> + * page dtor. This is needed in case someone takes an additional >> + * temporary ref to the page, and freeing is delayed until they drop >> + * their reference. >> + * >> + * For gigantic pages set the destructor to the null dtor. This >> + * destructor will never be called. Before freeing the gigantic >> + * page destroy_compound_gigantic_page will turn the compound page >> + * into a simple group of pages. After this the destructor does not >> + * apply. >> + * >> + * This handles the case where more than one ref is held when and >> + * after update_and_free_page is called. >> + */ >> set_page_refcounted(page); >> - set_compound_page_dtor(page, NULL_COMPOUND_DTOR); >> + if (hstate_is_gigantic(h)) >> + set_compound_page_dtor(page, NULL_COMPOUND_DTOR); >> + else >> + set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); > > Hi Mike, > > The race is really subtle. But we also should remove the WARN from > free_contig_range, right? Because the refcount of the head page of > the gigantic page can be greater than one, but free_contig_range has > the following warning. > > WARN(count != 0, "%lu pages are still in use!\n", count); > I did hit that warning in my testing and thought about removing it. However, I decided to keep it because non-hugetlb code also makes use of alloc_contig_range/free_contig_range and it might be useful in those cases. My 'guess' is that the warning was added not because of temporary ref count increases but rather to point out any code that forgot to drop a reference. BTW - It is not just the 'head' page which could trigger this warning, but any 'tail' page as well. That is because we do not call free_contig_range with a compound page, but rather a group of pages all with ref count of at least one. I'm happy to remove the warning if people do not think it is generally useful. -- Mike Kravetz ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [External] [PATCH 3/3] hugetlb: before freeing hugetlb page set dtor to appropriate value 2021-07-14 17:39 ` Mike Kravetz @ 2021-07-15 3:14 ` Muchun Song 0 siblings, 0 replies; 10+ messages in thread From: Muchun Song @ 2021-07-15 3:14 UTC (permalink / raw) To: Mike Kravetz Cc: Linux Memory Management List, LKML, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton On Thu, Jul 15, 2021 at 1:39 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > On 7/14/21 3:57 AM, Muchun Song wrote: > > On Sat, Jul 10, 2021 at 8:25 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > >> + /* > >> + * Very subtle > >> + * > >> + * For non-gigantic pages set the destructor to the normal compound > >> + * page dtor. This is needed in case someone takes an additional > >> + * temporary ref to the page, and freeing is delayed until they drop > >> + * their reference. > >> + * > >> + * For gigantic pages set the destructor to the null dtor. This > >> + * destructor will never be called. Before freeing the gigantic > >> + * page destroy_compound_gigantic_page will turn the compound page > >> + * into a simple group of pages. After this the destructor does not > >> + * apply. > >> + * > >> + * This handles the case where more than one ref is held when and > >> + * after update_and_free_page is called. > >> + */ > >> set_page_refcounted(page); > >> - set_compound_page_dtor(page, NULL_COMPOUND_DTOR); > >> + if (hstate_is_gigantic(h)) > >> + set_compound_page_dtor(page, NULL_COMPOUND_DTOR); > >> + else > >> + set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); > > > > Hi Mike, > > > > The race is really subtle. But we also should remove the WARN from > > free_contig_range, right? Because the refcount of the head page of > > the gigantic page can be greater than one, but free_contig_range has > > the following warning. > > > > WARN(count != 0, "%lu pages are still in use!\n", count); > > > > I did hit that warning in my testing and thought about removing it. > However, I decided to keep it because non-hugetlb code also makes use of > alloc_contig_range/free_contig_range and it might be useful in those > cases. > > My 'guess' is that the warning was added not because of temporary ref > count increases but rather to point out any code that forgot to drop a > reference. Got it. At least this patch looks good to me. So Reviewed-by: Muchun Song <songmuchun@bytedance.com> > > BTW - It is not just the 'head' page which could trigger this warning, but > any 'tail' page as well. That is because we do not call free_contig_range > with a compound page, but rather a group of pages all with ref count of > at least one. Right. > > I'm happy to remove the warning if people do not think it is generally > useful. For me, I suggest removing it. If someone has any ideas, please let us know. > -- > Mike Kravetz ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] hugetlb: fix potential ref counting races 2021-07-10 0:24 [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz ` (2 preceding siblings ...) 2021-07-10 0:24 ` [PATCH 3/3] hugetlb: before freeing hugetlb page set dtor to appropriate value Mike Kravetz @ 2021-07-27 23:41 ` Mike Kravetz 2021-07-28 4:03 ` Muchun Song 3 siblings, 1 reply; 10+ messages in thread From: Mike Kravetz @ 2021-07-27 23:41 UTC (permalink / raw) To: linux-mm, linux-kernel Cc: Muchun Song, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton Any additional comments on these patches/this approach? The first patch addressing this issue actually went into the 5.14 merge window as commit 7118fc2906e2 ("hugetlb: address ref count racing in prep_compound_gigantic_page"). All this code is very tricky and subtle. It addresses potential issues discovered by code analysis. I do not believe the races have ever been experienced in practice. If anyone has suggestions for a simpler or alternative approach, I would love to hear them. -- Mike Kravetz On 7/9/21 5:24 PM, Mike Kravetz wrote: > When Muchun Song brought up a potential issue with hugetlb ref counting[1], > I started looking closer at the code. hugetlbfs is the only code with it's > own specialized compound page destructor and taking special action when ref > counts drop to zero. Potential races happen in this unique handling of ref > counts. The following patches address these races when creating and > destroying hugetlb pages. > > These potential races have likely existed since the creation of > hugetlbfs. They certainly have been around for more than 10 years. > However, I am unaware of anyone actually hitting these races. It is > VERY unlikely than anyone will actually hit these races, but they do > exist. > > I could not think of an easy (or difficult) way to force these races. > Therefore, testing consisted of adding code to randomly increase ref > counts in strategic places. In this way, I was able to exercise all the > race handling code paths. > > [1] https://lore.kernel.org/linux-mm/CAMZfGtVMn3daKrJwZMaVOGOaJU+B4dS--x_oPmGQMD=c=QNGEg@mail.gmail.com/ > > Mike Kravetz (3): > hugetlb: simplify prep_compound_gigantic_page ref count racing code > hugetlb: drop ref count earlier after page allocation > hugetlb: before freeing hugetlb page set dtor to appropriate value > > mm/hugetlb.c | 137 ++++++++++++++++++++++++++++++++++++++------------- > 1 file changed, 104 insertions(+), 33 deletions(-) > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/3] hugetlb: fix potential ref counting races 2021-07-27 23:41 ` [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz @ 2021-07-28 4:03 ` Muchun Song 0 siblings, 0 replies; 10+ messages in thread From: Muchun Song @ 2021-07-28 4:03 UTC (permalink / raw) To: Mike Kravetz Cc: Linux Memory Management List, LKML, Michal Hocko, Oscar Salvador, David Hildenbrand, Matthew Wilcox, Naoya Horiguchi, Mina Almasry, Andrew Morton On Wed, Jul 28, 2021 at 7:41 AM Mike Kravetz <mike.kravetz@oracle.com> wrote: > > Any additional comments on these patches/this approach? > > The first patch addressing this issue actually went into the 5.14 merge > window as commit 7118fc2906e2 ("hugetlb: address ref count racing in > prep_compound_gigantic_page"). > > All this code is very tricky and subtle. It addresses potential issues > discovered by code analysis. I do not believe the races have ever been > experienced in practice. If anyone has suggestions for a simpler or > alternative approach, I would love to hear them. Hi Mike, I agree with you that this code is very tricky and subtle. I have looked at this patch set. For me, I cannot figure out a better solution for this race. -- Thanks, Muchun > -- > Mike Kravetz > > > On 7/9/21 5:24 PM, Mike Kravetz wrote: > > When Muchun Song brought up a potential issue with hugetlb ref counting[1], > > I started looking closer at the code. hugetlbfs is the only code with it's > > own specialized compound page destructor and taking special action when ref > > counts drop to zero. Potential races happen in this unique handling of ref > > counts. The following patches address these races when creating and > > destroying hugetlb pages. > > > > These potential races have likely existed since the creation of > > hugetlbfs. They certainly have been around for more than 10 years. > > However, I am unaware of anyone actually hitting these races. It is > > VERY unlikely than anyone will actually hit these races, but they do > > exist. > > > > I could not think of an easy (or difficult) way to force these races. > > Therefore, testing consisted of adding code to randomly increase ref > > counts in strategic places. In this way, I was able to exercise all the > > race handling code paths. > > > > [1] https://lore.kernel.org/linux-mm/CAMZfGtVMn3daKrJwZMaVOGOaJU+B4dS--x_oPmGQMD=c=QNGEg@mail.gmail.com/ > > > > Mike Kravetz (3): > > hugetlb: simplify prep_compound_gigantic_page ref count racing code > > hugetlb: drop ref count earlier after page allocation > > hugetlb: before freeing hugetlb page set dtor to appropriate value > > > > mm/hugetlb.c | 137 ++++++++++++++++++++++++++++++++++++++------------- > > 1 file changed, 104 insertions(+), 33 deletions(-) > > ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-07-28 4:04 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-10 0:24 [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz 2021-07-10 0:24 ` [PATCH 1/3] hugetlb: simplify prep_compound_gigantic_page ref count racing code Mike Kravetz 2021-07-13 6:31 ` [External] " Muchun Song 2021-07-10 0:24 ` [PATCH 2/3] hugetlb: drop ref count earlier after page allocation Mike Kravetz 2021-07-10 0:24 ` [PATCH 3/3] hugetlb: before freeing hugetlb page set dtor to appropriate value Mike Kravetz 2021-07-14 10:57 ` [External] " Muchun Song 2021-07-14 17:39 ` Mike Kravetz 2021-07-15 3:14 ` Muchun Song 2021-07-27 23:41 ` [PATCH 0/3] hugetlb: fix potential ref counting races Mike Kravetz 2021-07-28 4:03 ` Muchun Song
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).