* [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy @ 2021-04-21 2:04 Ding Hui 2021-04-28 14:54 ` David Hildenbrand 0 siblings, 1 reply; 10+ messages in thread From: Ding Hui @ 2021-04-21 2:04 UTC (permalink / raw) To: akpm, naoya.horiguchi, osalvador; +Cc: linux-mm, linux-kernel, Ding Hui Recently we found there is a lot MemFree left in /proc/meminfo after do a lot of pages soft offline. I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. After take_page_off_buddy, the page is no longer belong to buddy allocator, and will not be used any more, but we maybe missed accounting NR_FREE_PAGES in this situation. Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> --- mm/page_alloc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cfc72873961d..8d65b62784d8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); ret = true; break; } -- 2.17.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy 2021-04-21 2:04 [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy Ding Hui @ 2021-04-28 14:54 ` David Hildenbrand 2021-04-30 9:43 ` Ding Hui 2021-05-06 2:49 ` [RFC PATCH] " HORIGUCHI NAOYA(堀口 直也) 0 siblings, 2 replies; 10+ messages in thread From: David Hildenbrand @ 2021-04-28 14:54 UTC (permalink / raw) To: Ding Hui, akpm, naoya.horiguchi, osalvador; +Cc: linux-mm, linux-kernel On 21.04.21 04:04, Ding Hui wrote: > Recently we found there is a lot MemFree left in /proc/meminfo after > do a lot of pages soft offline. > > I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. > After take_page_off_buddy, the page is no longer belong to buddy > allocator, and will not be used any more, but we maybe missed accounting > NR_FREE_PAGES in this situation. > > Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> > --- > mm/page_alloc.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cfc72873961d..8d65b62784d8 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) > del_page_from_free_list(page_head, zone, page_order); > break_down_buddy_pages(zone, page_head, page, 0, > page_order, migratetype); > + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); > ret = true; > break; > } > Should this use __mod_zone_freepage_state() instead? -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy 2021-04-28 14:54 ` David Hildenbrand @ 2021-04-30 9:43 ` Ding Hui 2021-05-08 3:55 ` [PATCH v2] " Ding Hui 2021-05-06 2:49 ` [RFC PATCH] " HORIGUCHI NAOYA(堀口 直也) 1 sibling, 1 reply; 10+ messages in thread From: Ding Hui @ 2021-04-30 9:43 UTC (permalink / raw) To: David Hildenbrand, akpm, naoya.horiguchi, osalvador Cc: linux-mm, linux-kernel On 2021/4/28 22:54, David Hildenbrand wrote: > On 21.04.21 04:04, Ding Hui wrote: >> Recently we found there is a lot MemFree left in /proc/meminfo after >> do a lot of pages soft offline. >> >> I think it's incorrect since NR_FREE_PAGES should not contain HWPoison >> pages. >> After take_page_off_buddy, the page is no longer belong to buddy >> allocator, and will not be used any more, but we maybe missed accounting >> NR_FREE_PAGES in this situation. >> >> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> >> --- >> mm/page_alloc.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index cfc72873961d..8d65b62784d8 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) >> del_page_from_free_list(page_head, zone, page_order); >> break_down_buddy_pages(zone, page_head, page, 0, >> page_order, migratetype); >> + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); >> ret = true; >> break; >> } >> > > Should this use __mod_zone_freepage_state() instead? > Yes, you're right. I'll use it in v2. -- Thanks, - Ding Hui ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] mm/page_alloc: fix counting of free pages after take off from buddy 2021-04-30 9:43 ` Ding Hui @ 2021-05-08 3:55 ` Ding Hui 2021-05-25 8:32 ` HORIGUCHI NAOYA(堀口 直也) 0 siblings, 1 reply; 10+ messages in thread From: Ding Hui @ 2021-05-08 3:55 UTC (permalink / raw) To: david, naoya.horiguchi, osalvador; +Cc: akpm, linux-kernel, linux-mm, Ding Hui Recently we found there is a lot MemFree left in /proc/meminfo after do a lot of pages soft offline. I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. For offline free pages, after a successful call take_page_off_buddy(), the page is no longer belong to buddy allocator, and will not be used any more, but we missed accounting NR_FREE_PAGES in this situation. Do update like rmqueue() does. Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> --- V2: use __mod_zone_freepage_state instead of __mod_zone_page_state mm/page_alloc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cfc72873961d..e124a615303b 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) del_page_from_free_list(page_head, zone, page_order); break_down_buddy_pages(zone, page_head, page, 0, page_order, migratetype); + __mod_zone_freepage_state(zone, -1, migratetype); ret = true; break; } -- 2.17.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm/page_alloc: fix counting of free pages after take off from buddy 2021-05-08 3:55 ` [PATCH v2] " Ding Hui @ 2021-05-25 8:32 ` HORIGUCHI NAOYA(堀口 直也) 2021-05-26 0:43 ` Ding Hui 0 siblings, 1 reply; 10+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2021-05-25 8:32 UTC (permalink / raw) To: Ding Hui; +Cc: david, osalvador, akpm, linux-kernel, linux-mm On Sat, May 08, 2021 at 11:55:33AM +0800, Ding Hui wrote: > Recently we found there is a lot MemFree left in /proc/meminfo after > do a lot of pages soft offline. > > I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. > For offline free pages, after a successful call take_page_off_buddy(), the > page is no longer belong to buddy allocator, and will not be used any more, > but we missed accounting NR_FREE_PAGES in this situation. > > Do update like rmqueue() does. > > Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> > --- > V2: > use __mod_zone_freepage_state instead of __mod_zone_page_state > > mm/page_alloc.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index cfc72873961d..e124a615303b 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) > del_page_from_free_list(page_head, zone, page_order); > break_down_buddy_pages(zone, page_head, page, 0, > page_order, migratetype); > + __mod_zone_freepage_state(zone, -1, migratetype); Page offline code (see set_migratetype_isolate()) seems to handle NR_FREE_PAGES counter in its own way, so I think that it's more correct to call __mod_zone_freepage_state() only when is_migrate_isolate(migratetype)) is false. Otherwise, the patch looks good to me. Thanks, Naoya Horiguchi > ret = true; > break; > } > -- > 2.17.1 > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] mm/page_alloc: fix counting of free pages after take off from buddy 2021-05-25 8:32 ` HORIGUCHI NAOYA(堀口 直也) @ 2021-05-26 0:43 ` Ding Hui 0 siblings, 0 replies; 10+ messages in thread From: Ding Hui @ 2021-05-26 0:43 UTC (permalink / raw) To: HORIGUCHI NAOYA(堀口 直也) Cc: david, osalvador, akpm, linux-kernel, linux-mm On 2021/5/25 16:32, HORIGUCHI NAOYA(堀口 直也) wrote: > On Sat, May 08, 2021 at 11:55:33AM +0800, Ding Hui wrote: >> Recently we found there is a lot MemFree left in /proc/meminfo after >> do a lot of pages soft offline. >> >> I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. >> For offline free pages, after a successful call take_page_off_buddy(), the >> page is no longer belong to buddy allocator, and will not be used any more, >> but we missed accounting NR_FREE_PAGES in this situation. >> >> Do update like rmqueue() does. >> >> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> >> --- >> V2: >> use __mod_zone_freepage_state instead of __mod_zone_page_state >> >> mm/page_alloc.c | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index cfc72873961d..e124a615303b 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) >> del_page_from_free_list(page_head, zone, page_order); >> break_down_buddy_pages(zone, page_head, page, 0, >> page_order, migratetype); >> + __mod_zone_freepage_state(zone, -1, migratetype); > > Page offline code (see set_migratetype_isolate()) seems to handle > NR_FREE_PAGES counter in its own way, so I think that it's more correct to > call __mod_zone_freepage_state() only when is_migrate_isolate(migratetype)) > is false. > > Otherwise, the patch looks good to me. > Thanks for reply and suggestion, I'll send v3 patch later. >> ret = true; >> break; >> } >> -- >> 2.17.1 -- Thanks, - Ding Hui ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy 2021-04-28 14:54 ` David Hildenbrand 2021-04-30 9:43 ` Ding Hui @ 2021-05-06 2:49 ` HORIGUCHI NAOYA(堀口 直也) 2021-05-06 4:01 ` Ding Hui 1 sibling, 1 reply; 10+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2021-05-06 2:49 UTC (permalink / raw) To: David Hildenbrand, Ding Hui; +Cc: akpm, osalvador, linux-mm, linux-kernel On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote: > On 21.04.21 04:04, Ding Hui wrote: > > Recently we found there is a lot MemFree left in /proc/meminfo after > > do a lot of pages soft offline. > > > > I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. > > After take_page_off_buddy, the page is no longer belong to buddy > > allocator, and will not be used any more, but we maybe missed accounting > > NR_FREE_PAGES in this situation. > > > > Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> > > --- > > mm/page_alloc.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index cfc72873961d..8d65b62784d8 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) > > del_page_from_free_list(page_head, zone, page_order); > > break_down_buddy_pages(zone, page_head, page, 0, > > page_order, migratetype); > > + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); > > ret = true; > > break; > > } > > > > Should this use __mod_zone_freepage_state() instead? Yes, __mod_zone_freepage_state() looks better to me. And I think that maybe an additional __mod_zone_freepage_state() in unpoison_memory() is necessary to cancel the decrement. I thought of the following, but it doesn't build because get_pfnblock_migratetype() is available only in mm/page_alloc.c, so you might want to add a small exported routine in mm/page_alloc.c and let it called from unpoison_memory(). @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn) } if (!get_hwpoison_page(p, flags, 0)) { - if (TestClearPageHWPoison(p)) + if (TestClearPageHWPoison(p)) { + int migratetype = get_pfnblock_migratetype(p, pfn); + num_poisoned_pages_dec(); + __mod_zone_freepage_state(page_zone(p), 1, migratetype); + } unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", pfn, &unpoison_rs); return 0; Thanks, Naoya Horiguchi ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy 2021-05-06 2:49 ` [RFC PATCH] " HORIGUCHI NAOYA(堀口 直也) @ 2021-05-06 4:01 ` Ding Hui 2021-05-06 7:30 ` HORIGUCHI NAOYA(堀口 直也) 0 siblings, 1 reply; 10+ messages in thread From: Ding Hui @ 2021-05-06 4:01 UTC (permalink / raw) To: HORIGUCHI NAOYA(堀口 直也), David Hildenbrand Cc: akpm, osalvador, linux-mm, linux-kernel On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote: > On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote: >> On 21.04.21 04:04, Ding Hui wrote: >>> Recently we found there is a lot MemFree left in /proc/meminfo after >>> do a lot of pages soft offline. >>> >>> I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. >>> After take_page_off_buddy, the page is no longer belong to buddy >>> allocator, and will not be used any more, but we maybe missed accounting >>> NR_FREE_PAGES in this situation. >>> >>> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> >>> --- >>> mm/page_alloc.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>> index cfc72873961d..8d65b62784d8 100644 >>> --- a/mm/page_alloc.c >>> +++ b/mm/page_alloc.c >>> @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) >>> del_page_from_free_list(page_head, zone, page_order); >>> break_down_buddy_pages(zone, page_head, page, 0, >>> page_order, migratetype); >>> + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); >>> ret = true; >>> break; >>> } >>> >> >> Should this use __mod_zone_freepage_state() instead? > > Yes, __mod_zone_freepage_state() looks better to me. > > And I think that maybe an additional __mod_zone_freepage_state() in > unpoison_memory() is necessary to cancel the decrement. I thought of the > following, but it doesn't build because get_pfnblock_migratetype() is > available only in mm/page_alloc.c, so you might want to add a small exported > routine in mm/page_alloc.c and let it called from unpoison_memory(). > > @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn) > } > > if (!get_hwpoison_page(p, flags, 0)) { > - if (TestClearPageHWPoison(p)) > + if (TestClearPageHWPoison(p)) { > + int migratetype = get_pfnblock_migratetype(p, pfn); > + > num_poisoned_pages_dec(); > + __mod_zone_freepage_state(page_zone(p), 1, migratetype); > + } > unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", > pfn, &unpoison_rs); > return 0; > I think there is another problem: In normal case, we keep the last refcount of the hwpoison page, so get_hwpoison_page should return 1. The NR_FREE_PAGES will be adjusted when call put_page. At race condition, we maybe leak the page because we does not put it back to buddy in unpoison_memory, however the HWPoison flag, num_poisoned_pages, NR_FREE_PAGES is adjusted correctly. CPU0 CPU1 soft_offline_page soft_offline_free_page page_handle_poison take_page_off_buddy SetPageHWPoison unpoison_memory if (!get_hwpoison_page(p)) TestClearPageHWPoison num_poisoned_pages_dec __mod_zone_freepage_state return 0 /* miss put the page back to buddy */ page_ref_inc num_poisoned_pages_inc How about do nothing and return -EBUSY (so the caller can retry) if unpoison a zero refcount page , or return 0 like 230ac719c500 ("mm/hwpoison: don't try to unpoison containment-failed pages") does ? @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn) } if (!get_hwpoison_page(p, flags, 0)) { - if (TestClearPageHWPoison(p)) - num_poisoned_pages_dec(); - unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", + unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page %#lx\n", pfn, &unpoison_rs); - return 0; + return -EBUSY; } lock_page(page); -- Thanks, - Ding Hui ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy 2021-05-06 4:01 ` Ding Hui @ 2021-05-06 7:30 ` HORIGUCHI NAOYA(堀口 直也) 2021-05-07 1:46 ` Ding Hui 0 siblings, 1 reply; 10+ messages in thread From: HORIGUCHI NAOYA(堀口 直也) @ 2021-05-06 7:30 UTC (permalink / raw) To: Ding Hui; +Cc: David Hildenbrand, akpm, osalvador, linux-mm, linux-kernel On Thu, May 06, 2021 at 12:01:34PM +0800, Ding Hui wrote: > On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote: > > On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote: > > > On 21.04.21 04:04, Ding Hui wrote: > > > > Recently we found there is a lot MemFree left in /proc/meminfo after > > > > do a lot of pages soft offline. > > > > > > > > I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. > > > > After take_page_off_buddy, the page is no longer belong to buddy > > > > allocator, and will not be used any more, but we maybe missed accounting > > > > NR_FREE_PAGES in this situation. > > > > > > > > Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> > > > > --- > > > > mm/page_alloc.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > > index cfc72873961d..8d65b62784d8 100644 > > > > --- a/mm/page_alloc.c > > > > +++ b/mm/page_alloc.c > > > > @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) > > > > del_page_from_free_list(page_head, zone, page_order); > > > > break_down_buddy_pages(zone, page_head, page, 0, > > > > page_order, migratetype); > > > > + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); > > > > ret = true; > > > > break; > > > > } > > > > > > > > > > Should this use __mod_zone_freepage_state() instead? > > > > Yes, __mod_zone_freepage_state() looks better to me. > > > > And I think that maybe an additional __mod_zone_freepage_state() in > > unpoison_memory() is necessary to cancel the decrement. I thought of the > > following, but it doesn't build because get_pfnblock_migratetype() is > > available only in mm/page_alloc.c, so you might want to add a small exported > > routine in mm/page_alloc.c and let it called from unpoison_memory(). > > > > @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn) > > } > > if (!get_hwpoison_page(p, flags, 0)) { > > - if (TestClearPageHWPoison(p)) > > + if (TestClearPageHWPoison(p)) { > > + int migratetype = get_pfnblock_migratetype(p, pfn); > > + > > num_poisoned_pages_dec(); > > + __mod_zone_freepage_state(page_zone(p), 1, migratetype); > > + } > > unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", > > pfn, &unpoison_rs); > > return 0; > > > > I think there is another problem: > In normal case, we keep the last refcount of the hwpoison page, so > get_hwpoison_page should return 1. The NR_FREE_PAGES will be adjusted when > call put_page. I think that take_page_off_buddy() should not be called for this case (the error page have remaining refcount). So it seems that no need to update NR_FREE_PAGES ? > At race condition, we maybe leak the page because we does not put it back to > buddy in unpoison_memory, however the HWPoison flag, num_poisoned_pages, > NR_FREE_PAGES is adjusted correctly. > > CPU0 CPU1 > > soft_offline_page > soft_offline_free_page > page_handle_poison > take_page_off_buddy > SetPageHWPoison > unpoison_memory > if (!get_hwpoison_page(p)) > TestClearPageHWPoison > num_poisoned_pages_dec > __mod_zone_freepage_state > return 0 > /* miss put the page back to buddy */ > page_ref_inc > num_poisoned_pages_inc Thanks for checking this, unpoison_memory() is racy. Recently we are suggesting to introduce mf_mutex by [1]. Although this patch is not merged to mainline yet, but it could be used to prevent the above race too. [1] https://lore.kernel.org/linux-mm/20210427062953.2080293-2-nao.horiguchi@gmail.com/ > > How about do nothing and return -EBUSY (so the caller can retry) if unpoison > a zero refcount page , or return 0 like 230ac719c500 ("mm/hwpoison: don't > try to unpoison containment-failed pages") does ? > > @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn) > } > > if (!get_hwpoison_page(p, flags, 0)) { > - if (TestClearPageHWPoison(p)) > - num_poisoned_pages_dec(); > - unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", > + unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page > %#lx\n", > pfn, &unpoison_rs); > - return 0; > + return -EBUSY; Currently unpoison_memory() does not work as reverse operation of take_page_off_buddy() (it's simply broken), so implementing it at one time would be better. I'll take time to fix unpoison_memory(). Thanks, Naoya Horiguchi ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy 2021-05-06 7:30 ` HORIGUCHI NAOYA(堀口 直也) @ 2021-05-07 1:46 ` Ding Hui 0 siblings, 0 replies; 10+ messages in thread From: Ding Hui @ 2021-05-07 1:46 UTC (permalink / raw) To: HORIGUCHI NAOYA(堀口 直也) Cc: David Hildenbrand, akpm, osalvador, linux-mm, linux-kernel On 2021/5/6 15:30, HORIGUCHI NAOYA(堀口 直也) wrote: > On Thu, May 06, 2021 at 12:01:34PM +0800, Ding Hui wrote: >> On 2021/5/6 10:49, HORIGUCHI NAOYA(堀口 直也) wrote: >>> On Wed, Apr 28, 2021 at 04:54:59PM +0200, David Hildenbrand wrote: >>>> On 21.04.21 04:04, Ding Hui wrote: >>>>> Recently we found there is a lot MemFree left in /proc/meminfo after >>>>> do a lot of pages soft offline. >>>>> >>>>> I think it's incorrect since NR_FREE_PAGES should not contain HWPoison pages. >>>>> After take_page_off_buddy, the page is no longer belong to buddy >>>>> allocator, and will not be used any more, but we maybe missed accounting >>>>> NR_FREE_PAGES in this situation. >>>>> >>>>> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> >>>>> --- >>>>> mm/page_alloc.c | 1 + >>>>> 1 file changed, 1 insertion(+) >>>>> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >>>>> index cfc72873961d..8d65b62784d8 100644 >>>>> --- a/mm/page_alloc.c >>>>> +++ b/mm/page_alloc.c >>>>> @@ -8947,6 +8947,7 @@ bool take_page_off_buddy(struct page *page) >>>>> del_page_from_free_list(page_head, zone, page_order); >>>>> break_down_buddy_pages(zone, page_head, page, 0, >>>>> page_order, migratetype); >>>>> + __mod_zone_page_state(zone, NR_FREE_PAGES, -1); >>>>> ret = true; >>>>> break; >>>>> } >>>>> >>>> >>>> Should this use __mod_zone_freepage_state() instead? >>> >>> Yes, __mod_zone_freepage_state() looks better to me. >>> >>> And I think that maybe an additional __mod_zone_freepage_state() in >>> unpoison_memory() is necessary to cancel the decrement. I thought of the >>> following, but it doesn't build because get_pfnblock_migratetype() is >>> available only in mm/page_alloc.c, so you might want to add a small exported >>> routine in mm/page_alloc.c and let it called from unpoison_memory(). >>> >>> @@ -1899,8 +1899,12 @@ int unpoison_memory(unsigned long pfn) >>> } >>> if (!get_hwpoison_page(p, flags, 0)) { >>> - if (TestClearPageHWPoison(p)) >>> + if (TestClearPageHWPoison(p)) { >>> + int migratetype = get_pfnblock_migratetype(p, pfn); >>> + >>> num_poisoned_pages_dec(); >>> + __mod_zone_freepage_state(page_zone(p), 1, migratetype); >>> + } >>> unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", >>> pfn, &unpoison_rs); >>> return 0; >>> >> >> I think there is another problem: >> In normal case, we keep the last refcount of the hwpoison page, so >> get_hwpoison_page should return 1. The NR_FREE_PAGES will be adjusted when >> call put_page. > > I think that take_page_off_buddy() should not be called for this case > (the error page have remaining refcount). So it seems that no need to > update NR_FREE_PAGES ? > Yes, take_page_off_buddy() only used for free pages, but we will call page_ref_inc() after that, on the other hand for in used pages, we increased the refcount by get_any_page(), so in both cases, the hwpoisoned pages have refcount great than zero. I think there is no need to update NR_FREE_PAGES explicitly in unpoison_memory(), the put_page() will help us to update NR_FREE_PAGES and put the page back to buddy. >> At race condition, we maybe leak the page because we does not put it back to >> buddy in unpoison_memory, however the HWPoison flag, num_poisoned_pages, >> NR_FREE_PAGES is adjusted correctly. >> >> CPU0 CPU1 >> >> soft_offline_page >> soft_offline_free_page >> page_handle_poison >> take_page_off_buddy >> SetPageHWPoison >> unpoison_memory >> if (!get_hwpoison_page(p)) >> TestClearPageHWPoison >> num_poisoned_pages_dec >> __mod_zone_freepage_state >> return 0 >> /* miss put the page back to buddy */ >> page_ref_inc >> num_poisoned_pages_inc > > Thanks for checking this, unpoison_memory() is racy. Recently we are suggesting > to introduce mf_mutex by [1]. Although this patch is not merged to mainline yet, > but it could be used to prevent the above race too. > > [1] https://lore.kernel.org/linux-mm/20210427062953.2080293-2-nao.horiguchi@gmail.com/ > I'll look forward to it, thanks. >> >> How about do nothing and return -EBUSY (so the caller can retry) if unpoison >> a zero refcount page , or return 0 like 230ac719c500 ("mm/hwpoison: don't >> try to unpoison containment-failed pages") does ? >> >> @@ -1736,11 +1736,9 @@ int unpoison_memory(unsigned long pfn) >> } >> >> if (!get_hwpoison_page(p, flags, 0)) { >> - if (TestClearPageHWPoison(p)) >> - num_poisoned_pages_dec(); >> - unpoison_pr_info("Unpoison: Software-unpoisoned free page %#lx\n", >> + unpoison_pr_info("Unpoison: Software-unpoisoned zero refcount page >> %#lx\n", >> pfn, &unpoison_rs); >> - return 0; >> + return -EBUSY; > > Currently unpoison_memory() does not work as reverse operation of take_page_off_buddy() > (it's simply broken), so implementing it at one time would be better. > I'll take time to fix unpoison_memory(). > Thanks for your work. Actually, I'm not sure about the exactly meaning of "broken", it seems that the basic function of unpoison_memory() is ok if not considered the race conditions. -- Thanks, - Ding Hui ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2021-05-26 0:43 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-04-21 2:04 [RFC PATCH] mm/page_alloc: fix counting of free pages after take off from buddy Ding Hui 2021-04-28 14:54 ` David Hildenbrand 2021-04-30 9:43 ` Ding Hui 2021-05-08 3:55 ` [PATCH v2] " Ding Hui 2021-05-25 8:32 ` HORIGUCHI NAOYA(堀口 直也) 2021-05-26 0:43 ` Ding Hui 2021-05-06 2:49 ` [RFC PATCH] " HORIGUCHI NAOYA(堀口 直也) 2021-05-06 4:01 ` Ding Hui 2021-05-06 7:30 ` HORIGUCHI NAOYA(堀口 直也) 2021-05-07 1:46 ` Ding Hui
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).