* + mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch added to -mm tree
@ 2022-04-04 21:50 Andrew Morton
  2022-04-05 10:29 ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2022-04-04 21:50 UTC (permalink / raw)
  To: mm-commits, vbabka, torvalds, rppt, rostedt, osalvador, mgorman,
	david, ziy, akpm


The patch titled
     Subject: mm: wrap __find_buddy_pfn() with a necessary buddy page validation
has been added to the -mm tree.  Its filename is
     mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch

This patch should soon appear at
    https://ozlabs.org/~akpm/mmots/broken-out/mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch
and later at
    https://ozlabs.org/~akpm/mmotm/broken-out/mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Zi Yan <ziy@nvidia.com>
Subject: mm: wrap __find_buddy_pfn() with a necessary buddy page validation

Whenever the buddy of a page is found from __find_buddy_pfn(),
page_is_buddy() should be used to check its validity. Add a helper
function find_buddy_page_pfn() to find the buddy page and do the check
together.

Link: https://lkml.kernel.org/r/20220401230804.1658207-2-zi.yan@sent.com
Link: https://lore.kernel.org/linux-mm/CAHk-=wji_AmYygZMTsPMdJ7XksMt7kOur8oDfDdniBRMjm4VkQ@mail.gmail.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/internal.h       |  117 ++++++++++++++++++++++++++++++++----------
 mm/page_alloc.c     |   52 ++----------------
 mm/page_isolation.c |    9 +--
 3 files changed, 101 insertions(+), 77 deletions(-)

--- a/mm/internal.h~mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation
+++ a/mm/internal.h
@@ -212,6 +212,67 @@ struct alloc_context {
 };
 
 /*
+ * This function returns the order of a free page in the buddy system. In
+ * general, page_zone(page)->lock must be held by the caller to prevent the
+ * page from being allocated in parallel and returning garbage as the order.
+ * If a caller does not hold page_zone(page)->lock, it must guarantee that the
+ * page cannot be allocated or merged in parallel. Alternatively, it must
+ * handle invalid values gracefully, and use buddy_order_unsafe() below.
+ */
+static inline unsigned int buddy_order(struct page *page)
+{
+	/* PageBuddy() must be checked by the caller */
+	return page_private(page);
+}
+
+/*
+ * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
+ * PageBuddy() should be checked first by the caller to minimize race window,
+ * and invalid values must be handled gracefully.
+ *
+ * READ_ONCE is used so that if the caller assigns the result into a local
+ * variable and e.g. tests it for valid range before using, the compiler cannot
+ * decide to remove the variable and inline the page_private(page) multiple
+ * times, potentially observing different values in the tests and the actual
+ * use of the result.
+ */
+#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
+
+/*
+ * This function checks whether a page is free && is the buddy.
+ * We can coalesce a page and its buddy if
+ * (a) the buddy is not in a hole (check before calling!) &&
+ * (b) the buddy is in the buddy system &&
+ * (c) a page and its buddy have the same order &&
+ * (d) a page and its buddy are in the same zone.
+ *
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
+ *
+ * For recording page's order, we use page_private(page).
+ */
+static inline bool page_is_buddy(struct page *page, struct page *buddy,
+							unsigned int order)
+{
+	if (!page_is_guard(buddy) && !PageBuddy(buddy))
+		return false;
+
+	if (buddy_order(buddy) != order)
+		return false;
+
+	/*
+	 * zone check is done late to avoid uselessly calculating
+	 * zone/node ids for pages that could never merge.
+	 */
+	if (page_zone_id(page) != page_zone_id(buddy))
+		return false;
+
+	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
+
+	return true;
+}
+
+/*
  * Locate the struct page for both the matching buddy in our
  * pair (buddy1) and the combined O(n+1) page they form (page).
  *
@@ -234,6 +295,35 @@ __find_buddy_pfn(unsigned long page_pfn,
 	return page_pfn ^ (1 << order);
 }
 
+/*
+ * Find the buddy of @page and validate it.
+ * @page: The input page
+ * @pfn: The pfn of the page, it saves a call to page_to_pfn() when the
+ *       function is used in the performance-critical __free_one_page().
+ * @order: The order of the page
+ * @buddy_pfn: The output pointer to the buddy pfn, it also saves a call to
+ *             page_to_pfn().
+ *
+ * The found buddy can be a non-PageBuddy page, out of @page's zone, or of a
+ * different order than @page. The validation is necessary before using it.
+ *
+ * Return: the found buddy page or NULL if not found.
+ */
+static inline struct page *find_buddy_page_pfn(struct page *page,
+			unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
+{
+	struct page *buddy;
+	unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
+
+	buddy = page + (__buddy_pfn - pfn);
+	if (buddy_pfn)
+		*buddy_pfn = __buddy_pfn;
+
+	if (page_is_buddy(page, buddy, order))
+		return buddy;
+	return NULL;
+}
+
 extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 				unsigned long end_pfn, struct zone *zone);
 
@@ -337,33 +427,6 @@ int find_suitable_fallback(struct free_a
 			int migratetype, bool only_stealable, bool *can_steal);
 
 /*
- * This function returns the order of a free page in the buddy system. In
- * general, page_zone(page)->lock must be held by the caller to prevent the
- * page from being allocated in parallel and returning garbage as the order.
- * If a caller does not hold page_zone(page)->lock, it must guarantee that the
- * page cannot be allocated or merged in parallel. Alternatively, it must
- * handle invalid values gracefully, and use buddy_order_unsafe() below.
- */
-static inline unsigned int buddy_order(struct page *page)
-{
-	/* PageBuddy() must be checked by the caller */
-	return page_private(page);
-}
-
-/*
- * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
- * PageBuddy() should be checked first by the caller to minimize race window,
- * and invalid values must be handled gracefully.
- *
- * READ_ONCE is used so that if the caller assigns the result into a local
- * variable and e.g. tests it for valid range before using, the compiler cannot
- * decide to remove the variable and inline the page_private(page) multiple
- * times, potentially observing different values in the tests and the actual
- * use of the result.
- */
-#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
-
-/*
  * These three helpers classifies VMAs for virtual memory accounting.
  */
 
--- a/mm/page_alloc.c~mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation
+++ a/mm/page_alloc.c
@@ -868,40 +868,6 @@ static inline void set_buddy_order(struc
 	__SetPageBuddy(page);
 }
 
-/*
- * This function checks whether a page is free && is the buddy
- * we can coalesce a page and its buddy if
- * (a) the buddy is not in a hole (check before calling!) &&
- * (b) the buddy is in the buddy system &&
- * (c) a page and its buddy have the same order &&
- * (d) a page and its buddy are in the same zone.
- *
- * For recording whether a page is in the buddy system, we set PageBuddy.
- * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
- *
- * For recording page's order, we use page_private(page).
- */
-static inline bool page_is_buddy(struct page *page, struct page *buddy,
-							unsigned int order)
-{
-	if (!page_is_guard(buddy) && !PageBuddy(buddy))
-		return false;
-
-	if (buddy_order(buddy) != order)
-		return false;
-
-	/*
-	 * zone check is done late to avoid uselessly calculating
-	 * zone/node ids for pages that could never merge.
-	 */
-	if (page_zone_id(page) != page_zone_id(buddy))
-		return false;
-
-	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
-
-	return true;
-}
-
 #ifdef CONFIG_COMPACTION
 static inline struct capture_control *task_capc(struct zone *zone)
 {
@@ -1010,18 +976,17 @@ static inline bool
 buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
 		   struct page *page, unsigned int order)
 {
-	struct page *higher_page, *higher_buddy;
-	unsigned long combined_pfn;
+	struct page *higher_page;
+	unsigned long higher_page_pfn;
 
 	if (order >= MAX_ORDER - 2)
 		return false;
 
-	combined_pfn = buddy_pfn & pfn;
-	higher_page = page + (combined_pfn - pfn);
-	buddy_pfn = __find_buddy_pfn(combined_pfn, order + 1);
-	higher_buddy = higher_page + (buddy_pfn - combined_pfn);
+	higher_page_pfn = buddy_pfn & pfn;
+	higher_page = page + (higher_page_pfn - pfn);
 
-	return page_is_buddy(higher_page, higher_buddy, order + 1);
+	return find_buddy_page_pfn(higher_page, higher_page_pfn, order + 1,
+			NULL) != NULL;
 }
 
 /*
@@ -1075,10 +1040,9 @@ static inline void __free_one_page(struc
 								migratetype);
 			return;
 		}
-		buddy_pfn = __find_buddy_pfn(pfn, order);
-		buddy = page + (buddy_pfn - pfn);
 
-		if (!page_is_buddy(page, buddy, order))
+		buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
+		if (!buddy)
 			goto done_merging;
 
 		if (unlikely(order >= pageblock_order)) {
--- a/mm/page_isolation.c~mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation
+++ a/mm/page_isolation.c
@@ -70,7 +70,6 @@ static void unset_migratetype_isolate(st
 	unsigned long flags, nr_pages;
 	bool isolated_page = false;
 	unsigned int order;
-	unsigned long pfn, buddy_pfn;
 	struct page *buddy;
 
 	zone = page_zone(page);
@@ -89,11 +88,9 @@ static void unset_migratetype_isolate(st
 	if (PageBuddy(page)) {
 		order = buddy_order(page);
 		if (order >= pageblock_order && order < MAX_ORDER - 1) {
-			pfn = page_to_pfn(page);
-			buddy_pfn = __find_buddy_pfn(pfn, order);
-			buddy = page + (buddy_pfn - pfn);
-
-			if (!is_migrate_isolate_page(buddy)) {
+			buddy = find_buddy_page_pfn(page, page_to_pfn(page),
+						    order, NULL);
+			if (buddy && !is_migrate_isolate_page(buddy)) {
 				isolated_page = !!__isolate_free_page(page, order);
 				/*
 				 * Isolating a free page in an isolated pageblock
_

Patches currently in -mm which might be from ziy@nvidia.com are

mm-migrate-use-thp_order-instead-of-hpage_pmd_order-for-new-page-allocation.patch
mm-page_alloc-simplify-pageblock-migratetype-check-in-__free_one_page.patch
mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: + mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch added to -mm tree
  2022-04-04 21:50 + mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch added to -mm tree Andrew Morton
@ 2022-04-05 10:29 ` David Hildenbrand
  2022-04-05 10:42   ` Vlastimil Babka
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2022-04-05 10:29 UTC (permalink / raw)
  To: Andrew Morton, mm-commits, vbabka, torvalds, rppt, rostedt,
	osalvador, mgorman, ziy

> From: Zi Yan <ziy@nvidia.com>
> Subject: mm: wrap __find_buddy_pfn() with a necessary buddy page validation
> 
> Whenever the buddy of a page is found from __find_buddy_pfn(),
> page_is_buddy() should be used to check its validity. Add a helper
> function find_buddy_page_pfn() to find the buddy page and do the check
> together.
> 
> Link: https://lkml.kernel.org/r/20220401230804.1658207-2-zi.yan@sent.com
> Link: https://lore.kernel.org/linux-mm/CAHk-=wji_AmYygZMTsPMdJ7XksMt7kOur8oDfDdniBRMjm4VkQ@mail.gmail.com/
> Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Sorry for replying here; I hope Mimecast stops messing up my mailbox
soon. Only some nits.

In general, LGTM

Acked-by: David Hildenbrand <david@redhat.com>



> ---
> 
>  mm/internal.h       |  117 ++++++++++++++++++++++++++++++++----------
>  mm/page_alloc.c     |   52 ++----------------
>  mm/page_isolation.c |    9 +--
>  3 files changed, 101 insertions(+), 77 deletions(-)
> 
> --- a/mm/internal.h~mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation
> +++ a/mm/internal.h
> @@ -212,6 +212,67 @@ struct alloc_context {
>  };
>  
>  /*
> + * This function returns the order of a free page in the buddy system. In
> + * general, page_zone(page)->lock must be held by the caller to prevent the
> + * page from being allocated in parallel and returning garbage as the order.
> + * If a caller does not hold page_zone(page)->lock, it must guarantee that the
> + * page cannot be allocated or merged in parallel. Alternatively, it must
> + * handle invalid values gracefully, and use buddy_order_unsafe() below.
> + */
> +static inline unsigned int buddy_order(struct page *page)
> +{
> +	/* PageBuddy() must be checked by the caller */
> +	return page_private(page);
> +}
> +
> +/*
> + * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
> + * PageBuddy() should be checked first by the caller to minimize race window,
> + * and invalid values must be handled gracefully.
> + *
> + * READ_ONCE is used so that if the caller assigns the result into a local
> + * variable and e.g. tests it for valid range before using, the compiler cannot
> + * decide to remove the variable and inline the page_private(page) multiple
> + * times, potentially observing different values in the tests and the actual
> + * use of the result.
> + */
> +#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
> +
> +/*
> + * This function checks whether a page is free && is the buddy
> + * we can coalesce a page and its buddy if
> + * (a) the buddy is not in a hole (check before calling!) &&
> + * (b) the buddy is in the buddy system &&
> + * (c) a page and its buddy have the same order &&
> + * (d) a page and its buddy are in the same zone.
> + *
> + * For recording whether a page is in the buddy system, we set PageBuddy.
> + * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
> + *
> + * For recording page's order, we use page_private(page).
> + */
> +static inline bool page_is_buddy(struct page *page, struct page *buddy,
> +							unsigned int order)

Nit: Strange indentation of the last parameter. But it's already in the
code you're moving, so ...

> +{
> +	if (!page_is_guard(buddy) && !PageBuddy(buddy))
> +		return false;
> +
> +	if (buddy_order(buddy) != order)
> +		return false;
> +
> +	/*
> +	 * zone check is done late to avoid uselessly calculating
> +	 * zone/node ids for pages that could never merge.

I know, this is from existing code. But there really isn't a node check
when relying on the zone id. I hope we are not missing a node check :)

> +	 */
> +	if (page_zone_id(page) != page_zone_id(buddy))
> +		return false;
> +
> +	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
> +
> +	return true;
> +}
> +
> +/*
>   * Locate the struct page for both the matching buddy in our
>   * pair (buddy1) and the combined O(n+1) page they form (page).
>   *
> @@ -234,6 +295,35 @@ __find_buddy_pfn(unsigned long page_pfn,
>  	return page_pfn ^ (1 << order);
>  }
>  
> +/*
> + * Find the buddy of @page and validate it.
> + * @page: The input page
> + * @pfn: The pfn of the page, it saves a call to page_to_pfn() when the
> + *       function is used in the performance-critical __free_one_page().
> + * @order: The order of the page
> + * @buddy_pfn: The output pointer to the buddy pfn, it also saves a call to
> + *             page_to_pfn().
> + *
> + * The found buddy can be a non PageBuddy, out of @page's zone, or its order is
> + * not the same as @page. The validation is necessary before use it.
> + *
> + * Return: the found buddy page or NULL if not found.
> + */
> +static inline struct page *find_buddy_page_pfn(struct page *page,
> +			unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
> +{
> +	struct page *buddy;
> +	unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);

Nit: reverse christmas tree

> +
> +	buddy = page + (__buddy_pfn - pfn);
> +	if (buddy_pfn)
> +		*buddy_pfn = __buddy_pfn;
> +
> +	if (page_is_buddy(page, buddy, order))
> +		return buddy;
> +	return NULL;
> +}
> +

[...]
>  #ifdef CONFIG_COMPACTION
>  static inline struct capture_control *task_capc(struct zone *zone)
>  {
> @@ -1010,18 +976,17 @@ static inline bool
>  buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
>  		   struct page *page, unsigned int order)
>  {
> -	struct page *higher_page, *higher_buddy;
> -	unsigned long combined_pfn;
> +	struct page *higher_page;
> +	unsigned long higher_page_pfn;

Nit: reverse christmas tree



-- 
Thanks,

David / dhildenb



* Re: + mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch added to -mm tree
  2022-04-05 10:29 ` David Hildenbrand
@ 2022-04-05 10:42   ` Vlastimil Babka
  2022-04-05 10:52     ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka @ 2022-04-05 10:42 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton, mm-commits, torvalds, rppt,
	rostedt, osalvador, mgorman, ziy

On 4/5/22 12:29, David Hildenbrand wrote:
>> +
>> +/*
>> + * This function checks whether a page is free && is the buddy
>> + * we can coalesce a page and its buddy if
>> + * (a) the buddy is not in a hole (check before calling!) &&
>> + * (b) the buddy is in the buddy system &&
>> + * (c) a page and its buddy have the same order &&
>> + * (d) a page and its buddy are in the same zone.
>> + *
>> + * For recording whether a page is in the buddy system, we set PageBuddy.
>> + * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
>> + *
>> + * For recording page's order, we use page_private(page).
>> + */
>> +static inline bool page_is_buddy(struct page *page, struct page *buddy,
>> +							unsigned int order)
> 
> Nit: Strange indentation of the last parameter. But it's already in the
> code you're moving, so ...
> 
>> +{
>> +	if (!page_is_guard(buddy) && !PageBuddy(buddy))
>> +		return false;
>> +
>> +	if (buddy_order(buddy) != order)
>> +		return false;
>> +
>> +	/*
>> +	 * zone check is done late to avoid uselessly calculating
>> +	 * zone/node ids for pages that could never merge.
> 
> I know, this is from existing code. But there really isn't a node check
> when relying on the zone id. I hope we are not missing a node check :)

AFAIK "zone_id" is in fact the combination of node+zone (or section+zone),
so it works as long as, in configs where it's section+zone
(NODE_NOT_IN_PAGE_FLAGS is defined), there can't be multiple nodes in a
single section.

>> +	 */
>> +	if (page_zone_id(page) != page_zone_id(buddy))
>> +		return false;
>> +
>> +	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
>> +
>> +	return true;
>> +}
>> +
>> +/*
>>   * Locate the struct page for both the matching buddy in our
>>   * pair (buddy1) and the combined O(n+1) page they form (page).
>>   *
>> @@ -234,6 +295,35 @@ __find_buddy_pfn(unsigned long page_pfn,
>>  	return page_pfn ^ (1 << order);
>>  }
>>  
>> +/*
>> + * Find the buddy of @page and validate it.
>> + * @page: The input page
>> + * @pfn: The pfn of the page, it saves a call to page_to_pfn() when the
>> + *       function is used in the performance-critical __free_one_page().
>> + * @order: The order of the page
>> + * @buddy_pfn: The output pointer to the buddy pfn, it also saves a call to
>> + *             page_to_pfn().
>> + *
>> + * The found buddy can be a non PageBuddy, out of @page's zone, or its order is
>> + * not the same as @page. The validation is necessary before use it.
>> + *
>> + * Return: the found buddy page or NULL if not found.
>> + */
>> +static inline struct page *find_buddy_page_pfn(struct page *page,
>> +			unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
>> +{
>> +	struct page *buddy;
>> +	unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
> 
> Nit: reverse christmas tree

Was reverse xmas tree ever part of mm/ style, or of Linux style as a whole
outside of net?


* Re: + mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch added to -mm tree
  2022-04-05 10:42   ` Vlastimil Babka
@ 2022-04-05 10:52     ` David Hildenbrand
  2022-04-05 11:06       ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2022-04-05 10:52 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, mm-commits, torvalds, rppt,
	rostedt, osalvador, mgorman, ziy

On 05.04.22 12:42, Vlastimil Babka wrote:
> On 4/5/22 12:29, David Hildenbrand wrote:
>>> +
>>> +/*
>>> + * This function checks whether a page is free && is the buddy
>>> + * we can coalesce a page and its buddy if
>>> + * (a) the buddy is not in a hole (check before calling!) &&
>>> + * (b) the buddy is in the buddy system &&
>>> + * (c) a page and its buddy have the same order &&
>>> + * (d) a page and its buddy are in the same zone.
>>> + *
>>> + * For recording whether a page is in the buddy system, we set PageBuddy.
>>> + * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
>>> + *
>>> + * For recording page's order, we use page_private(page).
>>> + */
>>> +static inline bool page_is_buddy(struct page *page, struct page *buddy,
>>> +							unsigned int order)
>>
>> Nit: Strange indentation of the last parameter. But it's already in the
>> code you're moving, so ...
>>
>>> +{
>>> +	if (!page_is_guard(buddy) && !PageBuddy(buddy))
>>> +		return false;
>>> +
>>> +	if (buddy_order(buddy) != order)
>>> +		return false;
>>> +
>>> +	/*
>>> +	 * zone check is done late to avoid uselessly calculating
>>> +	 * zone/node ids for pages that could never merge.
>>
>> I know, this is from existing code. But there really isn't a node check
>> when relying on the zone id. I hope we are not missing a node check :)
> 
> AFAIK "zone_id" is in fact the combination of node+zone (or section+zone),
> so it works as long as in config where it's a section
> (NODE_NOT_IN_PAGE_FLAGS is defined) then there can't be multiple nodes in a
> single section.

Oh, okay. I was reading "We are not really identifying the zone since we
could be using the section number id if we do not have node id available
in page flags.. We only guarantee that it will return the same value for
two combinable pages in a zone."

and naturally wondered if this could be problematic. Thanks for clarifying.

(once we support MAX_ORDER-1 blocks exceeding a single memory section,
this could become problematic)

> 
>>> +	 */
>>> +	if (page_zone_id(page) != page_zone_id(buddy))
>>> +		return false;
>>> +
>>> +	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
>>> +
>>> +	return true;
>>> +}
>>> +
>>> +/*
>>>   * Locate the struct page for both the matching buddy in our
>>>   * pair (buddy1) and the combined O(n+1) page they form (page).
>>>   *
>>> @@ -234,6 +295,35 @@ __find_buddy_pfn(unsigned long page_pfn,
>>>  	return page_pfn ^ (1 << order);
>>>  }
>>>  
>>> +/*
>>> + * Find the buddy of @page and validate it.
>>> + * @page: The input page
>>> + * @pfn: The pfn of the page, it saves a call to page_to_pfn() when the
>>> + *       function is used in the performance-critical __free_one_page().
>>> + * @order: The order of the page
>>> + * @buddy_pfn: The output pointer to the buddy pfn, it also saves a call to
>>> + *             page_to_pfn().
>>> + *
>>> + * The found buddy can be a non PageBuddy, out of @page's zone, or its order is
>>> + * not the same as @page. The validation is necessary before use it.
>>> + *
>>> + * Return: the found buddy page or NULL if not found.
>>> + */
>>> +static inline struct page *find_buddy_page_pfn(struct page *page,
>>> +			unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
>>> +{
>>> +	struct page *buddy;
>>> +	unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
>>
>> Nit: reverse christmas tree
> 
> Was ever reverse xmas tree part of mm/ style or whole Linux styel outside of
> net?

I assume it's used fairly inconsistently.

-- 
Thanks,

David / dhildenb



* Re: + mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch added to -mm tree
  2022-04-05 10:52     ` David Hildenbrand
@ 2022-04-05 11:06       ` David Hildenbrand
  2022-04-05 14:27         ` Zi Yan
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2022-04-05 11:06 UTC (permalink / raw)
  To: Vlastimil Babka, Andrew Morton, mm-commits, torvalds, rppt,
	rostedt, osalvador, mgorman, ziy

On 05.04.22 12:52, David Hildenbrand wrote:
> On 05.04.22 12:42, Vlastimil Babka wrote:
>> On 4/5/22 12:29, David Hildenbrand wrote:
>>>> +
>>>> +/*
>>>> + * This function checks whether a page is free && is the buddy
>>>> + * we can coalesce a page and its buddy if
>>>> + * (a) the buddy is not in a hole (check before calling!) &&
>>>> + * (b) the buddy is in the buddy system &&
>>>> + * (c) a page and its buddy have the same order &&
>>>> + * (d) a page and its buddy are in the same zone.
>>>> + *
>>>> + * For recording whether a page is in the buddy system, we set PageBuddy.
>>>> + * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
>>>> + *
>>>> + * For recording page's order, we use page_private(page).
>>>> + */
>>>> +static inline bool page_is_buddy(struct page *page, struct page *buddy,
>>>> +							unsigned int order)
>>>
>>> Nit: Strange indentation of the last parameter. But it's already in the
>>> code you're moving, so ...
>>>
>>>> +{
>>>> +	if (!page_is_guard(buddy) && !PageBuddy(buddy))
>>>> +		return false;
>>>> +
>>>> +	if (buddy_order(buddy) != order)
>>>> +		return false;
>>>> +
>>>> +	/*
>>>> +	 * zone check is done late to avoid uselessly calculating
>>>> +	 * zone/node ids for pages that could never merge.
>>>
>>> I know, this is from existing code. But there really isn't a node check
>>> when relying on the zone id. I hope we are not missing a node check :)
>>
>> AFAIK "zone_id" is in fact the combination of node+zone (or section+zone),
>> so it works as long as in config where it's a section
>> (NODE_NOT_IN_PAGE_FLAGS is defined) then there can't be multiple nodes in a
>> single section.
> 
> Oh, okay. I was reading "We are not really identifying the zone since we
> could be using the section number id if we do not have node id available
> in page flags.. We only guarantee that it will return the same value for
> two combinable pages in a zone."
> 
> an naturally wondered if this could be problematic. Thanks for clarifying.
> 
> (once we would support MAX_ORDER-1 exceed a single memory section this
> could be problematic)

Looking once more, I think the section check might be sufficient. I
wasn't aware that we'll always have the NODE id stored in page->flags
with vmemmap. (hopefully with more page->flags users like mglru that
won't change)

-- 
Thanks,

David / dhildenb



* Re: + mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch added to -mm tree
  2022-04-05 11:06       ` David Hildenbrand
@ 2022-04-05 14:27         ` Zi Yan
  0 siblings, 0 replies; 6+ messages in thread
From: Zi Yan @ 2022-04-05 14:27 UTC (permalink / raw)
  To: David Hildenbrand, Andrew Morton
  Cc: Vlastimil Babka, mm-commits, torvalds, rppt, rostedt, osalvador, mgorman


On 5 Apr 2022, at 7:06, David Hildenbrand wrote:

> On 05.04.22 12:52, David Hildenbrand wrote:
>> On 05.04.22 12:42, Vlastimil Babka wrote:
>>> On 4/5/22 12:29, David Hildenbrand wrote:
>>>>> +
>>>>> +/*
>>>>> + * This function checks whether a page is free && is the buddy
>>>>> + * we can coalesce a page and its buddy if
>>>>> + * (a) the buddy is not in a hole (check before calling!) &&
>>>>> + * (b) the buddy is in the buddy system &&
>>>>> + * (c) a page and its buddy have the same order &&
>>>>> + * (d) a page and its buddy are in the same zone.
>>>>> + *
>>>>> + * For recording whether a page is in the buddy system, we set PageBuddy.
>>>>> + * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
>>>>> + *
>>>>> + * For recording page's order, we use page_private(page).
>>>>> + */
>>>>> +static inline bool page_is_buddy(struct page *page, struct page *buddy,
>>>>> +							unsigned int order)
>>>>
>>>> Nit: Strange indentation of the last parameter. But it's already in the
>>>> code you're moving, so ...
>>>>
>>>>> +{
>>>>> +	if (!page_is_guard(buddy) && !PageBuddy(buddy))
>>>>> +		return false;
>>>>> +
>>>>> +	if (buddy_order(buddy) != order)
>>>>> +		return false;
>>>>> +
>>>>> +	/*
>>>>> +	 * zone check is done late to avoid uselessly calculating
>>>>> +	 * zone/node ids for pages that could never merge.
>>>>
>>>> I know, this is from existing code. But there really isn't a node check
>>>> when relying on the zone id. I hope we are not missing a node check :)
>>>
>>> AFAIK "zone_id" is in fact the combination of node+zone (or section+zone),
>>> so it works as long as in config where it's a section
>>> (NODE_NOT_IN_PAGE_FLAGS is defined) then there can't be multiple nodes in a
>>> single section.
>>
>> Oh, okay. I was reading "We are not really identifying the zone since we
>> could be using the section number id if we do not have node id available
>> in page flags.. We only guarantee that it will return the same value for
>> two combinable pages in a zone."
>>
>> an naturally wondered if this could be problematic. Thanks for clarifying.
>>
>> (once we would support MAX_ORDER-1 exceed a single memory section this
>> could be problematic)
>
> Looking once more, I think the section check might be sufficient. I
> wasn't aware that we'll always have the NODE id stored in page->flags
> with vmemmap. (hopefully with more page->flags users like mglru that
> won't change)

Thanks for the review. All the nits are fixed below.

Hi Andrew, let me know if sending a v4 would be preferred.


From 8b42444d03c105d5270c0dafc1aaa3e8c59bc7d4 Mon Sep 17 00:00:00 2001
From: Zi Yan <ziy@nvidia.com>
Date: Thu, 31 Mar 2022 18:59:10 -0400
Subject: [PATCH v4 2/2] mm: wrap __find_buddy_pfn() with a necessary buddy
 page validation.

Whenever the buddy of a page is found from __find_buddy_pfn(),
page_is_buddy() should be used to check its validity. Add a helper
function find_buddy_page_pfn() to find the buddy page and do the check
together.

Link: https://lkml.kernel.org/r/20220401230804.1658207-2-zi.yan@sent.com
Link: https://lore.kernel.org/linux-mm/CAHk-=wji_AmYygZMTsPMdJ7XksMt7kOur8oDfDdniBRMjm4VkQ@mail.gmail.com/
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Oscar Salvador <osalvador@suse.de>
---
 mm/internal.h       | 117 ++++++++++++++++++++++++++++++++++----------
 mm/page_alloc.c     |  52 +++-----------------
 mm/page_isolation.c |   9 ++--
 3 files changed, 101 insertions(+), 77 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 876e66237c89..6ec89b4f0bc8 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -211,6 +211,67 @@ struct alloc_context {
 	bool spread_dirty_pages;
 };

+/*
+ * This function returns the order of a free page in the buddy system. In
+ * general, page_zone(page)->lock must be held by the caller to prevent the
+ * page from being allocated in parallel and returning garbage as the order.
+ * If a caller does not hold page_zone(page)->lock, it must guarantee that the
+ * page cannot be allocated or merged in parallel. Alternatively, it must
+ * handle invalid values gracefully, and use buddy_order_unsafe() below.
+ */
+static inline unsigned int buddy_order(struct page *page)
+{
+	/* PageBuddy() must be checked by the caller */
+	return page_private(page);
+}
+
+/*
+ * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
+ * PageBuddy() should be checked first by the caller to minimize race window,
+ * and invalid values must be handled gracefully.
+ *
+ * READ_ONCE is used so that if the caller assigns the result into a local
+ * variable and e.g. tests it for valid range before using, the compiler cannot
+ * decide to remove the variable and inline the page_private(page) multiple
+ * times, potentially observing different values in the tests and the actual
+ * use of the result.
+ */
+#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
+
+/*
+ * This function checks whether a page is free && is the buddy.
+ * We can coalesce a page and its buddy if
+ * (a) the buddy is not in a hole (check before calling!) &&
+ * (b) the buddy is in the buddy system &&
+ * (c) a page and its buddy have the same order &&
+ * (d) a page and its buddy are in the same zone.
+ *
+ * For recording whether a page is in the buddy system, we set PageBuddy.
+ * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
+ *
+ * For recording page's order, we use page_private(page).
+ */
+static inline bool page_is_buddy(struct page *page, struct page *buddy,
+				 unsigned int order)
+{
+	if (!page_is_guard(buddy) && !PageBuddy(buddy))
+		return false;
+
+	if (buddy_order(buddy) != order)
+		return false;
+
+	/*
+	 * zone check is done late to avoid uselessly calculating
+	 * zone/node ids for pages that could never merge.
+	 */
+	if (page_zone_id(page) != page_zone_id(buddy))
+		return false;
+
+	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
+
+	return true;
+}
+
 /*
  * Locate the struct page for both the matching buddy in our
  * pair (buddy1) and the combined O(n+1) page they form (page).
@@ -234,6 +295,35 @@ __find_buddy_pfn(unsigned long page_pfn, unsigned int order)
 	return page_pfn ^ (1 << order);
 }

+/*
+ * Find the buddy of @page and validate it.
+ * @page: The input page
+ * @pfn: The pfn of the page; passing it saves a call to page_to_pfn() when
+ *       the function is used in the performance-critical __free_one_page().
+ * @order: The order of the page
+ * @buddy_pfn: Optional output pointer to the buddy pfn; filling it also saves
+ *             a call to page_to_pfn().
+ *
+ * The buddy computed from @pfn may be a non-PageBuddy page, outside @page's
+ * zone, or of a different order, so page_is_buddy() validates it before return.
+ *
+ * Return: the found buddy page or NULL if not found.
+ */
+static inline struct page *find_buddy_page_pfn(struct page *page,
+			unsigned long pfn, unsigned int order, unsigned long *buddy_pfn)
+{
+	unsigned long __buddy_pfn = __find_buddy_pfn(pfn, order);
+	struct page *buddy;
+
+	buddy = page + (__buddy_pfn - pfn);
+	if (buddy_pfn)
+		*buddy_pfn = __buddy_pfn;
+
+	if (page_is_buddy(page, buddy, order))
+		return buddy;
+	return NULL;
+}
+
 extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
 				unsigned long end_pfn, struct zone *zone);

@@ -336,33 +426,6 @@ isolate_migratepages_range(struct compact_control *cc,
 int find_suitable_fallback(struct free_area *area, unsigned int order,
 			int migratetype, bool only_stealable, bool *can_steal);

-/*
- * This function returns the order of a free page in the buddy system. In
- * general, page_zone(page)->lock must be held by the caller to prevent the
- * page from being allocated in parallel and returning garbage as the order.
- * If a caller does not hold page_zone(page)->lock, it must guarantee that the
- * page cannot be allocated or merged in parallel. Alternatively, it must
- * handle invalid values gracefully, and use buddy_order_unsafe() below.
- */
-static inline unsigned int buddy_order(struct page *page)
-{
-	/* PageBuddy() must be checked by the caller */
-	return page_private(page);
-}
-
-/*
- * Like buddy_order(), but for callers who cannot afford to hold the zone lock.
- * PageBuddy() should be checked first by the caller to minimize race window,
- * and invalid values must be handled gracefully.
- *
- * READ_ONCE is used so that if the caller assigns the result into a local
- * variable and e.g. tests it for valid range before using, the compiler cannot
- * decide to remove the variable and inline the page_private(page) multiple
- * times, potentially observing different values in the tests and the actual
- * use of the result.
- */
-#define buddy_order_unsafe(page)	READ_ONCE(page_private(page))
-
 /*
  * These three helpers classifies VMAs for virtual memory accounting.
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2ea106146686..f199931784e4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -868,40 +868,6 @@ static inline void set_buddy_order(struct page *page, unsigned int order)
 	__SetPageBuddy(page);
 }

-/*
- * This function checks whether a page is free && is the buddy
- * we can coalesce a page and its buddy if
- * (a) the buddy is not in a hole (check before calling!) &&
- * (b) the buddy is in the buddy system &&
- * (c) a page and its buddy have the same order &&
- * (d) a page and its buddy are in the same zone.
- *
- * For recording whether a page is in the buddy system, we set PageBuddy.
- * Setting, clearing, and testing PageBuddy is serialized by zone->lock.
- *
- * For recording page's order, we use page_private(page).
- */
-static inline bool page_is_buddy(struct page *page, struct page *buddy,
-							unsigned int order)
-{
-	if (!page_is_guard(buddy) && !PageBuddy(buddy))
-		return false;
-
-	if (buddy_order(buddy) != order)
-		return false;
-
-	/*
-	 * zone check is done late to avoid uselessly calculating
-	 * zone/node ids for pages that could never merge.
-	 */
-	if (page_zone_id(page) != page_zone_id(buddy))
-		return false;
-
-	VM_BUG_ON_PAGE(page_count(buddy) != 0, buddy);
-
-	return true;
-}
-
 #ifdef CONFIG_COMPACTION
 static inline struct capture_control *task_capc(struct zone *zone)
 {
@@ -1010,18 +976,17 @@ static inline bool
 buddy_merge_likely(unsigned long pfn, unsigned long buddy_pfn,
 		   struct page *page, unsigned int order)
 {
-	struct page *higher_page, *higher_buddy;
-	unsigned long combined_pfn;
+	unsigned long higher_page_pfn;
+	struct page *higher_page;

 	if (order >= MAX_ORDER - 2)
 		return false;

-	combined_pfn = buddy_pfn & pfn;
-	higher_page = page + (combined_pfn - pfn);
-	buddy_pfn = __find_buddy_pfn(combined_pfn, order + 1);
-	higher_buddy = higher_page + (buddy_pfn - combined_pfn);
+	higher_page_pfn = buddy_pfn & pfn;
+	higher_page = page + (higher_page_pfn - pfn);

-	return page_is_buddy(higher_page, higher_buddy, order + 1);
+	return find_buddy_page_pfn(higher_page, higher_page_pfn, order + 1,
+			NULL) != NULL;
 }

 /*
@@ -1075,10 +1040,9 @@ static inline void __free_one_page(struct page *page,
 								migratetype);
 			return;
 		}
-		buddy_pfn = __find_buddy_pfn(pfn, order);
-		buddy = page + (buddy_pfn - pfn);

-		if (!page_is_buddy(page, buddy, order))
+		buddy = find_buddy_page_pfn(page, pfn, order, &buddy_pfn);
+		if (!buddy)
 			goto done_merging;

 		if (unlikely(order >= pageblock_order)) {
diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index f67c4c70f17f..ff0ea6308299 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -70,7 +70,6 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	unsigned long flags, nr_pages;
 	bool isolated_page = false;
 	unsigned int order;
-	unsigned long pfn, buddy_pfn;
 	struct page *buddy;

 	zone = page_zone(page);
@@ -89,11 +88,9 @@ static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
 	if (PageBuddy(page)) {
 		order = buddy_order(page);
 		if (order >= pageblock_order && order < MAX_ORDER - 1) {
-			pfn = page_to_pfn(page);
-			buddy_pfn = __find_buddy_pfn(pfn, order);
-			buddy = page + (buddy_pfn - pfn);
-
-			if (!is_migrate_isolate_page(buddy)) {
+			buddy = find_buddy_page_pfn(page, page_to_pfn(page),
+						    order, NULL);
+			if (buddy && !is_migrate_isolate_page(buddy)) {
 				isolated_page = !!__isolate_free_page(page, order);
 				/*
 				 * Isolating a free page in an isolated pageblock
-- 
2.35.1


--
Best Regards,
Yan, Zi



end of thread, other threads:[~2022-04-06  1:56 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-04 21:50 + mm-wrap-__find_buddy_pfn-with-a-necessary-buddy-page-validation.patch added to -mm tree Andrew Morton
2022-04-05 10:29 ` David Hildenbrand
2022-04-05 10:42   ` Vlastimil Babka
2022-04-05 10:52     ` David Hildenbrand
2022-04-05 11:06       ` David Hildenbrand
2022-04-05 14:27         ` Zi Yan
