* [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
@ 2023-05-15 17:08 Tarun Sahu
  2023-05-15 17:15 ` Tarun Sahu
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Tarun Sahu @ 2023-05-15 17:08 UTC (permalink / raw)
  To: linux-mm
  Cc: akpm, muchun.song, mike.kravetz, aneesh.kumar, willy,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel, tsahu

folio_set_order(folio, 0) is used in the kernel in two places,
__destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
Currently, it is called to clear out folio->_folio_nr_pages and
folio->_folio_order.

For __destroy_compound_gigantic_folio:
In the past, folio_set_order(folio, 0) was needed because page->mapping
used to overlap with _folio_nr_pages and _folio_order. If these fields
were left uncleared while freeing gigantic hugepages, they caused
"BUG: bad page state" due to a non-zero page->mapping. Since
commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
CMA"), page->mapping is explicitly cleared for tail pages, and
_folio_order and _folio_nr_pages no longer overlap with page->mapping.
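
For reference, a rough sketch of the layout around the time of that bug
(~v5.9), with approximate pahole-style offsets (illustrative only):

struct page {
    long unsigned int flags;                         /*  0     8 */
    union {
        struct {    /* page cache and anonymous pages */
            struct list_head lru;                    /*  8    16 */
            struct address_space * mapping;          /* 24     8 */
            ...
        };
        struct {    /* tail pages of a compound page */
            long unsigned int compound_head;         /*  8     8 */
            unsigned char compound_dtor;             /* 16     1 */
            unsigned char compound_order;            /* 17     1 */
            atomic_t   compound_mapcount;            /* 20     4 */
            unsigned int compound_nr;                /* 24     4 */
        };
        ...
    };
...
}

Here compound_nr shared offset 24 with page->mapping, so a stale non-zero
value looked like a non-NULL mapping when the page was freed.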

struct page {
...
   struct address_space * mapping;  /* 24     8 */
...
}

struct folio {
...
    union {
        struct {
        	long unsigned int _flags_1;      /* 64    8 */
        	long unsigned int _head_1;       /* 72    8 */
        	unsigned char _folio_dtor;       /* 80    1 */
        	unsigned char _folio_order;      /* 81    1 */

        	/* XXX 2 bytes hole, try to pack */

        	atomic_t   _entire_mapcount;     /* 84    4 */
        	atomic_t   _nr_pages_mapped;     /* 88    4 */
        	atomic_t   _pincount;            /* 92    4 */
        	unsigned int _folio_nr_pages;    /* 96    4 */
        };                                       /* 64   40 */
        struct page __page_1 __attribute__((__aligned__(8))); /* 64   64 */
    }
...
}

So folio_set_order(folio, 0) can be removed from the gigantic folio
freeing path (__destroy_compound_gigantic_folio).

The other place folio_set_order(folio, 0) is called is in the error path
of __prep_compound_gigantic_folio. There it can also be removed if
folio_set_order(folio, order) is moved after the for loop.

The patch also moves the __folio_set_head call in
__prep_compound_gigantic_folio() so that it does not have to be cleared
in the error path.

Also, as Mike pointed out:
"It would actually be better to move the calls _folio_set_head and
folio_set_order in __prep_compound_gigantic_folio() as suggested here. Why?
In the current code, the ref count on the 'head page' is still 1 (or more)
while those calls are made. So, someone could take a speculative ref on the
page BEFORE the tail pages are set up."

This way, folio_set_order(folio, 0) is no longer needed. It also removes
the confusion of the folio order being set to 0 (as the _folio_order
field is part of the first tail page).

Testing: I have run the LTP tests, which all pass. I have also written
an LTP test that exercises the bug caused by compound_nr and
page->mapping overlapping:

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/hugetlb/hugemmap/hugemmap32.c

On older kernels (< 5.10-rc7) with the above bug the test fails, while on
newer kernels, and with this patch applied, it passes.

Signed-off-by: Tarun Sahu <tsahu@linux.ibm.com>
---
 mm/hugetlb.c  | 9 +++------
 mm/internal.h | 8 ++------
 2 files changed, 5 insertions(+), 12 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f154019e6b84..607553445855 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1489,7 +1489,6 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
 			set_page_refcounted(p);
 	}
 
-	folio_set_order(folio, 0);
 	__folio_clear_head(folio);
 }
 
@@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 	struct page *p;
 
 	__folio_clear_reserved(folio);
-	__folio_set_head(folio);
-	/* we rely on prep_new_hugetlb_folio to set the destructor */
-	folio_set_order(folio, order);
 	for (i = 0; i < nr_pages; i++) {
 		p = folio_page(folio, i);
 
@@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		if (i != 0)
 			set_compound_head(p, &folio->page);
 	}
+	__folio_set_head(folio);
+	/* we rely on prep_new_hugetlb_folio to set the destructor */
+	folio_set_order(folio, order);
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);
@@ -2017,8 +2016,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		p = folio_page(folio, j);
 		__ClearPageReserved(p);
 	}
-	folio_set_order(folio, 0);
-	__folio_clear_head(folio);
 	return false;
 }
 
diff --git a/mm/internal.h b/mm/internal.h
index 68410c6d97ac..c59fe08c5b39 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -425,16 +425,12 @@ int split_free_page(struct page *free_page,
  */
 static inline void folio_set_order(struct folio *folio, unsigned int order)
 {
-	if (WARN_ON_ONCE(!folio_test_large(folio)))
+	if (WARN_ON_ONCE(!order || !folio_test_large(folio)))
 		return;
 
 	folio->_folio_order = order;
 #ifdef CONFIG_64BIT
-	/*
-	 * When hugetlb dissolves a folio, we need to clear the tail
-	 * page, rather than setting nr_pages to 1.
-	 */
-	folio->_folio_nr_pages = order ? 1U << order : 0;
+	folio->_folio_nr_pages = 1U << order;
 #endif
 }
 
-- 
2.31.1



* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-05-15 17:08 [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order Tarun Sahu
@ 2023-05-15 17:15 ` Tarun Sahu
  2023-05-15 17:16 ` Matthew Wilcox
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 10+ messages in thread
From: Tarun Sahu @ 2023-05-15 17:15 UTC (permalink / raw)
  To: linux-mm
  Cc: akpm, muchun.song, mike.kravetz, aneesh.kumar, willy,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel


Changes from v1:
   - Changed the patch description. Added comment from Mike.

~Tarun


* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-05-15 17:08 [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order Tarun Sahu
  2023-05-15 17:15 ` Tarun Sahu
@ 2023-05-15 17:16 ` Matthew Wilcox
  2023-05-15 17:45   ` Mike Kravetz
  2023-05-16 13:09   ` Tarun Sahu
  2023-05-22  5:49 ` Tarun Sahu
  2023-06-06 15:58 ` Mike Kravetz
  3 siblings, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2023-05-15 17:16 UTC (permalink / raw)
  To: Tarun Sahu
  Cc: linux-mm, akpm, muchun.song, mike.kravetz, aneesh.kumar,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel

On Mon, May 15, 2023 at 10:38:09PM +0530, Tarun Sahu wrote:
> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  	struct page *p;
>  
>  	__folio_clear_reserved(folio);
> -	__folio_set_head(folio);
> -	/* we rely on prep_new_hugetlb_folio to set the destructor */
> -	folio_set_order(folio, order);
>  	for (i = 0; i < nr_pages; i++) {
>  		p = folio_page(folio, i);
>  
> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		if (i != 0)
>  			set_compound_head(p, &folio->page);
>  	}
> +	__folio_set_head(folio);
> +	/* we rely on prep_new_hugetlb_folio to set the destructor */
> +	folio_set_order(folio, order);

This makes me nervous, as I said before.  This means that
compound_head(tail) can temporarily point to a page which is not marked
as a head page.  That's different from prep_compound_page().  You need to
come up with some good argumentation for why this is safe, and no amount
of testing you do can replace it -- any race in this area will be subtle.
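
For context, compound_head() at the time was roughly the following
(paraphrased from include/linux/page-flags.h):

static inline struct page *compound_head(struct page *page)
{
        unsigned long head = READ_ONCE(page->compound_head);

        if (unlikely(head & 1))
                return (struct page *)(head - 1);
        return page;
}

So once set_compound_head() has run on a tail page, a concurrent reader can
already be redirected to the head page even though PG_head has not yet been
set on it, which is the window being questioned here.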


* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-05-15 17:16 ` Matthew Wilcox
@ 2023-05-15 17:45   ` Mike Kravetz
  2023-06-03  0:08     ` Mike Kravetz
  2023-05-16 13:09   ` Tarun Sahu
  1 sibling, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2023-05-15 17:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Tarun Sahu, linux-mm, akpm, muchun.song, aneesh.kumar,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel

On 05/15/23 18:16, Matthew Wilcox wrote:
> On Mon, May 15, 2023 at 10:38:09PM +0530, Tarun Sahu wrote:
> > @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  	struct page *p;
> >  
> >  	__folio_clear_reserved(folio);
> > -	__folio_set_head(folio);
> > -	/* we rely on prep_new_hugetlb_folio to set the destructor */
> > -	folio_set_order(folio, order);
> >  	for (i = 0; i < nr_pages; i++) {
> >  		p = folio_page(folio, i);
> >  
> > @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  		if (i != 0)
> >  			set_compound_head(p, &folio->page);
> >  	}
> > +	__folio_set_head(folio);
> > +	/* we rely on prep_new_hugetlb_folio to set the destructor */
> > +	folio_set_order(folio, order);
> 
> This makes me nervous, as I said before.  This means that
> compound_head(tail) can temporarily point to a page which is not marked
> as a head page.  That's different from prep_compound_page().  You need to
> come up with some good argumentation for why this is safe, and no amount
> of testing you do can replace it -- any race in this area will be subtle.

I added comments supporting this approach in the first version of the patch.
My argument was that this is actually safer than the existing code.  That is
because we freeze the page (ref count 0) before setting compound_head(tail).
So, nobody should be taking any speculative refs on those tail pages.

In the existing code, we set the compound page order in the head before
freezing the head or any tail pages.  Therefore, speculative refs can be
taken on any of the pages while in this state.

If we want prep_compound_gigantic_folio to work like prep_compound_page
we would need to take two passes through the pages.  In the first pass,
freeze all the pages and in the second set up the compound page.
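
A minimal sketch of what that two-pass variant could look like (illustrative
only, untested, and omitting the demote and HWPoison handling of the real
__prep_compound_gigantic_folio; the function name is made up):

static bool __prep_compound_gigantic_folio_two_pass(struct folio *folio,
						    unsigned int order)
{
	int i, nr_pages = 1 << order;
	struct page *p;

	/* Pass 1: freeze every page so no speculative refs can succeed. */
	for (i = 0; i < nr_pages; i++) {
		p = folio_page(folio, i);
		if (!page_ref_freeze(p, 1))
			goto out_unfreeze;
	}

	/* Pass 2: all refcounts are now 0, so build the compound page. */
	__folio_clear_reserved(folio);
	__folio_set_head(folio);
	folio_set_order(folio, order);
	for (i = 1; i < nr_pages; i++) {
		p = folio_page(folio, i);
		__ClearPageReserved(p);
		set_compound_head(p, &folio->page);
	}
	atomic_set(&folio->_entire_mapcount, -1);
	atomic_set(&folio->_nr_pages_mapped, 0);
	atomic_set(&folio->_pincount, 0);
	return true;

out_unfreeze:
	/* Unfreeze whatever we already froze and report failure. */
	while (--i >= 0)
		page_ref_unfreeze(folio_page(folio, i), 1);
	return false;
}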
-- 
Mike Kravetz


* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-05-15 17:16 ` Matthew Wilcox
  2023-05-15 17:45   ` Mike Kravetz
@ 2023-05-16 13:09   ` Tarun Sahu
  1 sibling, 0 replies; 10+ messages in thread
From: Tarun Sahu @ 2023-05-16 13:09 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-mm, akpm, muchun.song, mike.kravetz, aneesh.kumar,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel

Hi Matthew,

Matthew Wilcox <willy@infradead.org> writes:

> On Mon, May 15, 2023 at 10:38:09PM +0530, Tarun Sahu wrote:
>> @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>>  	struct page *p;
>>  
>>  	__folio_clear_reserved(folio);
>> -	__folio_set_head(folio);
>> -	/* we rely on prep_new_hugetlb_folio to set the destructor */
>> -	folio_set_order(folio, order);
>>  	for (i = 0; i < nr_pages; i++) {
>>  		p = folio_page(folio, i);
>>  
>> @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>>  		if (i != 0)
>>  			set_compound_head(p, &folio->page);
>>  	}
>> +	__folio_set_head(folio);
>> +	/* we rely on prep_new_hugetlb_folio to set the destructor */
>> +	folio_set_order(folio, order);
>
> This makes me nervous, as I said before.  This means that
> compound_head(tail) can temporarily point to a page which is not marked
> as a head page.  That's different from prep_compound_page().  You need to
> come up with some good argumentation for why this is safe, and no amount
> of testing you do can replace it -- any race in this area will be subtle.

IIUC, I am certain it is safe to move these calls, and I agree with what
Mike said. Here is my reasoning:

When we get pages from the CMA allocator for a gigantic folio, the page
refcount of each page is 1. page_cache_get_speculative (now
folio_try_get_rcu) can take a reference on any of these pages before
prep_compound_gigantic_folio explicitly freezes their refcounts. With this
race there are two possible situations.

...
		if (!demote) {
			if (!page_ref_freeze(p, 1)) {
				pr_warn("HugeTLB page can not be used due to unexpected inflated ref count\n");
				goto out_error;
			}
		} else {
			VM_BUG_ON_PAGE(page_count(p), p);
		}
		if (i != 0)
			set_compound_head(p, &folio->page);
	}
...

1. In the current code, before the refcount of the nth tail page is
frozen, folio_try_get_rcu might take a reference on that nth tail page,
so the refcount of the nth tail page (not the head page) is increased,
since its compound head is not yet set. Once this happens, the nth
iteration of the loop hits the error path and
prep_compound_gigantic_folio fails.

So setting PG_head at the start of the for loop or at the end makes no
difference to this flow.

2. If a reference on the head page is taken by folio_try_get_rcu before
it is frozen, prep_compound_gigantic_folio will fail, but until PG_head
and the folio order of the head page are cleared in the error path, the
caller on the folio_try_get_rcu path will see this page as a head page
and might try to operate on its tail pages while those tail pages are
invalid.

Hence, it is safer to call __folio_set_head and folio_set_order after
freezing the tail page refcounts.
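
For illustration, a simplified sketch of the speculative-lookup pattern
being referred to (not actual page cache code; the helper name is made up):

static bool speculative_ref(struct page *page)
{
	struct folio *folio;
	bool got;

	rcu_read_lock();
	folio = page_folio(page);
	/* Fails once the folio refcount has been frozen to 0. */
	got = folio_try_get_rcu(folio);
	rcu_read_unlock();

	return got;
}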

~Tarun


* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-05-15 17:08 [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order Tarun Sahu
  2023-05-15 17:15 ` Tarun Sahu
  2023-05-15 17:16 ` Matthew Wilcox
@ 2023-05-22  5:49 ` Tarun Sahu
  2023-06-06 15:58 ` Mike Kravetz
  3 siblings, 0 replies; 10+ messages in thread
From: Tarun Sahu @ 2023-05-22  5:49 UTC (permalink / raw)
  To: linux-mm, Matthew Wilcox, mike.kravetz
  Cc: akpm, muchun.song, mike.kravetz, aneesh.kumar, willy,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel

Hi,

This is a gentle reminder; please let me know if any information or any
changes are needed from my end.

Thanks
Tarun


* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-05-15 17:45   ` Mike Kravetz
@ 2023-06-03  0:08     ` Mike Kravetz
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Kravetz @ 2023-06-03  0:08 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Tarun Sahu, linux-mm, akpm, muchun.song, aneesh.kumar,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel

On 05/15/23 10:45, Mike Kravetz wrote:
> On 05/15/23 18:16, Matthew Wilcox wrote:
> > On Mon, May 15, 2023 at 10:38:09PM +0530, Tarun Sahu wrote:
> > > @@ -1951,9 +1950,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > >  	struct page *p;
> > >  
> > >  	__folio_clear_reserved(folio);
> > > -	__folio_set_head(folio);
> > > -	/* we rely on prep_new_hugetlb_folio to set the destructor */
> > > -	folio_set_order(folio, order);
> > >  	for (i = 0; i < nr_pages; i++) {
> > >  		p = folio_page(folio, i);
> > >  
> > > @@ -1999,6 +1995,9 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> > >  		if (i != 0)
> > >  			set_compound_head(p, &folio->page);
> > >  	}
> > > +	__folio_set_head(folio);
> > > +	/* we rely on prep_new_hugetlb_folio to set the destructor */
> > > +	folio_set_order(folio, order);
> > 
> > This makes me nervous, as I said before.  This means that
> > compound_head(tail) can temporarily point to a page which is not marked
> > as a head page.  That's different from prep_compound_page().  You need to
> > come up with some good argumentation for why this is safe, and no amount
> > of testing you do can replace it -- any race in this area will be subtle.

We could continue to set up the head page first as in the current code,
but we need to move the freezing of that page outside the loop.  That is
better than the existing code, however I am not sure if it is any better
than what is proposed here.  I still believe my reasoning below as to
why this proposal is better than the existing code is correct.

Also, that 'folio_set_order(folio, 0)' only exists in the error path of
the current code.  I am not sure if it is actually needed.  Why?  Right
after returning an error, the pages associated with the gigantic page
will be freed.  This is similar to the reason why it can be removed in
__destroy_compound_gigantic_folio.

> I added comments supporting this approach in the first version of the patch.
> My argument was that this is actually safer than the existing code.  That is
> because we freeze the page (ref count 0) before setting compound_head(tail).
> So, nobody should be taking any speculative refs on those tail pages.
> 
> In the existing code, we set the compound page order in the head before
> freezing the head or any tail pages.  Therefore, speculative refs can be
> taken on any of the pages while in this state.
> 
> If we want prep_compound_gigantic_folio to work like prep_compound_page
> we would need to take two passes through the pages.  In the first pass,
> freeze all the pages and in the second set up the compound page.

-- 
Mike Kravetz


* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-05-15 17:08 [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order Tarun Sahu
                   ` (2 preceding siblings ...)
  2023-05-22  5:49 ` Tarun Sahu
@ 2023-06-06 15:58 ` Mike Kravetz
  2023-06-08 10:03   ` Tarun Sahu
  3 siblings, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2023-06-06 15:58 UTC (permalink / raw)
  To: Tarun Sahu
  Cc: linux-mm, akpm, muchun.song, aneesh.kumar, willy,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel

On 06/06/23 10:32, Tarun Sahu wrote:
>
> Hi Mike,
>
> Thanks for your inputs.
> I wanted to know if you find it okay. Can I send it again adding your Reviewed-by?

Hi Tarun,

Just a few more comments/questions.

On 05/15/23 22:38, Tarun Sahu wrote:
> folio_set_order(folio, 0) is used in the kernel in two places,
> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> Currently, it is called to clear out folio->_folio_nr_pages and
> folio->_folio_order.
> 
> For __destroy_compound_gigantic_folio:
> In the past, folio_set_order(folio, 0) was needed because page->mapping
> used to overlap with _folio_nr_pages and _folio_order. If these fields
> were left uncleared while freeing gigantic hugepages, they caused
> "BUG: bad page state" due to a non-zero page->mapping. Since
> commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> CMA"), page->mapping is explicitly cleared for tail pages, and
> _folio_order and _folio_nr_pages no longer overlap with page->mapping.

I believe the same logic/reasoning as above also applies to
__prep_compound_gigantic_folio.
Why?
In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
in the case of error.  If __prep_compound_gigantic_folio fails, the caller
will then call free_gigantic_folio() on the "gigantic page".  However, it is
not really a gigantic  at this point in time, and we are simply calling
cma_release() or free_contig_range().
The end result is that I do not believe the existing call to
folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
required.  ???
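
For reference, free_gigantic_folio() at the time looked roughly like this
(paraphrased from mm/hugetlb.c):

static void free_gigantic_folio(struct folio *folio, unsigned int order)
{
#ifdef CONFIG_CMA
	int nid = folio_nid(folio);

	/* If the folio was not allocated from CMA, cma_release() returns false. */
	if (cma_release(hugetlb_cma[nid], &folio->page, 1 << order))
		return;
#endif
	free_contig_range(folio_pfn(folio), 1 << order);
}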

If my reasoning above is correct, then we could just have one patch to
remove the folio_set_order(folio, 0) calls and remove special casing for
order 0 in folio_set_order.

However, I still believe your restructuring of __prep_compound_gigantic_folio
is of value.  I do not believe there is an issue as questioned by Matthew.  My
reasoning has been stated previously.  We could make changes like the following
to retain the same order of operations in __prep_compound_gigantic_folio and
totally avoid Matthew's question.  Totally untested.

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea24718db4af..a54fee663cb1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 	int nr_pages = 1 << order;
 	struct page *p;
 
-	__folio_clear_reserved(folio);
-	__folio_set_head(folio);
 	/* we rely on prep_new_hugetlb_folio to set the destructor */
-	folio_set_order(folio, order);
+
 	for (i = 0; i < nr_pages; i++) {
 		p = folio_page(folio, i);
 
@@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		 * on the head page when they need know if put_page() is needed
 		 * after get_user_pages().
 		 */
-		if (i != 0)	/* head page cleared above */
+		if (i != 0)	/* head page cleared below */
 			__ClearPageReserved(p);
 		/*
 		 * Subtle and very unlikely
@@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		} else {
 			VM_BUG_ON_PAGE(page_count(p), p);
 		}
-		if (i != 0)
+
+		if (i == 0) {
+			__folio_clear_reserved(folio);
+			__folio_set_head(folio);
+			folio_set_order(folio, order);
+		} else {
 			set_compound_head(p, &folio->page);
+		}
 	}
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
@@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
 		p = folio_page(folio, j);
 		__ClearPageReserved(p);
 	}
-	folio_set_order(folio, 0);
 	__folio_clear_head(folio);
 	return false;
 }


> 
> struct page {
> ...
>    struct address_space * mapping;  /* 24     8 */
> ...
> }
> 
> struct folio {
> ...
>     union {
>         struct {
>         	long unsigned int _flags_1;      /* 64    8 */
>         	long unsigned int _head_1;       /* 72    8 */
>         	unsigned char _folio_dtor;       /* 80    1 */
>         	unsigned char _folio_order;      /* 81    1 */
> 
>         	/* XXX 2 bytes hole, try to pack */
> 
>         	atomic_t   _entire_mapcount;     /* 84    4 */
>         	atomic_t   _nr_pages_mapped;     /* 88    4 */
>         	atomic_t   _pincount;            /* 92    4 */
>         	unsigned int _folio_nr_pages;    /* 96    4 */
>         };                                       /* 64   40 */
>         struct page __page_1 __attribute__((__aligned__(8))); /* 64   64 */
>     }
> ...
> }

I do not think the copy of page/folio definitions adds much value to the
commit message.

-- 
Mike Kravetz


* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-06-06 15:58 ` Mike Kravetz
@ 2023-06-08 10:03   ` Tarun Sahu
  2023-06-08 23:52     ` Mike Kravetz
  0 siblings, 1 reply; 10+ messages in thread
From: Tarun Sahu @ 2023-06-08 10:03 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, akpm, muchun.song, aneesh.kumar, willy,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel

Hi Mike,

Please find my comments inline.

Mike Kravetz <mike.kravetz@oracle.com> writes:

> On 06/06/23 10:32, Tarun Sahu wrote:
>>
>> Hi Mike,
>>
>> Thanks for your inputs.
>> I wanted to know if you find it okay. Can I send it again adding your Reviewed-by?
>
> Hi Tarun,
>
> Just a few more comments/questions.
>
> On 05/15/23 22:38, Tarun Sahu wrote:
>> folio_set_order(folio, 0) is used in the kernel in two places,
>> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
>> Currently, it is called to clear out folio->_folio_nr_pages and
>> folio->_folio_order.
>> 
>> For __destroy_compound_gigantic_folio:
>> In the past, folio_set_order(folio, 0) was needed because page->mapping
>> used to overlap with _folio_nr_pages and _folio_order. If these fields
>> were left uncleared while freeing gigantic hugepages, they caused
>> "BUG: bad page state" due to a non-zero page->mapping. Since
>> commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
>> CMA"), page->mapping is explicitly cleared for tail pages, and
>> _folio_order and _folio_nr_pages no longer overlap with page->mapping.
>
> I believe the same logic/reasoning as above also applies to
> __prep_compound_gigantic_folio.
> Why?
> In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
> in the case of error.  If __prep_compound_gigantic_folio fails, the caller
> will then call free_gigantic_folio() on the "gigantic page".  However, it is
> not really a gigantic folio at this point in time, and we are simply calling
> cma_release() or free_contig_range().
> The end result is that I do not believe the existing call to
> folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
> required.  ???
No, there is a difference. IIUC, __destroy_compound_gigantic_folio
explicitly resets page->mapping for each page of the compound page, which
makes sure that even if some fields of struct page/folio overlap with
page->mapping in the future, it won't cause a `BUG: bad page state` error.
But if we just remove folio_set_order(folio, 0) from
__prep_compound_gigantic_folio without moving folio_set_order(folio, order),
it adds the maintenance overhead of tracking whether _folio_order overlaps
with page->mapping every time struct page fields are changed, since in the
overlapping case page->mapping would be non-zero. IMHO, to avoid that,
folio_set_order(folio, order) should be moved after all the error checks on
the tail pages are done, so _folio_order is set on success and never set in
the error path (which is the original proposal). But for folio_set_head, I
agree with the way you suggested below.

WDYT?

>
> If my reasoning above is correct, then we could just have one patch to
> remove the folio_set_order(folio, 0) calls and remove special casing for
> order 0 in folio_set_order.
>
> However, I still believe your restructuring of __prep_compound_gigantic_folio
> is of value.  I do not believe there is an issue as questioned by Matthew.  My
> reasoning has been stated previously.  We could make changes like the following
> to retain the same order of operations in __prep_compound_gigantic_folio and
> totally avoid Matthew's question.  Totally untested.
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index ea24718db4af..a54fee663cb1 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  	int nr_pages = 1 << order;
>  	struct page *p;
>  
> -	__folio_clear_reserved(folio);
> -	__folio_set_head(folio);
>  	/* we rely on prep_new_hugetlb_folio to set the destructor */
> -	folio_set_order(folio, order);
> +
>  	for (i = 0; i < nr_pages; i++) {
>  		p = folio_page(folio, i);
>  
> @@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		 * on the head page when they need know if put_page() is needed
>  		 * after get_user_pages().
>  		 */
> -		if (i != 0)	/* head page cleared above */
> +		if (i != 0)	/* head page cleared below */
>  			__ClearPageReserved(p);
>  		/*
>  		 * Subtle and very unlikely
> @@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		} else {
>  			VM_BUG_ON_PAGE(page_count(p), p);
>  		}
> -		if (i != 0)
> +
> +		if (i == 0) {
> +			__folio_clear_reserved(folio);
> +			__folio_set_head(folio);
> +			folio_set_order(folio, order);
With folio_set_head, I agree to this, but it does not feel right for
folio_set_order, as per my reasoning above. WDYT?

> +		} else {
>  			set_compound_head(p, &folio->page);
> +		}
>  	}
>  	atomic_set(&folio->_entire_mapcount, -1);
>  	atomic_set(&folio->_nr_pages_mapped, 0);
> @@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
>  		p = folio_page(folio, j);
>  		__ClearPageReserved(p);
>  	}
> -	folio_set_order(folio, 0);
>  	__folio_clear_head(folio);
>  	return false;
>  }
>
>
>> 
>> struct page {
>> ...
>>    struct address_space * mapping;  /* 24     8 */
>> ...
>> }
>> 
>> struct folio {
>> ...
>>     union {
>>         struct {
>>         	long unsigned int _flags_1;      /* 64    8 */
>>         	long unsigned int _head_1;       /* 72    8 */
>>         	unsigned char _folio_dtor;       /* 80    1 */
>>         	unsigned char _folio_order;      /* 81    1 */
>> 
>>         	/* XXX 2 bytes hole, try to pack */
>> 
>>         	atomic_t   _entire_mapcount;     /* 84    4 */
>>         	atomic_t   _nr_pages_mapped;     /* 88    4 */
>>         	atomic_t   _pincount;            /* 92    4 */
>>         	unsigned int _folio_nr_pages;    /* 96    4 */
>>         };                                       /* 64   40 */
>>         struct page __page_1 __attribute__((__aligned__(8))); /* 64   64 */
>>     }
>> ...
>> }
>
> I do not think the copy of page/folio definitions adds much value to the
> commit message.
Yeah, will remove it.
>
> -- 
> Mike Kravetz


* Re: [PATCH v2] mm/folio: Avoid special handling for order value 0 in folio_set_order
  2023-06-08 10:03   ` Tarun Sahu
@ 2023-06-08 23:52     ` Mike Kravetz
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Kravetz @ 2023-06-08 23:52 UTC (permalink / raw)
  To: Tarun Sahu
  Cc: linux-mm, akpm, muchun.song, aneesh.kumar, willy,
	sidhartha.kumar, gerald.schaefer, linux-kernel, jaypatel

On 06/08/23 15:33, Tarun Sahu wrote:
> Hi Mike,
> 
> Please find my comments inline.
> 
> Mike Kravetz <mike.kravetz@oracle.com> writes:
> 
> > On 06/06/23 10:32, Tarun Sahu wrote:
> >>
> >> Hi Mike,
> >>
> >> Thanks for your inputs.
> >> I wanted to know if you find it okay. Can I send it again adding your Reviewed-by?
> >
> > Hi Tarun,
> >
> > Just a few more comments/questions.
> >
> > On 05/15/23 22:38, Tarun Sahu wrote:
> >> folio_set_order(folio, 0) is used in the kernel in two places,
> >> __destroy_compound_gigantic_folio and __prep_compound_gigantic_folio.
> >> Currently, it is called to clear out folio->_folio_nr_pages and
> >> folio->_folio_order.
> >> 
> >> For __destroy_compound_gigantic_folio:
> >> In the past, folio_set_order(folio, 0) was needed because page->mapping
> >> used to overlap with _folio_nr_pages and _folio_order. If these fields
> >> were left uncleared while freeing gigantic hugepages, they caused
> >> "BUG: bad page state" due to a non-zero page->mapping. Since
> >> commit a01f43901cfb ("hugetlb: be sure to free demoted CMA pages to
> >> CMA"), page->mapping is explicitly cleared for tail pages, and
> >> _folio_order and _folio_nr_pages no longer overlap with page->mapping.
> >
> > I believe the same logic/reasoning as above also applies to
> > __prep_compound_gigantic_folio.
> > Why?
> > In __prep_compound_gigantic_folio we only call folio_set_order(folio, 0)
> > in the case of error.  If __prep_compound_gigantic_folio fails, the caller
> > will then call free_gigantic_folio() on the "gigantic page".  However, it is
> > not really a gigantic folio at this point in time, and we are simply calling
> > cma_release() or free_contig_range().
> > The end result is that I do not believe the existing call to
> > folio_set_order(folio, 0) in __prep_compound_gigantic_folio is actually
> > required.  ???
> No, there is a difference. IIUC, __destroy_compound_gigantic_folio
> explicitly resets page->mapping for each page of the compound page, which
> makes sure that even if some fields of struct page/folio overlap with
> page->mapping in the future, it won't cause a `BUG: bad page state` error.
> But if we just remove folio_set_order(folio, 0) from
> __prep_compound_gigantic_folio without moving folio_set_order(folio, order),
> it adds the maintenance overhead of tracking whether _folio_order overlaps
> with page->mapping every time struct page fields are changed, since in the
> overlapping case page->mapping would be non-zero. IMHO, to avoid that,
> folio_set_order(folio, order) should be moved after all the error checks on
> the tail pages are done, so _folio_order is set on success and never set in
> the error path (which is the original proposal). But for folio_set_head, I
> agree with the way you suggested below.
> 
> WDYT?

Right.  It is more 'future proof' to only set folio order on success as
done in your original patch.

> >
> > If my reasoning above is correct, then we could just have one patch to
> > remove the folio_set_order(folio, 0) calls and remove special casing for
> > order 0 in folio_set_order.
> >
> > However, I still believe your restructuring of __prep_compound_gigantic_folio
> > is of value.  I do not believe there is an issue as questioned by Matthew.  My
> > reasoning has been stated previously.  We could make changes like the following
> > to retain the same order of operations in __prep_compound_gigantic_folio and
> > totally avoid Matthew's question.  Totally untested.
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index ea24718db4af..a54fee663cb1 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1950,10 +1950,8 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  	int nr_pages = 1 << order;
> >  	struct page *p;
> >  
> > -	__folio_clear_reserved(folio);
> > -	__folio_set_head(folio);
> >  	/* we rely on prep_new_hugetlb_folio to set the destructor */
> > -	folio_set_order(folio, order);
> > +
> >  	for (i = 0; i < nr_pages; i++) {
> >  		p = folio_page(folio, i);
> >  
> > @@ -1969,7 +1967,7 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  		 * on the head page when they need know if put_page() is needed
> >  		 * after get_user_pages().
> >  		 */
> > -		if (i != 0)	/* head page cleared above */
> > +		if (i != 0)	/* head page cleared below */
> >  			__ClearPageReserved(p);
> >  		/*
> >  		 * Subtle and very unlikely
> > @@ -1996,8 +1994,14 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  		} else {
> >  			VM_BUG_ON_PAGE(page_count(p), p);
> >  		}
> > -		if (i != 0)
> > +
> > +		if (i == 0) {
> > +			__folio_clear_reserved(folio);
> > +			__folio_set_head(folio);
> > +			folio_set_order(folio, order);
> With folio_set_head, I agree to this, but it does not feel right for
> folio_set_order, as per my reasoning above. WDYT?

Agree with your reasoning.  We should just move __folio_set_head and
folio_set_order after the loop as you originally suggested.

> 
> > +		} else {
> >  			set_compound_head(p, &folio->page);
> > +		}
> >  	}
> >  	atomic_set(&folio->_entire_mapcount, -1);
> >  	atomic_set(&folio->_nr_pages_mapped, 0);
> > @@ -2017,7 +2021,6 @@ static bool __prep_compound_gigantic_folio(struct folio *folio,
> >  		p = folio_page(folio, j);
> >  		__ClearPageReserved(p);
> >  	}
> > -	folio_set_order(folio, 0);
> >  	__folio_clear_head(folio);
> >  	return false;
> >  }
> >
> >
> >> 
> >> struct page {
> >> ...
> >>    struct address_space * mapping;  /* 24     8 */
> >> ...
> >> }
> >> 
> >> struct folio {
> >> ...
> >>     union {
> >>         struct {
> >>         	long unsigned int _flags_1;      /* 64    8 */
> >>         	long unsigned int _head_1;       /* 72    8 */
> >>         	unsigned char _folio_dtor;       /* 80    1 */
> >>         	unsigned char _folio_order;      /* 81    1 */
> >> 
> >>         	/* XXX 2 bytes hole, try to pack */
> >> 
> >>         	atomic_t   _entire_mapcount;     /* 84    4 */
> >>         	atomic_t   _nr_pages_mapped;     /* 88    4 */
> >>         	atomic_t   _pincount;            /* 92    4 */
> >>         	unsigned int _folio_nr_pages;    /* 96    4 */
> >>         };                                       /* 64   40 */
> >>         struct page __page_1 __attribute__((__aligned__(8))); /* 64   64 */
> >>     }
> >> ...
> >> }
> >
> > I do not think the copy of page/folio definitions adds much value to the
> > commit message.
> Yeah, will remove it.
> >

I think we are finally on the same page.  I am good with this v2 patch.
The only change needed is to update the commit message to remove the
struct definitions.
-- 
Mike Kravetz

