linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error
@ 2023-06-02 22:57 Mike Kravetz
  2023-06-02 22:57 ` [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one Mike Kravetz
  2023-06-03  0:55 ` [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error Andrew Morton
  0 siblings, 2 replies; 10+ messages in thread
From: Mike Kravetz @ 2023-06-02 22:57 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Matthew Wilcox, Ackerley Tng, Sidhartha Kumar, Muchun Song,
	vannapurve, erdemaktas, Andrew Morton, Mike Kravetz

In commits d0ce0e47b323 and 91a2fb956ad99, hugetlb code was changed to
use page_cache_next_miss to determine if a page was present in the page
cache.  However, the current implementation of page_cache_next_miss will
always return the passed index if max_scan is 1 as in the hugetlb code.
As a result, hugetlb code will always think a page is not present in
the cache, even if it is.
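
For illustration, the hugetlb presence check in question is the statement
below (taken from the patch that follows); with the current implementation
and max_scan == 1 it can never evaluate to true:

	/* always false today: the function returns 'index' when max_scan == 1 */
	present = page_cache_next_miss(mapping, index, 1) != index;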

The patch which follows addresses the issue by changing the implementation
of page_cache_next_miss and for consistency page_cache_prev_miss.  Since
such a patch also impacts the readahead code, I would suggest using the
patch by Sidhartha Kumar [1] to fix the issue in 6.3 and this patch moving
forward.

If we would rather not modify page_cache_next/prev_miss, then a new
interface as suggested by Ackerley Tng [2] could also be used.

Comments on the best way to fix moving forward would be appreciated.

[1] https://lore.kernel.org/linux-mm/20230505185301.534259-1-sidhartha.kumar@oracle.com/
[2] https://lore.kernel.org/linux-mm/98624c2f481966492b4eb8272aef747790229b73.1683069252.git.ackerleytng@google.com/

Mike Kravetz (1):
  page cache: fix page_cache_next/prev_miss off by one

 mm/filemap.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

-- 
2.40.1



* [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one
  2023-06-02 22:57 [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error Mike Kravetz
@ 2023-06-02 22:57 ` Mike Kravetz
  2023-06-03  0:59   ` Andrew Morton
  2023-06-05 17:26   ` Ackerley Tng
  2023-06-03  0:55 ` [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error Andrew Morton
  1 sibling, 2 replies; 10+ messages in thread
From: Mike Kravetz @ 2023-06-02 22:57 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Matthew Wilcox, Ackerley Tng, Sidhartha Kumar, Muchun Song,
	vannapurve, erdemaktas, Andrew Morton, Mike Kravetz

Ackerley Tng reported an issue with hugetlbfs fallocate here[1].  The
issue showed up after the conversion of hugetlb page cache lookup code
to use page_cache_next_miss.  Code in hugetlb fallocate, userfaultfd
and GUP is now using page_cache_next_miss to determine if a page is
present in the page cache.  The following statement is used.

	present = page_cache_next_miss(mapping, index, 1) != index;

There are two issues with page_cache_next_miss when used in this way.
1) If the passed value for index is equal to the 'wrap-around' value,
   the same index will always be returned.  This wrap-around value is 0,
   so 0 will be returned even if a page is present at index 0.
2) If there is no gap in the range passed, the last index in the range
   will be returned.  When passed a range of 1 as above, the passed
   index value will be returned even if the page is present.
The end result is the statement above will NEVER indicate a page is
present in the cache, even if it is.
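
To make the failure concrete, below is a rough userspace model of the old
loop (an illustration only, not the kernel code; it models the first
xas_next() call as examining the passed index itself without advancing,
which is what the behaviour described above implies):

#include <stdbool.h>
#include <stdio.h>

#define NR_SLOTS 16
static bool cache[NR_SLOTS];	/* stands in for the page cache */

/* models the pre-fix page_cache_next_miss() loop */
static unsigned long old_next_miss(unsigned long index, unsigned long max_scan)
{
	unsigned long cur = index;
	bool first = true;

	while (max_scan--) {
		/* the first iteration inspects 'index' itself */
		if (!first)
			cur++;
		first = false;
		if (cur >= NR_SLOTS || !cache[cur])	/* gap found */
			return cur;
		if (cur == 0)				/* wrap-around check */
			return cur;
	}
	return cur;	/* no gap found: the last index scanned is returned */
}

int main(void)
{
	cache[4] = true;			/* page present at index 4 */
	bool present = old_next_miss(4, 1) != 4;

	printf("present = %d\n", present);	/* prints 0: "not present" */
	return 0;
}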

As noted by Ackerley in [1], users can see this by hugetlb fallocate
incorrectly returning EEXIST if pages are already present in the file.
In addition, hugetlb pages will not be included in core dumps if they
need to be brought in via GUP.  userfaultfd UFFDIO_COPY also uses this
code and will not notice pages already present in the cache.  It may try
to allocate a new page and potentially return ENOMEM as opposed to
EEXIST.

Both page_cache_next_miss and page_cache_prev_miss have similar issues.
Fix by:
- Check for index equal to 'wrap-around' value and do not exit early.
- If no gap is found in range, return index outside range.
- Update function description to say 'wrap-around' value could be
  returned if passed as index.
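
With these changes, the check above behaves as intended.  A sketch of the
return values after the fix for the max_scan == 1 case (assuming a page is
or is not present at the given index, as noted in each comment):

	/* page present at index 3: returns 4 (index + max_scan) */
	page_cache_next_miss(mapping, 3, 1);	/* != 3 -> present */

	/* no page at index 3: returns 3, the index of the gap */
	page_cache_next_miss(mapping, 3, 1);	/* == 3 -> not present */

	/* page present at index 0: returns 1 rather than the wrap value 0 */
	page_cache_next_miss(mapping, 0, 1);	/* != 0 -> present */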

[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/

Reported-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/filemap.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 71dc90f64e43..123540c7ba45 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1733,7 +1733,9 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
  *
  * Return: The index of the gap if found, otherwise an index outside the
  * range specified (in which case 'return - index >= max_scan' will be true).
- * In the rare case of index wrap-around, 0 will be returned.
+ * In the rare case of index wrap-around, 0 will be returned.  0 will also
+ * be returned if index == 0 and there is a gap at the index.  We can not
+ * wrap-around if passed index == 0.
  */
 pgoff_t page_cache_next_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
@@ -1743,12 +1745,13 @@ pgoff_t page_cache_next_miss(struct address_space *mapping,
 	while (max_scan--) {
 		void *entry = xas_next(&xas);
 		if (!entry || xa_is_value(entry))
-			break;
-		if (xas.xa_index == 0)
-			break;
+			return xas.xa_index;
+		if (xas.xa_index == 0 && index != 0)
+			return xas.xa_index;
 	}
 
-	return xas.xa_index;
+	/* No gaps in range and no wrap-around, return index beyond range */
+	return xas.xa_index + 1;
 }
 EXPORT_SYMBOL(page_cache_next_miss);
 
@@ -1769,7 +1772,9 @@ EXPORT_SYMBOL(page_cache_next_miss);
  *
  * Return: The index of the gap if found, otherwise an index outside the
  * range specified (in which case 'index - return >= max_scan' will be true).
- * In the rare case of wrap-around, ULONG_MAX will be returned.
+ * In the rare case of wrap-around, ULONG_MAX will be returned.  ULONG_MAX
+ * will also be returned if index == ULONG_MAX and there is a gap at the
+ * index.  We can not wrap-around if passed index == ULONG_MAX.
  */
 pgoff_t page_cache_prev_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
@@ -1779,12 +1784,13 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 	while (max_scan--) {
 		void *entry = xas_prev(&xas);
 		if (!entry || xa_is_value(entry))
-			break;
-		if (xas.xa_index == ULONG_MAX)
-			break;
+			return xas.xa_index;
+		if (xas.xa_index == ULONG_MAX && index != ULONG_MAX)
+			return xas.xa_index;
 	}
 
-	return xas.xa_index;
+	/* No gaps in range and no wrap-around, return index beyond range */
+	return xas.xa_index - 1;
 }
 EXPORT_SYMBOL(page_cache_prev_miss);
 
-- 
2.40.1



* Re: [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error
  2023-06-02 22:57 [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error Mike Kravetz
  2023-06-02 22:57 ` [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one Mike Kravetz
@ 2023-06-03  0:55 ` Andrew Morton
  2023-06-03  2:22   ` Mike Kravetz
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2023-06-03  0:55 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox,
	Ackerley Tng, Sidhartha Kumar, Muchun Song, vannapurve,
	erdemaktas, stable, Greg Kroah-Hartman

On Fri,  2 Jun 2023 15:57:46 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> In commits d0ce0e47b323 and 91a2fb956ad99, hugetlb code was changed to
> use page_cache_next_miss to determine if a page was present in the page
> cache.  However, the current implementation of page_cache_next_miss will
> always return the passed index if max_scan is 1 as in the hugetlb code.
> As a result, hugetlb code will always think a page is not present in
> the cache, even if it is.
> 
> The patch which follows addresses the issue by changing the implementation
> of page_cache_next_miss and for consistency page_cache_prev_miss.  Since
> such a patch also impacts the readahead code, I would suggest using the
> patch by Sidhartha Kumar [1] to fix the issue in 6.3 and this patch moving
> forward.

Well this is tricky.

This patch applies cleanly to 6.3, so if we add cc:stable to this
patch, it will get backported, against your suggestion.

Sidhartha's patch [1] (which you recommend for -stable) is quite
different from this patch.  And Sidhartha's patch has no route to being
tested in linux-next nor to being merged by Linus.

So problems.  The preferable approach is to just backport this patch
into -stable in the usual fashion.  What are the risks in doing this?

> If we would rather not modify page_cache_next/prev_miss, then a new
> interface as suggested by Ackerley Tng [2] could also be used.
> 
> Comments on the best way to fix moving forward would be appreciated.
> 
> [1] https://lore.kernel.org/linux-mm/20230505185301.534259-1-sidhartha.kumar@oracle.com/
> [2] https://lore.kernel.org/linux-mm/98624c2f481966492b4eb8272aef747790229b73.1683069252.git.ackerleytng@google.com/
> 
> Mike Kravetz (1):
>   page cache: fix page_cache_next/prev_miss off by one
> 
>  mm/filemap.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 



* Re: [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one
  2023-06-02 22:57 ` [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one Mike Kravetz
@ 2023-06-03  0:59   ` Andrew Morton
  2023-06-03  2:24     ` Mike Kravetz
  2023-06-05 17:26   ` Ackerley Tng
  1 sibling, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2023-06-03  0:59 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox,
	Ackerley Tng, Sidhartha Kumar, Muchun Song, vannapurve,
	erdemaktas

On Fri,  2 Jun 2023 15:57:47 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:

> Ackerley Tng reported an issue with hugetlbfs fallocate here[1].  The
> issue showed up after the conversion of hugetlb page cache lookup code
> to use page_cache_next_miss.

So I'm assuming

Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")

?

> Code in hugetlb fallocate, userfaultfd
> and GUP is now using page_cache_next_miss to determine if a page is
> present in the page cache.  The following statement is used.
> 
> 	present = page_cache_next_miss(mapping, index, 1) != index;
> 
> There are two issues with page_cache_next_miss when used in this way.
> 1) If the passed value for index is equal to the 'wrap-around' value,
>    the same index will always be returned.  This wrap-around value is 0,
>    so 0 will be returned even if a page is present at index 0.
> 2) If there is no gap in the range passed, the last index in the range
>    will be returned.  When passed a range of 1 as above, the passed
>    index value will be returned even if the page is present.
> The end result is the statement above will NEVER indicate a page is
> present in the cache, even if it is.
> 
> As noted by Ackerley in [1], users can see this by hugetlb fallocate
> incorrectly returning EEXIST if pages are already present in the file.
> In addition, hugetlb pages will not be included in core dumps if they
> need to be brought in via GUP.  userfaultfd UFFDIO_COPY also uses this
> code and will not notice pages already present in the cache.  It may try
> to allocate a new page and potentially return ENOMEM as opposed to
> EEXIST.
> 
> Both page_cache_next_miss and page_cache_prev_miss have similar issues.
> Fix by:
> - Check for index equal to 'wrap-around' value and do not exit early.
> - If no gap is found in range, return index outside range.
> - Update function description to say 'wrap-around' value could be
>   returned if passed as index.
> 
> [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
> 



* Re: [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error
  2023-06-03  0:55 ` [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error Andrew Morton
@ 2023-06-03  2:22   ` Mike Kravetz
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Kravetz @ 2023-06-03  2:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox,
	Ackerley Tng, Sidhartha Kumar, Muchun Song, vannapurve,
	erdemaktas, stable, Greg Kroah-Hartman

On 06/02/23 17:55, Andrew Morton wrote:
> On Fri,  2 Jun 2023 15:57:46 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
> > In commits d0ce0e47b323 and 91a2fb956ad99, hugetlb code was changed to
> > use page_cache_next_miss to determine if a page was present in the page
> > cache.  However, the current implementation of page_cache_next_miss will
> > always return the passed index if max_scan is 1 as in the hugetlb code.
> > As a result, hugetlb code will always think a page is not present in
> > the cache, even if it is.
> > 
> > The patch which follows addresses the issue by changing the implementation
> > of page_cache_next_miss and for consistency page_cache_prev_miss.  Since
> > such a patch also impacts the readahead code, I would suggest using the
> > patch by Sidhartha Kumar [1] to fix the issue in 6.3 and this patch moving
> > forward.
> 
> Well this is tricky.
> 
> This patch applies cleanly to 6.3, so if we add cc:stable to this
> patch, it will get backported, against your suggestion.
> 
> Sidhartha's patch [1] (which you recommend for -stable) is quite
> different from this patch.  And Sidhartha's patch has no route to being
> tested in linux-next nor to being merged by Linus.
> 
> So problems.  The preferable approach is to just backport this patch
> into -stable in the usual fashion.  What are the risks in doing this?

Really hoping to get some comments from Matthew on this.

The only other user is the readahead code and I have little
experience/knowledge there.

Unless I totally screwed up the code, page_cache_next/prev_miss will now
correctly indicate the lack of a page in the cache in these edge cases.
Since readahead is more about performance than correctness (not trying
to minimize), the risk should be small.
-- 
Mike Kravetz
> 


* Re: [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one
  2023-06-03  0:59   ` Andrew Morton
@ 2023-06-03  2:24     ` Mike Kravetz
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Kravetz @ 2023-06-03  2:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, linux-fsdevel, Matthew Wilcox,
	Ackerley Tng, Sidhartha Kumar, Muchun Song, vannapurve,
	erdemaktas

On 06/02/23 17:59, Andrew Morton wrote:
> On Fri,  2 Jun 2023 15:57:47 -0700 Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
> > Ackerley Tng reported an issue with hugetlbfs fallocate here[1].  The
> > issue showed up after the conversion of hugetlb page cache lookup code
> > to use page_cache_next_miss.
> 
> So I'm assuming
> 
> Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
> 

Yes, that would be preferred.

I originally had Fixes: 0d3f92966629 ("page cache: Convert hole search to
XArray") where page_cache_next/prev_miss were introduced.  But, there is
no issue until the new hugetlb usage.
-- 
Mike Kravetz


* Re: [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one
  2023-06-02 22:57 ` [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one Mike Kravetz
  2023-06-03  0:59   ` Andrew Morton
@ 2023-06-05 17:26   ` Ackerley Tng
  2023-06-06 22:41     ` Mike Kravetz
  1 sibling, 1 reply; 10+ messages in thread
From: Ackerley Tng @ 2023-06-05 17:26 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, linux-fsdevel, willy, sidhartha.kumar,
	songmuchun, vannapurve, erdemaktas, akpm, mike.kravetz

Mike Kravetz <mike.kravetz@oracle.com> writes:

> diff --git a/mm/filemap.c b/mm/filemap.c
> index 71dc90f64e43..123540c7ba45 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -1733,7 +1733,9 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
>    *
>    * Return: The index of the gap if found, otherwise an index outside the
>    * range specified (in which case 'return - index >= max_scan' will be true).
> - * In the rare case of index wrap-around, 0 will be returned.
> + * In the rare case of index wrap-around, 0 will be returned.  0 will also
> + * be returned if index == 0 and there is a gap at the index.  We can not
> + * wrap-around if passed index == 0.
>    */
>   pgoff_t page_cache_next_miss(struct address_space *mapping,
>   			     pgoff_t index, unsigned long max_scan)
> @@ -1743,12 +1745,13 @@ pgoff_t page_cache_next_miss(struct address_space *mapping,
>   	while (max_scan--) {
>   		void *entry = xas_next(&xas);
>   		if (!entry || xa_is_value(entry))
> -			break;
> -		if (xas.xa_index == 0)
> -			break;
> +			return xas.xa_index;
> +		if (xas.xa_index == 0 && index != 0)
> +			return xas.xa_index;
>   	}

> -	return xas.xa_index;
> +	/* No gaps in range and no wrap-around, return index beyond range */
> +	return xas.xa_index + 1;
>   }
>   EXPORT_SYMBOL(page_cache_next_miss);


This doesn't seem to work as expected:

Here's a test I did

/* Modified so I can pass in an xarray for this test */
static unsigned long page_cache_next_miss(struct xarray *xa, unsigned long index,
					  unsigned long max_scan)
{
	XA_STATE(xas, xa, index);

	while (max_scan--) {
		void *entry = xas_next(&xas);
		if (!entry || xa_is_value(entry))
			return xas.xa_index;
		if (xas.xa_index == 0 && index != 0)
			return xas.xa_index;
	}

	return xas.xa_index + 1;
}

static noinline void check_find_5(void)
{
	struct xarray xa;
	unsigned long max_scan;
	void *ptr = malloc(10);

	xa_init(&xa);
	xa_store_range(&xa, 3, 5, ptr, GFP_KERNEL);

	max_scan = 3;
	printk("page_cache_next_miss(xa, %d, %ld): %ld\n", 4, max_scan,
	       page_cache_next_miss(&xa, 4, max_scan));

}

The above gave me: page_cache_next_miss(xa, 4, 3): 7

But I was expecting a return value of 6.

I investigated a little, and it seems like the entry at index 6, if we
start iterating from before 6, is 0xe, and xa_is_internal(entry) returns
true.

Not yet familiar with the internals of xarrays, not sure what the fix
should be.


* Re: [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one
  2023-06-05 17:26   ` Ackerley Tng
@ 2023-06-06 22:41     ` Mike Kravetz
  2023-06-06 23:35       ` Ackerley Tng
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Kravetz @ 2023-06-06 22:41 UTC (permalink / raw)
  To: Ackerley Tng
  Cc: linux-kernel, linux-mm, linux-fsdevel, willy, sidhartha.kumar,
	songmuchun, vannapurve, erdemaktas, akpm

On 06/05/23 17:26, Ackerley Tng wrote:
> Mike Kravetz <mike.kravetz@oracle.com> writes:
> 
> This doesn't seem to work as expected:
> 
> Here's a test I did
> 
> /* Modified so I can pass in an xarray for this test */
> static unsigned long page_cache_next_miss(struct xarray *xa, unsigned long index,
> 					  unsigned long max_scan)
> {
> 	XA_STATE(xas, xa, index);
> 
> 	while (max_scan--) {
> 		void *entry = xas_next(&xas);
> 		if (!entry || xa_is_value(entry))
> 			return xas.xa_index;
> 		if (xas.xa_index == 0 && index != 0)
> 			return xas.xa_index;
> 	}
> 
> 	return xas.xa_index + 1;
> }
> 
> static noinline void check_find_5(void)
> {
> 	struct xarray xa;
> 	unsigned long max_scan;
> 	void *ptr = malloc(10);
> 
> 	xa_init(&xa);
> 	xa_store_range(&xa, 3, 5, ptr, GFP_KERNEL);
> 
> 	max_scan = 3;
> 	printk("page_cache_next_miss(xa, %d, %ld): %ld\n", 4, max_scan,
> 	       page_cache_next_miss(&xa, 4, max_scan));
> 
> }
> 
> The above gave me: page_cache_next_miss(xa, 4, 3): 7
> 
> But I was expecting a return value of 6.
> 
> I investigated a little, and it seems like the entry at index 6, if we
> start iterating from before 6, is 0xe, and xa_is_internal(entry) returns
> true.
> 
> Not yet familiar with the internals of xarrays, not sure what the fix
> should be.

I am NOT an expert with xarray.  However, the documentation says:

"Calling xa_store_range() stores the same entry in a range
 of indices.  If you do this, some of the other operations will behave
 in a slightly odd way.  For example, marking the entry at one index
 may result in the entry being marked at some, but not all of the other
 indices.  Storing into one index may result in the entry retrieved by
 some, but not all of the other indices changing."

This may be why your test is not functioning as expected?  I modified
your check_find_5() routine as follows (within lib/test_xarray.c):

static noinline void check_find_5(struct xarray *xa, bool mult)
{
	unsigned long max_scan;
	void *p = &max_scan;

	XA_BUG_ON(xa, !xa_empty(xa));

	if (mult) {
		xa_store(xa, 3, p, GFP_KERNEL);
		xa_store(xa, 4, p, GFP_KERNEL);
		xa_store(xa, 5, p, GFP_KERNEL);
	} else {
		xa_store_range(xa, 3, 5, p, GFP_KERNEL);
	}

	max_scan = 3;
	if (mult)
		printk("---> multiple stores\n");
	else
		printk("---> range store\n");

	printk("page_cache_next_miss(xa, %d, %ld): %ld\n", 4, max_scan,
		__page_cache_next_miss(xa, 4, max_scan));

	if (mult) {
		xa_store(xa, 3, NULL, GFP_KERNEL);
		xa_store(xa, 4, NULL, GFP_KERNEL);
		xa_store(xa, 5, NULL, GFP_KERNEL);
	} else {
		xa_store_range(xa, 3, 5, NULL, GFP_KERNEL);
	}

	xa_destroy(xa);
}

This results in:
[  149.998676] ---> multiple stores
[  149.999391] page_cache_next_miss(xa, 4, 3): 6
[  150.003342] ---> range store
[  150.007002] page_cache_next_miss(xa, 4, 3): 7

I am fairly confident the page cache code will make individual xa_store
calls as opposed to xa_store_range.
-- 
Mike Kravetz


* Re: [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one
  2023-06-06 22:41     ` Mike Kravetz
@ 2023-06-06 23:35       ` Ackerley Tng
  0 siblings, 0 replies; 10+ messages in thread
From: Ackerley Tng @ 2023-06-06 23:35 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-kernel, linux-mm, linux-fsdevel, willy, sidhartha.kumar,
	songmuchun, vannapurve, erdemaktas, akpm

Mike Kravetz <mike.kravetz@oracle.com> writes:

> On 06/05/23 17:26, Ackerley Tng wrote:
>> Mike Kravetz <mike.kravetz@oracle.com> writes:

>> This doesn't seem to work as expected:

>> Here's a test I did

>>   ...

>> The above gave me: page_cache_next_miss(xa, 4, 3): 7

>> But I was expecting a return value of 6.

>> I investigated a little, and it seems like the entry at index 6, if we
>> start iterating from before 6, is 0xe, and xa_is_internal(entry) returns
>> true.

>> Not yet familiar with the internals of xarrays, not sure what the fix
>> should be.

> I am NOT an expert with xarray.  However, the documentation says:

> "Calling xa_store_range() stores the same entry in a range
>   of indices.  If you do this, some of the other operations will behave
>   in a slightly odd way.  For example, marking the entry at one index
>   may result in the entry being marked at some, but not all of the other
>   indices.  Storing into one index may result in the entry retrieved by
>   some, but not all of the other indices changing."

> This may be why your test is not functioning as expected?  I modified
> your check_find_5() routine as follows (within lib/test_xarray.c):

> static noinline void check_find_5(struct xarray *xa, bool mult)
> {
> 	unsigned long max_scan;
> 	void *p = &max_scan;

> 	XA_BUG_ON(xa, !xa_empty(xa));

> 	if (mult) {
> 		xa_store(xa, 3, p, GFP_KERNEL);
> 		xa_store(xa, 4, p, GFP_KERNEL);
> 		xa_store(xa, 5, p, GFP_KERNEL);
> 	} else {
> 		xa_store_range(xa, 3, 5, p, GFP_KERNEL);
> 	}

> 	max_scan = 3;
> 	if (mult)
> 		printk("---> multiple stores\n");
> 	else
> 		printk("---> range store\n");

> 	printk("page_cache_next_miss(xa, %d, %ld): %ld\n", 4, max_scan,
> 		__page_cache_next_miss(xa, 4, max_scan));

> 	if (mult) {
> 		xa_store(xa, 3, NULL, GFP_KERNEL);
> 		xa_store(xa, 4, NULL, GFP_KERNEL);
> 		xa_store(xa, 5, NULL, GFP_KERNEL);
> 	} else {
> 		xa_store_range(xa, 3, 5, NULL, GFP_KERNEL);
> 	}

> 	xa_destroy(xa);
> }

> This results in:
> [  149.998676] ---> multiple stores
> [  149.999391] page_cache_next_miss(xa, 4, 3): 6
> [  150.003342] ---> range store
> [  150.007002] page_cache_next_miss(xa, 4, 3): 7

> I am fairly confident the page cache code will make individual xa_store
> calls as opposed to xa_store_range.

I tried this out with xa_store and a non-NULL pointer, and it works as
expected. Thanks!

I also checked that filemap/page_cache doesn't use xa_store_range(). It
only uses xas_store().

Reviewed-by: Ackerley Tng <ackerleytng@google.com>
Tested-by: Ackerley Tng <ackerleytng@google.com>


* [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one
  2023-05-04 23:38 [PATCH 0/1] " Mike Kravetz
@ 2023-05-04 23:38 ` Mike Kravetz
  0 siblings, 0 replies; 10+ messages in thread
From: Mike Kravetz @ 2023-05-04 23:38 UTC (permalink / raw)
  To: linux-kernel, linux-mm, linux-fsdevel
  Cc: Matthew Wilcox, Ackerley Tng, Sidhartha Kumar, Muchun Song,
	vannapurve, erdemaktas, Andrew Morton, Mike Kravetz, stable

Ackerley Tng reported an issue with hugetlbfs fallocate here[1].  The
issue showed up after the conversion of hugetlb page cache lookup code
to use page_cache_next_miss.  Code in hugetlb fallocate, userfaultfd
and GUP is now using page_cache_next_miss to determine if a page is
present in the page cache.  The following statement is used.

	present = page_cache_next_miss(mapping, index, 1) != index;

There are two issues with page_cache_next_miss when used in this way.
1) If the passed value for index is equal to the 'wrap-around' value,
   the same index will always be returned.  This wrap-around value is 0,
   so 0 will be returned even if a page is present at index 0.
2) If there is no gap in the range passed, the last index in the range
   will be returned.  When passed a range of 1 as above, the passed
   index value will be returned even if the page is present.
The end result is the statement above will NEVER indicate a page is
present in the cache, even if it is.

As noted by Ackerley in [1], users can see this by hugetlb fallocate
incorrectly returning EEXIST if pages are already present in the file.
In addition, hugetlb pages will not be included in core dumps if they
need to be brought in via GUP.  userfaultfd UFFDIO_COPY also uses this
code and will not notice pages already present in the cache.  It may try
to allocate a new page and potentially return ENOMEM as opposed to
EEXIST.

Both page_cache_next_miss and page_cache_prev_miss have similar issues.
Fix by:
- Check for index equal to 'wrap-around' value and do not exit early.
- If no gap is found in range, return index outside range.
- Update function description to say 'wrap-around' value could be
  returned if passed as index.

Fixes: 0d3f92966629 ("page cache: Convert hole search to XArray")
Cc: <stable@vger.kernel.org>
Reported-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
---
 mm/filemap.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index a34abfe8c654..60875d349a7b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1760,7 +1760,9 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
  *
  * Return: The index of the gap if found, otherwise an index outside the
  * range specified (in which case 'return - index >= max_scan' will be true).
- * In the rare case of index wrap-around, 0 will be returned.
+ * In the rare case of index wrap-around, 0 will be returned.  0 will also
+ * be returned if index == 0 and there is a gap at the index.  We can not
+ * wrap-around if passed index == 0.
  */
 pgoff_t page_cache_next_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
@@ -1770,12 +1772,13 @@ pgoff_t page_cache_next_miss(struct address_space *mapping,
 	while (max_scan--) {
 		void *entry = xas_next(&xas);
 		if (!entry || xa_is_value(entry))
-			break;
-		if (xas.xa_index == 0)
-			break;
+			return xas.xa_index;
+		if (xas.xa_index == 0 && index != 0)
+			return xas.xa_index;
 	}
 
-	return xas.xa_index;
+	/* No gaps in range and no wrap-around, return index beyond range */
+	return xas.xa_index + 1;
 }
 EXPORT_SYMBOL(page_cache_next_miss);
 
@@ -1796,7 +1799,9 @@ EXPORT_SYMBOL(page_cache_next_miss);
  *
  * Return: The index of the gap if found, otherwise an index outside the
  * range specified (in which case 'index - return >= max_scan' will be true).
- * In the rare case of wrap-around, ULONG_MAX will be returned.
+ * In the rare case of wrap-around, ULONG_MAX will be returned.  ULONG_MAX
+ * will also be returned if index == ULONG_MAX and there is a gap at the
+ * index.  We can not wrap-around if passed index == ULONG_MAX.
  */
 pgoff_t page_cache_prev_miss(struct address_space *mapping,
 			     pgoff_t index, unsigned long max_scan)
@@ -1806,12 +1811,13 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
 	while (max_scan--) {
 		void *entry = xas_prev(&xas);
 		if (!entry || xa_is_value(entry))
-			break;
-		if (xas.xa_index == ULONG_MAX)
-			break;
+			return xas.xa_index;
+		if (xas.xa_index == ULONG_MAX && index != ULONG_MAX)
+			return xas.xa_index;
 	}
 
-	return xas.xa_index;
+	/* No gaps in range and no wrap-around, return index beyond range */
+	return xas.xa_index - 1;
 }
 EXPORT_SYMBOL(page_cache_prev_miss);
 
-- 
2.40.0



end of thread (newest: 2023-06-06 23:36 UTC)

Thread overview: 10+ messages
2023-06-02 22:57 [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error Mike Kravetz
2023-06-02 22:57 ` [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one Mike Kravetz
2023-06-03  0:59   ` Andrew Morton
2023-06-03  2:24     ` Mike Kravetz
2023-06-05 17:26   ` Ackerley Tng
2023-06-06 22:41     ` Mike Kravetz
2023-06-06 23:35       ` Ackerley Tng
2023-06-03  0:55 ` [PATCH 0/1] RESEND fix page_cache_next/prev_miss off by one error Andrew Morton
2023-06-03  2:22   ` Mike Kravetz
  -- strict thread matches above, loose matches on Subject: below --
2023-05-04 23:38 [PATCH 0/1] " Mike Kravetz
2023-05-04 23:38 ` [PATCH 1/1] page cache: fix page_cache_next/prev_miss off by one Mike Kravetz
