linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: Re-allow pinning of zero pfns
@ 2022-06-10 22:35 Alex Williamson
  2022-06-11  0:21 ` Minchan Kim
  2022-06-11 18:29 ` David Hildenbrand
  0 siblings, 2 replies; 13+ messages in thread
From: Alex Williamson @ 2022-06-10 22:35 UTC (permalink / raw)
  To: akpm, minchan
  Cc: linux-mm, linux-kernel, paulmck, jhubbard, joaodias, jgg, david

The commit referenced below subtly and inadvertently changed the logic
to disallow pinning of zero pfns.  This breaks device assignment with
vfio and potentially various other users of gup.  Exclude the zero page
test from the negation.

Fixes: 1c563432588d ("mm: fix is_pinnable_page against a cma page")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

At least I assume this was inadvertent...  If there's a better fix,
please run with it as I'm out of the office the 1st half of next
week and would like to see this fixed ASAP.  Thanks!

 include/linux/mm.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bc8f326be0ce..781fae17177d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1600,7 +1600,7 @@ static inline bool is_pinnable_page(struct page *page)
 	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
 		return false;
 #endif
-	return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page)));
+	return !is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page));
 }
 #else
 static inline bool is_pinnable_page(struct page *page)



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-10 22:35 [PATCH] mm: Re-allow pinning of zero pfns Alex Williamson
@ 2022-06-11  0:21 ` Minchan Kim
  2022-06-11 18:29 ` David Hildenbrand
  1 sibling, 0 replies; 13+ messages in thread
From: Minchan Kim @ 2022-06-11  0:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: akpm, linux-mm, linux-kernel, paulmck, jhubbard, joaodias, jgg, david

On Fri, Jun 10, 2022 at 04:35:13PM -0600, Alex Williamson wrote:
> The commit referenced below subtly and inadvertently changed the logic
> to disallow pinning of zero pfns.  This breaks device assignment with
> vfio and potentially various other users of gup.  Exclude the zero page
> test from the negation.
> 
> Fixes: 1c563432588d ("mm: fix is_pinnable_page against a cma page")
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> At least I assume this was inadvertent...  If there's a better fix,
> please run with it as I'm out of the office the 1st half of next
> week and would like to see this fixed ASAP.  Thanks!
> 
>  include/linux/mm.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bc8f326be0ce..781fae17177d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1600,7 +1600,7 @@ static inline bool is_pinnable_page(struct page *page)
>  	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
>  		return false;
>  #endif
> -	return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page)));
> +	return !is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page));

Thanks for catching!

I don't think zero pfn could stay in the movable zone or CMA area.

Acked-by: Minchan Kim <minchan@kernel.org>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-10 22:35 [PATCH] mm: Re-allow pinning of zero pfns Alex Williamson
  2022-06-11  0:21 ` Minchan Kim
@ 2022-06-11 18:29 ` David Hildenbrand
  2022-06-15 15:56   ` Jason Gunthorpe
  1 sibling, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2022-06-11 18:29 UTC (permalink / raw)
  To: Alex Williamson, akpm, minchan
  Cc: linux-mm, linux-kernel, paulmck, jhubbard, joaodias, jgg

On 11.06.22 00:35, Alex Williamson wrote:
> The commit referenced below subtly and inadvertently changed the logic
> to disallow pinning of zero pfns.  This breaks device assignment with
> vfio and potentially various other users of gup.  Exclude the zero page
> test from the negation.

I wonder which setups can reliably work with a long-term pin on a shared
zeropage. In a MAP_PRIVATE mapping, any write access via the page tables
will end up replacing the shared zeropage with an anonymous page.
Something similar should apply in MAP_SHARED mappings, when lazily
allocating disk blocks.

In the future, we might trigger unsharing when taking a R/O pin for the
shared zeropage, just like we do as of now upstream for shared anonymous
pages (!PageAnonExclusive). And something similar could then be done
when finding a !anon page in a MAP_SHARED mapping.

> 
> Fixes: 1c563432588d ("mm: fix is_pinnable_page against a cma page")

Having that said, it indeed looks like that was an unintended change.

Acked-by: David Hildenbrand <david@redhat.com>

> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> At least I assume this was inadvertent...  If there's a better fix,
> please run with it as I'm out of the office the 1st half of next
> week and would like to see this fixed ASAP.  Thanks!
> 
>  include/linux/mm.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index bc8f326be0ce..781fae17177d 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1600,7 +1600,7 @@ static inline bool is_pinnable_page(struct page *page)
>  	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
>  		return false;
>  #endif
> -	return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page)));
> +	return !is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page));
>  }
>  #else
>  static inline bool is_pinnable_page(struct page *page)
> 
> 


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-11 18:29 ` David Hildenbrand
@ 2022-06-15 15:56   ` Jason Gunthorpe
  2022-06-23 18:07     ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2022-06-15 15:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Alex Williamson, akpm, minchan, linux-mm, linux-kernel, paulmck,
	jhubbard, joaodias

On Sat, Jun 11, 2022 at 08:29:47PM +0200, David Hildenbrand wrote:
> On 11.06.22 00:35, Alex Williamson wrote:
> > The commit referenced below subtly and inadvertently changed the logic
> > to disallow pinning of zero pfns.  This breaks device assignment with
> > vfio and potentially various other users of gup.  Exclude the zero page
> > test from the negation.
> 
> I wonder which setups can reliably work with a long-term pin on a shared
> zeropage. In a MAP_PRIVATE mapping, any write access via the page tables
> will end up replacing the shared zeropage with an anonymous page.
> Something similar should apply in MAP_SHARED mappings, when lazily
> allocating disk blocks.
> 
> In the future, we might trigger unsharing when taking a R/O pin for the
> shared zeropage, just like we do as of now upstream for shared anonymous
> pages (!PageAnonExclusive). And something similar could then be done
> when finding a !anon page in a MAP_SHARED mapping.

I'm also confused how qemu is hitting this and it isn't already a bug?

It is arising because vfio doesn't use FOLL_FORCE|FOLL_WRITE to move
away the zero page in most cases.

And why does Yishai say it causes an infinite loop in the kernel?

Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-15 15:56   ` Jason Gunthorpe
@ 2022-06-23 18:07     ` David Hildenbrand
  2022-06-23 20:21       ` Alex Williamson
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2022-06-23 18:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, akpm, minchan, linux-mm, linux-kernel, paulmck,
	jhubbard, joaodias

On 15.06.22 17:56, Jason Gunthorpe wrote:
> On Sat, Jun 11, 2022 at 08:29:47PM +0200, David Hildenbrand wrote:
>> On 11.06.22 00:35, Alex Williamson wrote:
>>> The commit referenced below subtly and inadvertently changed the logic
>>> to disallow pinning of zero pfns.  This breaks device assignment with
>>> vfio and potentially various other users of gup.  Exclude the zero page
>>> test from the negation.
>>
>> I wonder which setups can reliably work with a long-term pin on a shared
>> zeropage. In a MAP_PRIVATE mapping, any write access via the page tables
>> will end up replacing the shared zeropage with an anonymous page.
>> Something similar should apply in MAP_SHARED mappings, when lazily
>> allocating disk blocks.

^ correction, shared zeropage is never user in MAP_SHARED mappings
(fortunally).

>>
>> In the future, we might trigger unsharing when taking a R/O pin for the
>> shared zeropage, just like we do as of now upstream for shared anonymous
>> pages (!PageAnonExclusive). And something similar could then be done
>> when finding a !anon page in a MAP_SHARED mapping.
> 
> I'm also confused how qemu is hitting this and it isn't already a bug?
> 

I assume it's just some random thingy mapped into the guest physical
address space (by the bios? R/O?), that actually never ends up getting
used by a device.

So vfio simply only needs this to keep working ... but weon't actually
ever user that data.

But this is just my best guess after thinking about it.

> It is arising because vfio doesn't use FOLL_FORCE|FOLL_WRITE to move
> away the zero page in most cases.
> 
> And why does Yishai say it causes an infinite loop in the kernel?


Good question. Maybe $something keeps retying if pinning fails, either
in the kernel (which would be bad) or in user space. At least QEMU seems
to just fail if pinning fails, but maybe it's a different user space?


-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-23 18:07     ` David Hildenbrand
@ 2022-06-23 20:21       ` Alex Williamson
  2022-06-23 20:47         ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Alex Williamson @ 2022-06-23 20:21 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Jason Gunthorpe, akpm, minchan, linux-mm, linux-kernel, paulmck,
	jhubbard, joaodias

On Thu, 23 Jun 2022 20:07:14 +0200
David Hildenbrand <david@redhat.com> wrote:

> On 15.06.22 17:56, Jason Gunthorpe wrote:
> > On Sat, Jun 11, 2022 at 08:29:47PM +0200, David Hildenbrand wrote:  
> >> On 11.06.22 00:35, Alex Williamson wrote:  
> >>> The commit referenced below subtly and inadvertently changed the logic
> >>> to disallow pinning of zero pfns.  This breaks device assignment with
> >>> vfio and potentially various other users of gup.  Exclude the zero page
> >>> test from the negation.  
> >>
> >> I wonder which setups can reliably work with a long-term pin on a shared
> >> zeropage. In a MAP_PRIVATE mapping, any write access via the page tables
> >> will end up replacing the shared zeropage with an anonymous page.
> >> Something similar should apply in MAP_SHARED mappings, when lazily
> >> allocating disk blocks.  
> 
> ^ correction, shared zeropage is never user in MAP_SHARED mappings
> (fortunally).
> 
> >>
> >> In the future, we might trigger unsharing when taking a R/O pin for the
> >> shared zeropage, just like we do as of now upstream for shared anonymous
> >> pages (!PageAnonExclusive). And something similar could then be done
> >> when finding a !anon page in a MAP_SHARED mapping.  
> > 
> > I'm also confused how qemu is hitting this and it isn't already a bug?
> >   
> 
> I assume it's just some random thingy mapped into the guest physical
> address space (by the bios? R/O?), that actually never ends up getting
> used by a device.
> 
> So vfio simply only needs this to keep working ... but weon't actually
> ever user that data.
> 
> But this is just my best guess after thinking about it.

Good guess.

> > It is arising because vfio doesn't use FOLL_FORCE|FOLL_WRITE to move
> > away the zero page in most cases.
> > 
> > And why does Yishai say it causes an infinite loop in the kernel?  
> 
> 
> Good question. Maybe $something keeps retying if pinning fails, either
> in the kernel (which would be bad) or in user space. At least QEMU seems
> to just fail if pinning fails, but maybe it's a different user space?

The loop is in __gup_longterm_locked():

        do {
                rc = __get_user_pages_locked(mm, start, nr_pages, pages, vmas,
                                             NULL, gup_flags);
                if (rc <= 0)
                        break;
                rc = check_and_migrate_movable_pages(rc, pages, gup_flags);
        } while (!rc);

It appears we're pinning a 32 page (128K) range,
__get_user_pages_locked() returns 32, but
check_and_migrate_movable_pages() perpetually returns zero.  I believe
this is because folio_is_pinnable() previously returned true, and now
returns false.  Therefore we drop down to fail at folio_isolate_lru(),
incrementing isolation_error_count.  From there we do nothing more than
unpin the pages, return zero, and hope for better luck next time, which
obviously doesn't happen.

If I generate an errno here, QEMU reports failing on the pc.rom memory
region at 0xc0000.  Thanks,

Alex


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-23 20:21       ` Alex Williamson
@ 2022-06-23 20:47         ` Jason Gunthorpe
  2022-06-24  0:11           ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2022-06-23 20:47 UTC (permalink / raw)
  To: Alex Williamson, Alistair Popple
  Cc: David Hildenbrand, akpm, minchan, linux-mm, linux-kernel,
	paulmck, jhubbard, joaodias

On Thu, Jun 23, 2022 at 02:21:39PM -0600, Alex Williamson wrote:

> check_and_migrate_movable_pages() perpetually returns zero.  I believe
> this is because folio_is_pinnable() previously returned true, and now
> returns false.

Indeed, it is a bug that check_and_migrate_movable_pages() returns
0 when it didn't do anything. It should return an error code.

Hum.. Alistair, maybe you should look at this as well, I'm struggling
alot to understand how it is safe to drop the reference on the page
but hold a pointer to it on the movable_page_list - sure it was
isolated - but why does that mean it won't be concurrently unmapped
and freed?

Anyhow, it looks like the problem is the tortured logic in this
function, what do you think about this:

diff --git a/mm/gup.c b/mm/gup.c
index 5512644076246d..2ffcb3f4ff4a7b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1879,10 +1879,15 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 					    unsigned int gup_flags)
 {
 	unsigned long isolation_error_count = 0, i;
+	struct migration_target_control mtc = {
+		.nid = NUMA_NO_NODE,
+		.gfp_mask = GFP_USER | __GFP_NOWARN,
+	};
 	struct folio *prev_folio = NULL;
 	LIST_HEAD(movable_page_list);
 	bool drain_allow = true;
-	int ret = 0;
+	int not_migrated;
+	int ret;
 
 	for (i = 0; i < nr_pages; i++) {
 		struct folio *folio = page_folio(pages[i]);
@@ -1919,16 +1924,13 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 				    folio_nr_pages(folio));
 	}
 
-	if (!list_empty(&movable_page_list) || isolation_error_count)
-		goto unpin_pages;
-
 	/*
 	 * If list is empty, and no isolation errors, means that all pages are
-	 * in the correct zone.
+	 * in the correct zone, nothing to do.
 	 */
-	return nr_pages;
+	if (list_empty(&movable_page_list) && !isolation_error_count)
+		return nr_pages;
 
-unpin_pages:
 	if (gup_flags & FOLL_PIN) {
 		unpin_user_pages(pages, nr_pages);
 	} else {
@@ -1936,20 +1938,22 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
 			put_page(pages[i]);
 	}
 
-	if (!list_empty(&movable_page_list)) {
-		struct migration_target_control mtc = {
-			.nid = NUMA_NO_NODE,
-			.gfp_mask = GFP_USER | __GFP_NOWARN,
-		};
+	if (isolation_error_count) {
+		ret = -EINVAL;
+		goto err_putback;
+	}
 
-		ret = migrate_pages(&movable_page_list, alloc_migration_target,
-				    NULL, (unsigned long)&mtc, MIGRATE_SYNC,
-				    MR_LONGTERM_PIN, NULL);
-		if (ret > 0) /* number of pages not migrated */
-			ret = -ENOMEM;
+	not_migrated = migrate_pages(&movable_page_list, alloc_migration_target,
+				     NULL, (unsigned long)&mtc, MIGRATE_SYNC,
+				     MR_LONGTERM_PIN, NULL);
+	if (not_migrated > 0) {
+		ret = -ENOMEM;
+		goto err_putback;
 	}
+	return 0;
 
-	if (ret && !list_empty(&movable_page_list))
+err_putback:
+	if (!list_empty(&movable_page_list))
 		putback_movable_pages(&movable_page_list);
 	return ret;
 }

> If I generate an errno here, QEMU reports failing on the pc.rom memory
> region at 0xc0000.  Thanks,

Ah, a ROM region that is all zero'd makes some sense why it has gone
unnoticed as a bug.

Jason

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-23 20:47         ` Jason Gunthorpe
@ 2022-06-24  0:11           ` Alistair Popple
  2022-06-24  1:34             ` Jason Gunthorpe
  0 siblings, 1 reply; 13+ messages in thread
From: Alistair Popple @ 2022-06-24  0:11 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, David Hildenbrand, akpm, minchan, linux-mm,
	linux-kernel, paulmck, jhubbard, joaodias


Jason Gunthorpe <jgg@nvidia.com> writes:

> On Thu, Jun 23, 2022 at 02:21:39PM -0600, Alex Williamson wrote:
>
>> check_and_migrate_movable_pages() perpetually returns zero.  I believe
>> this is because folio_is_pinnable() previously returned true, and now
>> returns false.
>
> Indeed, it is a bug that check_and_migrate_movable_pages() returns
> 0 when it didn't do anything. It should return an error code.
>
> Hum.. Alistair, maybe you should look at this as well, I'm struggling
> alot to understand how it is safe to drop the reference on the page
> but hold a pointer to it on the movable_page_list - sure it was
> isolated - but why does that mean it won't be concurrently unmapped
> and freed?

folio_isolate_lru() takes a reference on the page so you're safe from it
being freed. If it gets unmapped it will be freed when the matching
putback_movable_pages() is called.

> Anyhow, it looks like the problem is the tortured logic in this
> function, what do you think about this:

At a glance it seems reasonable, although I fear it might conflict with
my changes for device coherent migration. Agree the whole
check_and_migrate_movable_pages() logic is pretty tortured though, and I
don't think I'm making it better so would be happy to try cleaning it up
futher once the device coherent changes are in.

> diff --git a/mm/gup.c b/mm/gup.c
> index 5512644076246d..2ffcb3f4ff4a7b 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1879,10 +1879,15 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  					    unsigned int gup_flags)
>  {
>  	unsigned long isolation_error_count = 0, i;
> +	struct migration_target_control mtc = {
> +		.nid = NUMA_NO_NODE,
> +		.gfp_mask = GFP_USER | __GFP_NOWARN,
> +	};
>  	struct folio *prev_folio = NULL;
>  	LIST_HEAD(movable_page_list);
>  	bool drain_allow = true;
> -	int ret = 0;
> +	int not_migrated;
> +	int ret;
>
>  	for (i = 0; i < nr_pages; i++) {
>  		struct folio *folio = page_folio(pages[i]);
> @@ -1919,16 +1924,13 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  				    folio_nr_pages(folio));
>  	}
>
> -	if (!list_empty(&movable_page_list) || isolation_error_count)
> -		goto unpin_pages;
> -
>  	/*
>  	 * If list is empty, and no isolation errors, means that all pages are
> -	 * in the correct zone.
> +	 * in the correct zone, nothing to do.
>  	 */
> -	return nr_pages;
> +	if (list_empty(&movable_page_list) && !isolation_error_count)
> +		return nr_pages;
>
> -unpin_pages:
>  	if (gup_flags & FOLL_PIN) {
>  		unpin_user_pages(pages, nr_pages);
>  	} else {
> @@ -1936,20 +1938,22 @@ static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  			put_page(pages[i]);
>  	}
>
> -	if (!list_empty(&movable_page_list)) {
> -		struct migration_target_control mtc = {
> -			.nid = NUMA_NO_NODE,
> -			.gfp_mask = GFP_USER | __GFP_NOWARN,
> -		};
> +	if (isolation_error_count) {
> +		ret = -EINVAL;
> +		goto err_putback;
> +	}
>
> -		ret = migrate_pages(&movable_page_list, alloc_migration_target,
> -				    NULL, (unsigned long)&mtc, MIGRATE_SYNC,
> -				    MR_LONGTERM_PIN, NULL);
> -		if (ret > 0) /* number of pages not migrated */
> -			ret = -ENOMEM;
> +	not_migrated = migrate_pages(&movable_page_list, alloc_migration_target,
> +				     NULL, (unsigned long)&mtc, MIGRATE_SYNC,
> +				     MR_LONGTERM_PIN, NULL);
> +	if (not_migrated > 0) {
> +		ret = -ENOMEM;
> +		goto err_putback;
>  	}
> +	return 0;
>
> -	if (ret && !list_empty(&movable_page_list))
> +err_putback:
> +	if (!list_empty(&movable_page_list))
>  		putback_movable_pages(&movable_page_list);
>  	return ret;
>  }
>
>> If I generate an errno here, QEMU reports failing on the pc.rom memory
>> region at 0xc0000.  Thanks,
>
> Ah, a ROM region that is all zero'd makes some sense why it has gone
> unnoticed as a bug.
>
> Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-24  0:11           ` Alistair Popple
@ 2022-06-24  1:34             ` Jason Gunthorpe
  2022-06-24  1:55               ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Jason Gunthorpe @ 2022-06-24  1:34 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Alex Williamson, David Hildenbrand, akpm, minchan, linux-mm,
	linux-kernel, paulmck, jhubbard, joaodias

On Fri, Jun 24, 2022 at 10:11:01AM +1000, Alistair Popple wrote:

> > Hum.. Alistair, maybe you should look at this as well, I'm struggling
> > alot to understand how it is safe to drop the reference on the page
> > but hold a pointer to it on the movable_page_list - sure it was
> > isolated - but why does that mean it won't be concurrently unmapped
> > and freed?
> 
> folio_isolate_lru() takes a reference on the page so you're safe from it
> being freed. If it gets unmapped it will be freed when the matching
> putback_movable_pages() is called.

Hm, I guess I didn't dig deep enough into that call chain..

> > Anyhow, it looks like the problem is the tortured logic in this
> > function, what do you think about this:
> 
> At a glance it seems reasonable, although I fear it might conflict with
> my changes for device coherent migration. Agree the whole
> check_and_migrate_movable_pages() logic is pretty tortured though, and I
> don't think I'm making it better so would be happy to try cleaning it up
> futher once the device coherent changes are in.

OK, can I leave this patch with you then? I have no way to test it..

Thanks,
Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-24  1:34             ` Jason Gunthorpe
@ 2022-06-24  1:55               ` Alistair Popple
  2022-07-28  8:45                 ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: Alistair Popple @ 2022-06-24  1:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, David Hildenbrand, akpm, minchan, linux-mm,
	linux-kernel, paulmck, jhubbard, joaodias


Jason Gunthorpe <jgg@nvidia.com> writes:

> On Fri, Jun 24, 2022 at 10:11:01AM +1000, Alistair Popple wrote:
>
>> > Hum.. Alistair, maybe you should look at this as well, I'm struggling
>> > alot to understand how it is safe to drop the reference on the page
>> > but hold a pointer to it on the movable_page_list - sure it was
>> > isolated - but why does that mean it won't be concurrently unmapped
>> > and freed?
>>
>> folio_isolate_lru() takes a reference on the page so you're safe from it
>> being freed. If it gets unmapped it will be freed when the matching
>> putback_movable_pages() is called.
>
> Hm, I guess I didn't dig deep enough into that call chain..
>
>> > Anyhow, it looks like the problem is the tortured logic in this
>> > function, what do you think about this:
>>
>> At a glance it seems reasonable, although I fear it might conflict with
>> my changes for device coherent migration. Agree the whole
>> check_and_migrate_movable_pages() logic is pretty tortured though, and I
>> don't think I'm making it better so would be happy to try cleaning it up
>> futher once the device coherent changes are in.
>
> OK, can I leave this patch with you then? I have no way to test it..

Yep, no worries.

> Thanks,
> Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-06-24  1:55               ` Alistair Popple
@ 2022-07-28  8:45                 ` Alistair Popple
  2022-07-28  9:23                   ` David Hildenbrand
  0 siblings, 1 reply; 13+ messages in thread
From: Alistair Popple @ 2022-07-28  8:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Alex Williamson, David Hildenbrand, akpm, minchan, linux-mm,
	linux-kernel, paulmck, jhubbard, joaodias


Looks like the original patch might need rebasing. I am about to post a
clean-up for the tortured logic in check_and_migrate_movable_pages() so
can incorporate it there, but I'm wondering what the consensus was for
pinning of zero pfn?

Currently my clean-up will result in PUP returning an error for the zero
pfn rather than looping indefinitely in the kernel. However it wasn't
clear from this thread if returning an error is ok, or if R/O pinning
of the zero pfn should succeed?

 - Alistair

Alistair Popple <apopple@nvidia.com> writes:

> Jason Gunthorpe <jgg@nvidia.com> writes:
>
>> On Fri, Jun 24, 2022 at 10:11:01AM +1000, Alistair Popple wrote:
>>
>>> > Hum.. Alistair, maybe you should look at this as well, I'm struggling
>>> > alot to understand how it is safe to drop the reference on the page
>>> > but hold a pointer to it on the movable_page_list - sure it was
>>> > isolated - but why does that mean it won't be concurrently unmapped
>>> > and freed?
>>>
>>> folio_isolate_lru() takes a reference on the page so you're safe from it
>>> being freed. If it gets unmapped it will be freed when the matching
>>> putback_movable_pages() is called.
>>
>> Hm, I guess I didn't dig deep enough into that call chain..
>>
>>> > Anyhow, it looks like the problem is the tortured logic in this
>>> > function, what do you think about this:
>>>
>>> At a glance it seems reasonable, although I fear it might conflict with
>>> my changes for device coherent migration. Agree the whole
>>> check_and_migrate_movable_pages() logic is pretty tortured though, and I
>>> don't think I'm making it better so would be happy to try cleaning it up
>>> futher once the device coherent changes are in.
>>
>> OK, can I leave this patch with you then? I have no way to test it..
>
> Yep, no worries.
>
>> Thanks,
>> Jason

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-07-28  8:45                 ` Alistair Popple
@ 2022-07-28  9:23                   ` David Hildenbrand
  2022-07-29  2:49                     ` Alistair Popple
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand @ 2022-07-28  9:23 UTC (permalink / raw)
  To: Alistair Popple, Jason Gunthorpe
  Cc: Alex Williamson, akpm, minchan, linux-mm, linux-kernel, paulmck,
	jhubbard, joaodias

On 28.07.22 10:45, Alistair Popple wrote:
> 
> Looks like the original patch might need rebasing. I am about to post a
> clean-up for the tortured logic in check_and_migrate_movable_pages() so
> can incorporate it there, but I'm wondering what the consensus was for
> pinning of zero pfn?

We have to keep it working right now, but in most cases (inside
MAP_PRIVATE regions) it's shaky and undesired.

> 
> Currently my clean-up will result in PUP returning an error for the zero
> pfn rather than looping indefinitely in the kernel. However it wasn't
> clear from this thread if returning an error is ok, or if R/O pinning
> of the zero pfn should succeed?

I'm working on proper COW breaking in MAP_PRIVATE mappings, which will,
for example, unshare the shared zeropage and properly replace it by
exclusive anon pages first in the FOLL_LONGTERM case.

-- 
Thanks,

David / dhildenb


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] mm: Re-allow pinning of zero pfns
  2022-07-28  9:23                   ` David Hildenbrand
@ 2022-07-29  2:49                     ` Alistair Popple
  0 siblings, 0 replies; 13+ messages in thread
From: Alistair Popple @ 2022-07-29  2:49 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Jason Gunthorpe, Alex Williamson, akpm, minchan, linux-mm,
	linux-kernel, paulmck, jhubbard, joaodias


David Hildenbrand <david@redhat.com> writes:

> On 28.07.22 10:45, Alistair Popple wrote:
>>
>> Looks like the original patch might need rebasing. I am about to post a
>> clean-up for the tortured logic in check_and_migrate_movable_pages() so
>> can incorporate it there, but I'm wondering what the consensus was for
>> pinning of zero pfn?
>
> We have to keep it working right now, but in most cases (inside
> MAP_PRIVATE regions) it's shaky and undesired.

Ok. Well I've looked at this now so may as well stick

Reviewed-by: Alistair Popple <apopple@nvidia.com>

on it. However I think it needs rebasing, should I send an updated
version?

>>
>> Currently my clean-up will result in PUP returning an error for the zero
>> pfn rather than looping indefinitely in the kernel. However it wasn't
>> clear from this thread if returning an error is ok, or if R/O pinning
>> of the zero pfn should succeed?
>
> I'm working on proper COW breaking in MAP_PRIVATE mappings, which will,
> for example, unshare the shared zeropage and properly replace it by
> exclusive anon pages first in the FOLL_LONGTERM case.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-07-29  2:52 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-10 22:35 [PATCH] mm: Re-allow pinning of zero pfns Alex Williamson
2022-06-11  0:21 ` Minchan Kim
2022-06-11 18:29 ` David Hildenbrand
2022-06-15 15:56   ` Jason Gunthorpe
2022-06-23 18:07     ` David Hildenbrand
2022-06-23 20:21       ` Alex Williamson
2022-06-23 20:47         ` Jason Gunthorpe
2022-06-24  0:11           ` Alistair Popple
2022-06-24  1:34             ` Jason Gunthorpe
2022-06-24  1:55               ` Alistair Popple
2022-07-28  8:45                 ` Alistair Popple
2022-07-28  9:23                   ` David Hildenbrand
2022-07-29  2:49                     ` Alistair Popple

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).