* [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
@ 2020-03-22 13:57 Pingfan Liu
  2020-03-23  7:34 ` Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-03-22 13:57 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Jérôme Glisse, Michal Hocko, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

For zone_device, migration can only happen on is_device_private_page(page).
Correct the logic in try_to_unmap_one().

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
To: linux-mm@kvack.org
---
 mm/rmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index b838647..ffadf3e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
 	    is_zone_device_page(page) && !is_device_private_page(page))
-		return true;
+		return false;

 	if (flags & TTU_SPLIT_HUGE_PMD) {
 		split_huge_pmd_address(vma, address,
@@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 		if (IS_ENABLED(CONFIG_MIGRATION) &&
 		    (flags & TTU_MIGRATION) &&
-		    is_zone_device_page(page)) {
+		    is_device_private_page(page)) {
 			swp_entry_t entry;
 			pte_t swp_pte;

--
2.7.5



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-22 13:57 [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one() Pingfan Liu
@ 2020-03-23  7:34 ` Michal Hocko
  2020-03-23 23:32   ` John Hubbard
                     ` (2 more replies)
  2020-03-24  0:04 ` Balbir Singh
  2020-04-01 14:17 ` [PATCH] mm/rmap: fix the handling of !private device " Pingfan Liu
  2 siblings, 3 replies; 15+ messages in thread
From: Michal Hocko @ 2020-03-23  7:34 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Jérôme Glisse, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> For zone_device, migration can only happen on is_device_private_page(page).
> Correct the logic in try_to_unmap_one().

Maybe it is just me lacking knowledge in the zone_device ZOO. But
this really deserves a much more detailed explanation IMHO. It seems
a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
migration") deliberately made the decision to allow unmapping these
pages? Is the check just wrong, incomplete? Why?

What is the real user visible problem here?
 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> To: linux-mm@kvack.org
> ---
>  mm/rmap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b838647..ffadf3e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> 
>  	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>  	    is_zone_device_page(page) && !is_device_private_page(page))
> -		return true;
> +		return false;
> 
>  	if (flags & TTU_SPLIT_HUGE_PMD) {
>  		split_huge_pmd_address(vma, address,
> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> 
>  		if (IS_ENABLED(CONFIG_MIGRATION) &&
>  		    (flags & TTU_MIGRATION) &&
> -		    is_zone_device_page(page)) {
> +		    is_device_private_page(page)) {
>  			swp_entry_t entry;
>  			pte_t swp_pte;
> 
> --
> 2.7.5

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-23  7:34 ` Michal Hocko
@ 2020-03-23 23:32   ` John Hubbard
  2020-03-24  3:50     ` Pingfan Liu
  2020-03-24  0:20   ` Ralph Campbell
  2020-03-24  3:47   ` Pingfan Liu
  2 siblings, 1 reply; 15+ messages in thread
From: John Hubbard @ 2020-03-23 23:32 UTC (permalink / raw)
  To: Michal Hocko, Pingfan Liu
  Cc: linux-mm, Jérôme Glisse, Kirill A . Shutemov,
	Aneesh Kumar, Andrew Morton, Dan Williams, Ralph Campbell

On 3/23/20 12:34 AM, Michal Hocko wrote:
> On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
>> For zone_device, migration can only happen on is_device_private_page(page).
>> Correct the logic in try_to_unmap_one().
> 
> Maybe it is just me lacking knowledge in the zone_device ZOO. But
> this really deserves a much more detailed explanation IMHO. It seems
> a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> migration") deliberately made the decision to allow unmapping these
> pages? Is the check just wrong, incomplete? Why?
> 
> What is the real user visible problem here?
>   

I was going to guess that someone was having trouble with the behavior on
non-device-private ZONE_DEVICE pages, but...I am also at a loss as to
what triggered this patch. So, I have the exact same questions as Michal,
plus really wondering what tests you are running, and on what hardware config.

It's hard to get the right CC's, but probably Dan Williams and Ralph Campbell
should also be added, if you are sure you need this. (I'm adding them now.)

thanks,
-- 
John Hubbard
NVIDIA

>> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
>> Cc: Jérôme Glisse <jglisse@redhat.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> To: linux-mm@kvack.org
>> ---
>>   mm/rmap.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index b838647..ffadf3e 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>
>>   	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>>   	    is_zone_device_page(page) && !is_device_private_page(page))
>> -		return true;
>> +		return false;
>>
>>   	if (flags & TTU_SPLIT_HUGE_PMD) {
>>   		split_huge_pmd_address(vma, address,
>> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>
>>   		if (IS_ENABLED(CONFIG_MIGRATION) &&
>>   		    (flags & TTU_MIGRATION) &&
>> -		    is_zone_device_page(page)) {
>> +		    is_device_private_page(page)) {
>>   			swp_entry_t entry;
>>   			pte_t swp_pte;
>>
>> --
>> 2.7.5
> 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-22 13:57 [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one() Pingfan Liu
  2020-03-23  7:34 ` Michal Hocko
@ 2020-03-24  0:04 ` Balbir Singh
  2020-03-24  3:55   ` Pingfan Liu
  2020-04-01 14:17 ` [PATCH] mm/rmap: fix the handling of !private device " Pingfan Liu
  2 siblings, 1 reply; 15+ messages in thread
From: Balbir Singh @ 2020-03-24  0:04 UTC (permalink / raw)
  To: Pingfan Liu, linux-mm
  Cc: Jérôme Glisse, Michal Hocko, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

On 23/3/20 12:57 am, Pingfan Liu wrote:
> For zone_device, migration can only happen on is_device_private_page(page).
> Correct the logic in try_to_unmap_one().

!DEVICE_PRIVATE implies it's a cache coherent page and we shouldn't bail
out. Could you clarify what issue you're fixing?
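
For reference, the only thing that distinguishes a device private page from
the other ZONE_DEVICE flavours (fs-dax, devdax, p2pdma) is the pgmap type.
Quoting roughly from include/linux/mm.h of this era (from memory, so the exact
IS_ENABLED guards may differ):

static inline bool is_device_private_page(const struct page *page)
{
        /* Only MEMORY_DEVICE_PRIVATE pages are unaddressable by the CPU. */
        return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) &&
                IS_ENABLED(CONFIG_DEVICE_PRIVATE) &&
                is_zone_device_page(page) &&
                page->pgmap->type == MEMORY_DEVICE_PRIVATE;
}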

Balbir Singh.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-23  7:34 ` Michal Hocko
  2020-03-23 23:32   ` John Hubbard
@ 2020-03-24  0:20   ` Ralph Campbell
  2020-03-24  4:21     ` Pingfan Liu
  2020-03-24  3:47   ` Pingfan Liu
  2 siblings, 1 reply; 15+ messages in thread
From: Ralph Campbell @ 2020-03-24  0:20 UTC (permalink / raw)
  To: Michal Hocko, Pingfan Liu
  Cc: linux-mm, Jérôme Glisse, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton


On 3/23/20 12:34 AM, Michal Hocko wrote:
> On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
>> For zone_device, migration can only happen on is_device_private_page(page).
>> Correct the logic in try_to_unmap_one().
> 
> Maybe it is just me lacking knowledge in the zone_device ZOO. But
> this really deserves a much more detailed explanation IMHO. It seems
> a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> migration") deliberately made the decision to allow unmapping these
> pages? Is the check just wrong, incomplete? Why?
> 
> What is the real user visible problem here?
>   
>> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
>> Cc: Jérôme Glisse <jglisse@redhat.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: John Hubbard <jhubbard@nvidia.com>
>> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> To: linux-mm@kvack.org
>> ---
>>   mm/rmap.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index b838647..ffadf3e 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>
>>   	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>>   	    is_zone_device_page(page) && !is_device_private_page(page))
>> -		return true;
>> +		return false;

Pages can be mapped in multiple vmas. Returning false here will only break out
of the loop and skip any other vmas mapping this page which is a minor optimization
but shouldn't really affect what try_to_unmap_one() does.
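
For reference, the per-VMA loop in rmap_walk_anon()/rmap_walk_file() looks
roughly like this (paraphrased from mm/rmap.c of this era, details elided, so
treat it as approximate):

        anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
                        pgoff_start, pgoff_end) {
                struct vm_area_struct *vma = avc->vma;
                unsigned long address = vma_address(page, vma);

                cond_resched();

                if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
                        continue;

                /* A false return from rmap_one() stops the whole walk... */
                if (!rwc->rmap_one(page, vma, address, rwc->arg))
                        break;
                /* ...while done() asks whether the page is fully unmapped. */
                if (rwc->done && rwc->done(page))
                        break;
        }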

>>   	if (flags & TTU_SPLIT_HUGE_PMD) {
>>   		split_huge_pmd_address(vma, address,
>> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>>
>>   		if (IS_ENABLED(CONFIG_MIGRATION) &&
>>   		    (flags & TTU_MIGRATION) &&
>> -		    is_zone_device_page(page)) {
>> +		    is_device_private_page(page)) {
>>   			swp_entry_t entry;
>>   			pte_t swp_pte;

Since the page was checked for !device private, this is more clear but shouldn't
change anything.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-23  7:34 ` Michal Hocko
  2020-03-23 23:32   ` John Hubbard
  2020-03-24  0:20   ` Ralph Campbell
@ 2020-03-24  3:47   ` Pingfan Liu
  2020-03-24  9:14     ` Michal Hocko
  2 siblings, 1 reply; 15+ messages in thread
From: Pingfan Liu @ 2020-03-24  3:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux-MM, Jérôme Glisse, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> > For zone_device, migration can only happen on is_device_private_page(page).
> > Correct the logic in try_to_unmap_one().
>
> Maybe it is just me lacking knowledge in the zone_device ZOO. But
> this really deserves a much more detailed explanation IMHO. It seems
> a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> migration") deliberately made the decision to allow unmapping these
> pages? Is the check just wrong, incomplete? Why?
I am not quite sure about zone_device, but I will try to explain it below.

But first of all, I think the code conflicts with the logic behind it.
If try_to_unmap_one() succeeds in unmapping a page, then it should kill the
pte and return true. But the original code returns true before reaching
code like "ptep_clear_flush()".

Now, let me try to describe the !device_private zone device case. (Please
pardon and correct me if I make a mistake.)
memmap_init_zone_device() raises an extra _refcount on all zone device
pages. And device-private pages should drop that extra count later,
otherwise they cannot migrate. But I did not find the exact place yet.

While this extra _refcount will block migration, it is not the whole
story if a zone device page is mapped.

If a zone device page is mapped, then I think the original code happens
to work only because it skips the call to page_remove_rmap(), so
try_to_unmap(){ return !page_mapcount(page) ? true : false;} ends up
returning false.
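
For reference, the caller looks roughly like this (paraphrased from mm/rmap.c,
so treat the details as approximate); the final verdict depends only on the
mapcount, not on what rmap_one returned:

bool try_to_unmap(struct page *page, enum ttu_flags flags)
{
        struct rmap_walk_control rwc = {
                .rmap_one = try_to_unmap_one,
                .arg = (void *)flags,
                .done = page_mapcount_is_zero,
                .anon_lock = page_lock_anon_vma_read,
        };

        /* ... invalid_vma setup for the migration case elided ... */

        if (flags & TTU_RMAP_LOCKED)
                rmap_walk_locked(page, &rwc);
        else
                rmap_walk(page, &rwc);

        /* Verdict: only the remaining mapcount matters. */
        return !page_mapcount(page) ? true : false;
}
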
>
> What is the real user visible problem here?
As explained, the original code happens to work, but it conflicts with
the logic.

Thanks,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-23 23:32   ` John Hubbard
@ 2020-03-24  3:50     ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-03-24  3:50 UTC (permalink / raw)
  To: John Hubbard
  Cc: Michal Hocko, Linux-MM, Jérôme Glisse,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton, Dan Williams,
	Ralph Campbell

On Tue, Mar 24, 2020 at 7:32 AM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 3/23/20 12:34 AM, Michal Hocko wrote:
> > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> >> For zone_device, migration can only happen on is_device_private_page(page).
> >> Correct the logic in try_to_unmap_one().
> >
> > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > this really deserves a much more detailed explanation IMHO. It seems
> > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > migration") deliberately made the decision to allow unmapping these
> > pages? Is the check just wrong, incomplete? Why?
> >
> > What is the real user visible problem here?
> >
>
> I was going to guess that someone was having trouble with the behavior on
> non-device-private ZONE_DEVICE pages, but...I am also at a loss as to
> what triggered this patch. So, I have the exact same questions as Michal,
> plus really wondering what tests you are running, and on what hardware config.
:), please see my reply to Michal.
>
> It's hard to get the right CC's, but probably Dan Williams and Ralph Campbell
> should also be added, if you are sure you need this. (I'm adding them now.)
I appreciate you doing that. In fact, I am not sure about some details,
and I hope someone can correct me if I make a mistake.

Thanks,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-24  0:04 ` Balbir Singh
@ 2020-03-24  3:55   ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-03-24  3:55 UTC (permalink / raw)
  To: Balbir Singh
  Cc: Linux-MM, Jérôme Glisse, Michal Hocko, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

On Tue, Mar 24, 2020 at 8:04 AM Balbir Singh <bsingharora@gmail.com> wrote:
>
> On 23/3/20 12:57 am, Pingfan Liu wrote:
> > For zone_device, migration can only happen on is_device_private_page(page).
> > Correct the logic in try_to_unmap_one().
>
> !DEVICE_PRIVATE implies it's a cache coherent page and we shouldn't bail
> out. Could you clarify what issue you're fixing?
As I replied to Michal, I tried to fix the code logic; I did not really hit a bug.

Thanks,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-24  0:20   ` Ralph Campbell
@ 2020-03-24  4:21     ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-03-24  4:21 UTC (permalink / raw)
  To: Ralph Campbell
  Cc: Michal Hocko, Linux-MM, Jérôme Glisse, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

Thanks for your explanation.

On Tue, Mar 24, 2020 at 8:20 AM Ralph Campbell <rcampbell@nvidia.com> wrote:
>
>
> On 3/23/20 12:34 AM, Michal Hocko wrote:
> > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> >> For zone_device, migration can only happen on is_device_private_page(page).
> >> Correct the logic in try_to_unmap_one().
> >
> > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > this really deserves a much more detailed explanation IMHO. It seems
> > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > migration") deliberately made the decision to allow unmapping these
> > pages? Is the check just wrong, incomplete? Why?
> >
> > What is the real user visible problem here?
> >
> >> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> >> Cc: Jérôme Glisse <jglisse@redhat.com>
> >> Cc: Michal Hocko <mhocko@kernel.org>
> >> Cc: John Hubbard <jhubbard@nvidia.com>
> >> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
> >> Cc: Andrew Morton <akpm@linux-foundation.org>
> >> To: linux-mm@kvack.org
> >> ---
> >>   mm/rmap.c | 4 ++--
> >>   1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index b838647..ffadf3e 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -1380,7 +1380,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >>
> >>      if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >>          is_zone_device_page(page) && !is_device_private_page(page))
> >> -            return true;
> >> +            return false;
>
> Pages can be mapped in multiple vmas. Returning false here will only break out
> of the loop and skip any other vmas mapping this page which is a minor optimization
> but shouldn't really affect what try_to_unmap_one() does.
Yes, it returns false to terminate further iteration. And I think an
fs-dax page would not go through this path anyway.

The unmapping of fs-dax pages should go through unmounting the fs and then
arch_remove_memory().
>
> >>      if (flags & TTU_SPLIT_HUGE_PMD) {
> >>              split_huge_pmd_address(vma, address,
> >> @@ -1487,7 +1487,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >>
> >>              if (IS_ENABLED(CONFIG_MIGRATION) &&
> >>                  (flags & TTU_MIGRATION) &&
> >> -                is_zone_device_page(page)) {
> >> +                is_device_private_page(page)) {
> >>                      swp_entry_t entry;
> >>                      pte_t swp_pte;
>
> Since the page was checked for !device private, this is more clear but shouldn't
> change anything.
Yes. It just makes things clearer.

Thanks,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-24  3:47   ` Pingfan Liu
@ 2020-03-24  9:14     ` Michal Hocko
  2020-03-25 10:54       ` Pingfan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2020-03-24  9:14 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: Linux-MM, Jérôme Glisse, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

On Tue 24-03-20 11:47:20, Pingfan Liu wrote:
> On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> > > For zone_device, migration can only happen on is_device_private_page(page).
> > > Correct the logic in try_to_unmap_one().
> >
> > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > this really deserves a much more detailed explanation IMHO. It seems
> > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > migration") deliberately made the decision to allow unmapping these
> > pages? Is the check just wrong, incomplete? Why?
> I am not quite sure about zone_device, but I will try to explain it below.
>
> But first of all, I think the code conflicts with the logic behind it.
> If try_to_unmap_one() succeeds in unmapping a page, then it should kill the
> pte and return true. But the original code returns true before reaching
> code like "ptep_clear_flush()".
>
> Now, let me try to describe the !device_private zone device case. (Please
> pardon and correct me if I make a mistake.)
> memmap_init_zone_device() raises an extra _refcount on all zone device
> pages. And device-private pages should drop that extra count later,
> otherwise they cannot migrate. But I did not find the exact place yet.
>
> While this extra _refcount will block migration, it is not the whole
> story if a zone device page is mapped.
>
> If a zone device page is mapped, then I think the original code happens
> to work only because it skips the call to page_remove_rmap(), so
> try_to_unmap(){ return !page_mapcount(page) ? true : false;} ends up
> returning false.

OK, you made me look more closely.

The lack of documentation, and therefore of the expected semantics, doesn't
really help. So we can only deduce it from the existing code, which is a
recipe for cargo cult programming :/

The only difference between rmap_one returning true and false is that
the VMA walk stops for the latter and the done() callback is not called.
Does rmap_one succeeding mean the mapping for the vma has been torn down?
No. Just look at the munlock case, which shows how vague the semantics
of this callback can be.

I believe the zone device path was just copying it. Is that wrong?
Well, it is less optimal than necessary because the property we are
checking is not VMA specific so all other VMAs (if there are any at all)
will have the same to say.

So the only last remaining point is the done() callback. That one is
documented as a check. There is no note about potential side effects but
the current implementation is really only a check so skipping it doesn't
make any real difference.
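
For the record, this comment block in include/linux/rmap.h is more or less the
whole "documentation" of the contract (quoted from memory, so approximate):

/*
 * rmap_walk_control: To control rmap traversing for specific needs
 *
 * arg: passed to rmap_one() and invalid_vma()
 * rmap_one: executed on each vma where page is mapped
 * done: for checking traversing termination condition
 * anon_lock: for getting anon_lock by optimized way rather than default
 * invalid_vma: for skipping uninterested vma
 */
struct rmap_walk_control {
        void *arg;
        /*
         * Return false if page table scanning in rmap_walk should be stopped.
         * Otherwise, return true.
         */
        bool (*rmap_one)(struct page *page, struct vm_area_struct *vma,
                                        unsigned long addr, void *arg);
        int (*done)(struct page *page);
        struct anon_vma *(*anon_lock)(struct page *page);
        bool (*invalid_vma)(struct vm_area_struct *vma, void *arg);
};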

> > What is the real user visible problem here?
> As explained, the original code happens to work, but it conflicts with
> the logic.

Your changelog should be explicit about this being a pure code
refinement/cleanup without any functional changes.

The rmap walk and callbacks would benefit from proper documentation.
Hint...

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-24  9:14     ` Michal Hocko
@ 2020-03-25 10:54       ` Pingfan Liu
  2020-04-01 14:10         ` Pingfan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Pingfan Liu @ 2020-03-25 10:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux-MM, Jérôme Glisse, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

On Tue, Mar 24, 2020 at 5:14 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Tue 24-03-20 11:47:20, Pingfan Liu wrote:
> > On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> > > > For zone_device, migration can only happen on is_device_private_page(page).
> > > > Correct the logic in try_to_unmap_one().
> > >
> > > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > > this really deserves a much more detailed explanation IMHO. It seems
> > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > > migration") deliberately made the decision to allow unmapping these
> > > pages? Is the check just wrong, incomplete? Why?
> > I am not quite sure about zone_device, but I will try to explain it below.
> >
> > But first of all, I think the code conflicts with the logic behind it.
> > If try_to_unmap_one() succeeds in unmapping a page, then it should kill the
> > pte and return true. But the original code returns true before reaching
> > code like "ptep_clear_flush()".
> >
> > Now, let me try to describe the !device_private zone device case. (Please
> > pardon and correct me if I make a mistake.)
> > memmap_init_zone_device() raises an extra _refcount on all zone device
> > pages. And device-private pages should drop that extra count later,
> > otherwise they cannot migrate. But I did not find the exact place yet.
> >
> > While this extra _refcount will block migration, it is not the whole
> > story if a zone device page is mapped.
> >
> > If a zone device page is mapped, then I think the original code happens
> > to work only because it skips the call to page_remove_rmap(), so
> > try_to_unmap(){ return !page_mapcount(page) ? true : false;} ends up
> > returning false.
>
> OK, you made me look more closely.
>
> The lack of documentation, and therefore of the expected semantics, doesn't
> really help. So we can only deduce it from the existing code, which is a
> recipe for cargo cult programming :/
>
> The only difference between rmap_one returning true and false is that
> the VMA walk stops for the latter and the done() callback is not called.
> Does rmap_one succeeding mean the mapping for the vma has been torn down?
> No. Just look at the munlock case, which shows how vague the semantics
> of this callback can be.
>
> I believe the zone device path was just copying it. Is that wrong?
> Well, it is less optimal than necessary because the property we are
> checking is not VMA specific so all other VMAs (if there are any at all)
> will have the same to say.
>
> So the only last remaining point is the done() callback. That one is
> documented as a check. There is no note about potential side effects but
> the current implementation is really only a check so skipping it doesn't
> make any real difference.
>
> > > What is the real user visible problem here?
> > As explained, the original code happens to work, but it conflicts with
> > the logic.
>
> Your changelog should be explicit about this being a pure code
> refinement/cleanup without any functional changes.
OK, I will do that.
>
> The rmap walk and callbacks would benefit from proper documentation.
> Hint...
I will add some notes around try_to_unmap_one() and update the commit
log. (I hope my understanding of zone device is right, and I will cc more
people for comments.)

Thanks,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one()
  2020-03-25 10:54       ` Pingfan Liu
@ 2020-04-01 14:10         ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-04-01 14:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux-MM, Jérôme Glisse, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

On Wed, Mar 25, 2020 at 6:54 PM Pingfan Liu <kernelfans@gmail.com> wrote:
>
> On Tue, Mar 24, 2020 at 5:14 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Tue 24-03-20 11:47:20, Pingfan Liu wrote:
> > > On Mon, Mar 23, 2020 at 3:34 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > On Sun 22-03-20 21:57:07, Pingfan Liu wrote:
> > > > > For zone_device, migration can only happen on is_device_private_page(page).
> > > > > Correct the logic in try_to_unmap_one().
> > > >
> > > > Maybe it is just me lacking knowledge in the zone_device ZOO. But
> > > > this really deserves a much more detailed explanation IMHO. It seems
> > > > a5430dda8a3a ("mm/migrate: support un-addressable ZONE_DEVICE page in
> > > > migration") deliberately made the decision to allow unmapping these
> > > > pages? Is the check just wrong, incomplete? Why?
> > > I am not quite sure about zone_device, but I will try to explain it below.
> > >
> > > But first of all, I think the code conflicts with the logic behind it.
> > > If try_to_unmap_one() succeeds in unmapping a page, then it should kill the
> > > pte and return true. But the original code returns true before reaching
> > > code like "ptep_clear_flush()".
> > >
> > > Now, let me try to describe the !device_private zone device case. (Please
> > > pardon and correct me if I make a mistake.)
> > > memmap_init_zone_device() raises an extra _refcount on all zone device
> > > pages. And device-private pages should drop that extra count later,
> > > otherwise they cannot migrate. But I did not find the exact place yet.
> > >
> > > While this extra _refcount will block migration, it is not the whole
> > > story if a zone device page is mapped.
> > >
> > > If a zone device page is mapped, then I think the original code happens
> > > to work only because it skips the call to page_remove_rmap(), so
> > > try_to_unmap(){ return !page_mapcount(page) ? true : false;} ends up
> > > returning false.
> >
> > OK, you made me look more closely.
> >
> > The lack of documentation, and therefore of the expected semantics, doesn't
> > really help. So we can only deduce it from the existing code, which is a
> > recipe for cargo cult programming :/
> >
> > The only difference between rmap_one returning true and false is that
> > the VMA walk stops for the latter and the done() callback is not called.
> > Does rmap_one succeeding mean the mapping for the vma has been torn down?
> > No. Just look at the munlock case, which shows how vague the semantics
> > of this callback can be.
> >
> > I believe the zone device path was just copying it. Is that wrong?
> > Well, it is less optimal than necessary because the property we are
> > checking is not VMA specific so all other VMAs (if there are any at all)
> > will have the same to say.
> >
> > So the only last remaining point is the done() callback. That one is
> > documented as a check. There is no note about potential side effects but
> > the current implementation is really only a check so skipping it doesn't
> > make any real difference.
> >
> > > > What is the real user visible problem here?
> > > As explained, the original code happens to work, but it conflicts with
> > > the logic.
> >
> > Your changelog should be explicit about this being a pure code
> > refinement/cleanup without any functional changes.
> OK, I will do that.
It took me some time to fully understand try_to_munlock(). Now I can
add some notes. I will send out V2 soon.

Thanks,
Pingfan


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH] mm/rmap: fix the handling of !private device page in try_to_unmap_one()
  2020-03-22 13:57 [PATCH] mm/rmap: fix the handling of device private page in try_to_unmap_one() Pingfan Liu
  2020-03-23  7:34 ` Michal Hocko
  2020-03-24  0:04 ` Balbir Singh
@ 2020-04-01 14:17 ` Pingfan Liu
  2020-04-01 15:58   ` Michal Hocko
  2 siblings, 1 reply; 15+ messages in thread
From: Pingfan Liu @ 2020-04-01 14:17 UTC (permalink / raw)
  To: linux-mm
  Cc: Pingfan Liu, Jérôme Glisse, Michal Hocko,
	Ralph Campbell, Balbir Singh, Dan Williams, John Hubbard,
	Kirill A . Shutemov, Aneesh Kumar, Andrew Morton

This patch is a pure code refinement without any functional changes.

try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
unmap, if try_to_unmap_one() returns true, it means the pte has been torn
down and the mapcount decremented. Apparently the current code

        if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
            is_zone_device_page(page) && !is_device_private_page(page))
               return true;

conflicts with this logic.

Furthermore, as for zone_device, migration can only happen on
is_device_private_page(page). For other zone_device pages,
memmap_init_zone_device() raises an extra _refcount on all such pages. This
extra _refcount will block migration. So in try_to_unmap_one(), we can just
return false for other zone device pages.

The reasons why the original code happens to work on !private zone_device pages:
-1. if the page is mapped, then try_to_unmap_one()->page_remove_rmap() is skipped,
and finally try_to_unmap(){ return !page_mapcount(page) ? true : false;} will
return false.
-2. if the page is not mapped, the extra _refcount will prevent the migration.

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
To: linux-mm@kvack.org
---
v1 -> v2: improve commit log and note in code
 mm/rmap.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index b838647..723af4f 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1358,6 +1358,10 @@ void page_remove_rmap(struct page *page, bool compound)

 /*
  * @arg: enum ttu_flags will be passed to this argument
+ *
+ * For munlock, return true if @page is not mlocked by @vma without killing pte
+ * For unmap, return true after tearing down pte.
+ * For both cases, return false if rmap_walk should be stopped.
  */
 static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		     unsigned long address, void *arg)
@@ -1380,7 +1384,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
 	    is_zone_device_page(page) && !is_device_private_page(page))
-		return true;
+		return false;

 	if (flags & TTU_SPLIT_HUGE_PMD) {
 		split_huge_pmd_address(vma, address,
@@ -1487,7 +1491,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,

 		if (IS_ENABLED(CONFIG_MIGRATION) &&
 		    (flags & TTU_MIGRATION) &&
-		    is_zone_device_page(page)) {
+		    is_device_private_page(page)) {
 			swp_entry_t entry;
 			pte_t swp_pte;

--
2.7.5



^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of !private device page in try_to_unmap_one()
  2020-04-01 14:17 ` [PATCH] mm/rmap: fix the handling of !private device " Pingfan Liu
@ 2020-04-01 15:58   ` Michal Hocko
  2020-04-02  7:40     ` Pingfan Liu
  0 siblings, 1 reply; 15+ messages in thread
From: Michal Hocko @ 2020-04-01 15:58 UTC (permalink / raw)
  To: Pingfan Liu
  Cc: linux-mm, Jérôme Glisse, Ralph Campbell, Balbir Singh,
	Dan Williams, John Hubbard, Kirill A . Shutemov, Aneesh Kumar,
	Andrew Morton

On Wed 01-04-20 22:17:58, Pingfan Liu wrote:
> This patch is a pure code refinement without any functional changes.
> 
> try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
> unmap, if try_to_unmap_one() returns true, it means the pte has been torn
> down and the mapcount decremented.

I haven't really checked the full history of the rmap walk but this is
certainly not the currently implemented semantic of this callback.
Returning true only tells the caller that it should continue with other
VMAs which map the given page. It doesn't really mean that the pte has
been torn down. The munlock case is a nice example of how that is used
properly, while the migration path for device pages shows how it is used
incorrectly, because it doesn't make any sense to walk other VMAs when
is_device_private_page is a property of the page, not the VMA. And that
is the only reason to drop that.
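
Just to illustrate that point, an untested fragment (not something I am asking
this patch to do): since the property is per page, the check could in principle
sit at the top of try_to_unmap(), before rmap_walk() is ever called, instead of
being re-evaluated once per VMA inside try_to_unmap_one():

        /*
         * Untested sketch: page-level property, so no amount of VMA
         * walking can change the answer; do not even start the walk.
         */
        if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
            is_zone_device_page(page) && !is_device_private_page(page))
                return false;

        rmap_walk(page, &rwc);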

> Apparently the current code
> 
>         if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>             is_zone_device_page(page) && !is_device_private_page(page))
>                return true;
> 
> conflicts with this logic.
> 
> Furthermore, as for zone_device, migration can only happen on
> is_device_private_page(page). For other zone_device pages,
> memmap_init_zone_device() raises an extra _refcount on all such pages. This
> extra _refcount will block migration. So in try_to_unmap_one(), we can just
> return false for other zone device pages.
> 
> The reasons why the original code happens to work on !private zone_device pages:
> -1. if the page is mapped, then try_to_unmap_one()->page_remove_rmap() is skipped,
> and finally try_to_unmap(){ return !page_mapcount(page) ? true : false;} will
> return false.
> -2. if the page is not mapped, the extra _refcount will prevent the migration.

And this sounds to me like a separate change, worth a patch on its own.

> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> Cc: Jérôme Glisse <jglisse@redhat.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> Cc: Balbir Singh <bsingharora@gmail.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: John Hubbard <jhubbard@nvidia.com>
> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> To: linux-mm@kvack.org
> ---
> v1 -> v2: improve commit log and note in code
>  mm/rmap.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index b838647..723af4f 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1358,6 +1358,10 @@ void page_remove_rmap(struct page *page, bool compound)
> 
>  /*
>   * @arg: enum ttu_flags will be passed to this argument
> + *
> + * For munlock, return true if @page is not mlocked by @vma without killing pte
> + * For unmap, return true after tearing down pte.
> + * For both cases, return false if rmap_walk should be stopped.
>   */
>  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		     unsigned long address, void *arg)
> @@ -1380,7 +1384,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> 
>  	if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
>  	    is_zone_device_page(page) && !is_device_private_page(page))
> -		return true;
> +		return false;
> 
>  	if (flags & TTU_SPLIT_HUGE_PMD) {
>  		split_huge_pmd_address(vma, address,
> @@ -1487,7 +1491,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> 
>  		if (IS_ENABLED(CONFIG_MIGRATION) &&
>  		    (flags & TTU_MIGRATION) &&
> -		    is_zone_device_page(page)) {
> +		    is_device_private_page(page)) {
>  			swp_entry_t entry;
>  			pte_t swp_pte;
> 
> --
> 2.7.5

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH] mm/rmap: fix the handling of !private device page in try_to_unmap_one()
  2020-04-01 15:58   ` Michal Hocko
@ 2020-04-02  7:40     ` Pingfan Liu
  0 siblings, 0 replies; 15+ messages in thread
From: Pingfan Liu @ 2020-04-02  7:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux-MM, Jérôme Glisse, Ralph Campbell, Balbir Singh,
	Dan Williams, John Hubbard, Kirill A . Shutemov, Aneesh Kumar,
	Andrew Morton

On Wed, Apr 1, 2020 at 11:58 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Wed 01-04-20 22:17:58, Pingfan Liu wrote:
> > This patch is a pure code refinement without any functional changes.
> >
> > try_to_unmap_one() is shared by try_to_unmap() and try_to_munlock(). As for
> > unmap, if try_to_unmap_one() returns true, it means the pte has been torn
> > down and the mapcount decremented.
>
> I haven't really checked the full history of the rmap walk but this is
> certainly not the currently implemented semantic of this callback.
> Returning true only tells the caller that it should continue with other
> VMAs which map the given page. It doesn't really mean that the pte has
> been torn down. The munlock case is a nice example of how that is used
I did not paste the whole story into the commit log, but noted it in the code.
For munlock, we only care that the page will be put onto the correct lru.
But as the commit log says ("As for unmap ..."), unmap should tear down the
pte, otherwise the page may still be accessed through an old mapping. I also
omitted an assumption here: for !private device pages, e.g. fs-dax, there is
no need to mlock them.
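
For context, both paths funnel into the same callback; roughly (paraphrased
from mm/rmap.c, details and debug checks elided):

void try_to_munlock(struct page *page)
{
        struct rmap_walk_control rwc = {
                .rmap_one = try_to_unmap_one,   /* shared with try_to_unmap() */
                .arg = (void *)TTU_MUNLOCK,
                .done = page_not_mapped,
                .anon_lock = page_lock_anon_vma_read,
        };

        rmap_walk(page, &rwc);
}
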
> properly, while the migration path for device pages shows how it is used
> incorrectly, because it doesn't make any sense to walk other VMAs when
> is_device_private_page is a property of the page, not the VMA. And that
> is the only reason to drop that.
>
> > Apparently the current code
> >
> >         if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >             is_zone_device_page(page) && !is_device_private_page(page))
> >                return true;
> >
> > conflicts with this logic.
> >
[...]
> >  /*
> >   * @arg: enum ttu_flags will be passed to this argument
> > + *
> > + * For munlock, return true if @page is not mlocked by @vma without killing pte
Here is the note for the munlock case.

Thanks,
Pingfan
> > + * For unmap, return true after tearing down pte.
> > + * For both cases, return false if rmap_walk should be stopped.
> >   */
> >  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >                    unsigned long address, void *arg)
> > @@ -1380,7 +1384,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >
> >       if (IS_ENABLED(CONFIG_MIGRATION) && (flags & TTU_MIGRATION) &&
> >           is_zone_device_page(page) && !is_device_private_page(page))
> > -             return true;
> > +             return false;
> >
> >       if (flags & TTU_SPLIT_HUGE_PMD) {
> >               split_huge_pmd_address(vma, address,
> > @@ -1487,7 +1491,7 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >
> >               if (IS_ENABLED(CONFIG_MIGRATION) &&
> >                   (flags & TTU_MIGRATION) &&
> > -                 is_zone_device_page(page)) {
> > +                 is_device_private_page(page)) {
> >                       swp_entry_t entry;
> >                       pte_t swp_pte;
> >
> > --
> > 2.7.5
>
> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 15+ messages in thread
