linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
@ 2020-07-09  2:48 robbieko
  2020-07-10 15:31 ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: robbieko @ 2020-07-09  2:48 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Robbie Ko

From: Robbie Ko <robbieko@synology.com>

When a migrate page occurs, we first create a migration entry
to replace the original pte, and then go to fallback_migrate_page
to execute a writeout if the migratepage is not supported.

In the writeout, we will clear the dirty bit of the page and use
page_mkclean to clear the dirty bit along with the corresponding pte,
but page_mkclean does not support migration entry.

The page ditry bit is cleared, but the dirty bit of the pte still exists,
so if mmap continues to write, it will result in data loss.

We fix the by first remove the migration entry and then clearing
the dirty bits of the page, which also clears the pte's dirty bits.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 mm/migrate.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index f37729673558..5c407434b9ba 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -875,10 +875,6 @@ static int writeout(struct address_space *mapping, struct page *page)
 		/* No write method for the address space */
 		return -EINVAL;
 
-	if (!clear_page_dirty_for_io(page))
-		/* Someone else already triggered a write */
-		return -EAGAIN;
-
 	/*
 	 * A dirty page may imply that the underlying filesystem has
 	 * the page on some queue. So the page must be clean for
@@ -889,6 +885,10 @@ static int writeout(struct address_space *mapping, struct page *page)
 	 */
 	remove_migration_ptes(page, page, false);
 
+	if (!clear_page_dirty_for_io(page))
+		/* Someone else already triggered a write */
+		return -EAGAIN;
+
 	rc = mapping->a_ops->writepage(page, &wbc);
 
 	if (rc != AOP_WRITEPAGE_ACTIVATE)
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-09  2:48 [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page robbieko
@ 2020-07-10 15:31 ` Vlastimil Babka
  2020-07-13  1:57   ` Robbie Ko
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2020-07-10 15:31 UTC (permalink / raw)
  To: robbieko, linux-mm; +Cc: linux-kernel

On 7/9/20 4:48 AM, robbieko wrote:
> From: Robbie Ko <robbieko@synology.com>
> 
> When a migrate page occurs, we first create a migration entry
> to replace the original pte, and then go to fallback_migrate_page
> to execute a writeout if the migratepage is not supported.
> 
> In the writeout, we will clear the dirty bit of the page and use
> page_mkclean to clear the dirty bit along with the corresponding pte,
> but page_mkclean does not support migration entry.
> 
> The page ditry bit is cleared, but the dirty bit of the pte still exists,
> so if mmap continues to write, it will result in data loss.

Curious, did you observe this data loss? What filesystem? If yes, it seems
serious enough to
CC stable and determine a Fixes: tag?

> We fix the by first remove the migration entry and then clearing
> the dirty bits of the page, which also clears the pte's dirty bits.
> 
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  mm/migrate.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index f37729673558..5c407434b9ba 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -875,10 +875,6 @@ static int writeout(struct address_space *mapping, struct page *page)
>  		/* No write method for the address space */
>  		return -EINVAL;
>  
> -	if (!clear_page_dirty_for_io(page))
> -		/* Someone else already triggered a write */
> -		return -EAGAIN;
> -
>  	/*
>  	 * A dirty page may imply that the underlying filesystem has
>  	 * the page on some queue. So the page must be clean for
> @@ -889,6 +885,10 @@ static int writeout(struct address_space *mapping, struct page *page)
>  	 */
>  	remove_migration_ptes(page, page, false);
>  
> +	if (!clear_page_dirty_for_io(page))
> +		/* Someone else already triggered a write */
> +		return -EAGAIN;
> +
>  	rc = mapping->a_ops->writepage(page, &wbc);
>  
>  	if (rc != AOP_WRITEPAGE_ACTIVATE)
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-10 15:31 ` Vlastimil Babka
@ 2020-07-13  1:57   ` Robbie Ko
  2020-07-14  9:46     ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Robbie Ko @ 2020-07-13  1:57 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm; +Cc: linux-kernel


Vlastimil Babka 於 2020/7/10 下午11:31 寫道:
> On 7/9/20 4:48 AM, robbieko wrote:
>> From: Robbie Ko <robbieko@synology.com>
>>
>> When a migrate page occurs, we first create a migration entry
>> to replace the original pte, and then go to fallback_migrate_page
>> to execute a writeout if the migratepage is not supported.
>>
>> In the writeout, we will clear the dirty bit of the page and use
>> page_mkclean to clear the dirty bit along with the corresponding pte,
>> but page_mkclean does not support migration entry.
>>
>> The page ditry bit is cleared, but the dirty bit of the pte still exists,
>> so if mmap continues to write, it will result in data loss.
> Curious, did you observe this data loss? What filesystem? If yes, it seems
> serious enough to
> CC stable and determine a Fixes: tag?

Yes, there is data loss.
I'm using a btrfs environment, but not the following patch
btrfs: implement migratepage callback for data pages
https://git.kernel.org/pub/scm/linux/kernel 
/git/torvalds/linux.git/commit/?h=v5.8-rc5& 
id=f8e6608180a31cc72a23b74969da428da236dbd1


>> We fix the by first remove the migration entry and then clearing
>> the dirty bits of the page, which also clears the pte's dirty bits.
>>
>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>   mm/migrate.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index f37729673558..5c407434b9ba 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -875,10 +875,6 @@ static int writeout(struct address_space *mapping, struct page *page)
>>   		/* No write method for the address space */
>>   		return -EINVAL;
>>   
>> -	if (!clear_page_dirty_for_io(page))
>> -		/* Someone else already triggered a write */
>> -		return -EAGAIN;
>> -
>>   	/*
>>   	 * A dirty page may imply that the underlying filesystem has
>>   	 * the page on some queue. So the page must be clean for
>> @@ -889,6 +885,10 @@ static int writeout(struct address_space *mapping, struct page *page)
>>   	 */
>>   	remove_migration_ptes(page, page, false);
>>   
>> +	if (!clear_page_dirty_for_io(page))
>> +		/* Someone else already triggered a write */
>> +		return -EAGAIN;
>> +
>>   	rc = mapping->a_ops->writepage(page, &wbc);
>>   
>>   	if (rc != AOP_WRITEPAGE_ACTIVATE)
>>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-13  1:57   ` Robbie Ko
@ 2020-07-14  9:46     ` Vlastimil Babka
  2020-07-14 10:19       ` Kirill A. Shutemov
  2020-07-15  2:05       ` Robbie Ko
  0 siblings, 2 replies; 10+ messages in thread
From: Vlastimil Babka @ 2020-07-14  9:46 UTC (permalink / raw)
  To: Robbie Ko, linux-mm
  Cc: LKML, linux-btrfs, Roman Gushchin, David Sterba, Kirill A. Shutemov

On 7/13/20 3:57 AM, Robbie Ko wrote:
> 
> Vlastimil Babka 於 2020/7/10 下午11:31 寫道:
>> On 7/9/20 4:48 AM, robbieko wrote:
>>> From: Robbie Ko <robbieko@synology.com>
>>>
>>> When a migrate page occurs, we first create a migration entry
>>> to replace the original pte, and then go to fallback_migrate_page
>>> to execute a writeout if the migratepage is not supported.
>>>
>>> In the writeout, we will clear the dirty bit of the page and use
>>> page_mkclean to clear the dirty bit along with the corresponding pte,
>>> but page_mkclean does not support migration entry.
>>>
>>> The page ditry bit is cleared, but the dirty bit of the pte still exists,
>>> so if mmap continues to write, it will result in data loss.
>> Curious, did you observe this data loss? What filesystem? If yes, it seems
>> serious enough to
>> CC stable and determine a Fixes: tag?
> 
> Yes, there is data loss.
> I'm using a btrfs environment, but not the following patch

And the kernel is otherwise upstream? Which version?
Anyway we better let btrfs guys know (+CC) even if the fix is in MM code.

> btrfs: implement migratepage callback for data pages
> https://git.kernel.org/pub/scm/linux/kernel 
> /git/torvalds/linux.git/commit/?h=v5.8-rc5& 
> id=f8e6608180a31cc72a23b74969da428da236dbd1

That's a new commit, so if this is really affecting upstream btrfs pre-5.8 we
should either backport that commit, or your fix (after review).

>>> We fix the by first remove the migration entry and then clearing
>>> the dirty bits of the page, which also clears the pte's dirty bits.
>>>
>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>> ---
>>>   mm/migrate.c | 8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index f37729673558..5c407434b9ba 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -875,10 +875,6 @@ static int writeout(struct address_space *mapping, struct page *page)
>>>   		/* No write method for the address space */
>>>   		return -EINVAL;
>>>   
>>> -	if (!clear_page_dirty_for_io(page))
>>> -		/* Someone else already triggered a write */
>>> -		return -EAGAIN;
>>> -
>>>   	/*
>>>   	 * A dirty page may imply that the underlying filesystem has
>>>   	 * the page on some queue. So the page must be clean for
>>> @@ -889,6 +885,10 @@ static int writeout(struct address_space *mapping, struct page *page)
>>>   	 */
>>>   	remove_migration_ptes(page, page, false);
>>>   
>>> +	if (!clear_page_dirty_for_io(page))
>>> +		/* Someone else already triggered a write */
>>> +		return -EAGAIN;
>>> +
>>>   	rc = mapping->a_ops->writepage(page, &wbc);
>>>   
>>>   	if (rc != AOP_WRITEPAGE_ACTIVATE)
>>>
>>
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-14  9:46     ` Vlastimil Babka
@ 2020-07-14 10:19       ` Kirill A. Shutemov
  2020-07-15  2:45         ` Robbie Ko
  2020-07-15  2:05       ` Robbie Ko
  1 sibling, 1 reply; 10+ messages in thread
From: Kirill A. Shutemov @ 2020-07-14 10:19 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Robbie Ko, linux-mm, LKML, linux-btrfs, Roman Gushchin,
	David Sterba, Kirill A. Shutemov

On Tue, Jul 14, 2020 at 11:46:12AM +0200, Vlastimil Babka wrote:
> On 7/13/20 3:57 AM, Robbie Ko wrote:
> > 
> > Vlastimil Babka 於 2020/7/10 下午11:31 寫道:
> >> On 7/9/20 4:48 AM, robbieko wrote:
> >>> From: Robbie Ko <robbieko@synology.com>
> >>>
> >>> When a migrate page occurs, we first create a migration entry
> >>> to replace the original pte, and then go to fallback_migrate_page
> >>> to execute a writeout if the migratepage is not supported.
> >>>
> >>> In the writeout, we will clear the dirty bit of the page and use
> >>> page_mkclean to clear the dirty bit along with the corresponding pte,
> >>> but page_mkclean does not support migration entry.

I don't follow the scenario.

When we establish migration entries with try_to_unmap(), it transfers
dirty bit from PTE to the page.

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-14  9:46     ` Vlastimil Babka
  2020-07-14 10:19       ` Kirill A. Shutemov
@ 2020-07-15  2:05       ` Robbie Ko
  1 sibling, 0 replies; 10+ messages in thread
From: Robbie Ko @ 2020-07-15  2:05 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm
  Cc: LKML, linux-btrfs, Roman Gushchin, David Sterba, Kirill A. Shutemov


Vlastimil Babka 於 2020/7/14 下午5:46 寫道:
> On 7/13/20 3:57 AM, Robbie Ko wrote:
>> Vlastimil Babka 於 2020/7/10 下午11:31 寫道:
>>> On 7/9/20 4:48 AM, robbieko wrote:
>>>> From: Robbie Ko <robbieko@synology.com>
>>>>
>>>> When a migrate page occurs, we first create a migration entry
>>>> to replace the original pte, and then go to fallback_migrate_page
>>>> to execute a writeout if the migratepage is not supported.
>>>>
>>>> In the writeout, we will clear the dirty bit of the page and use
>>>> page_mkclean to clear the dirty bit along with the corresponding pte,
>>>> but page_mkclean does not support migration entry.
>>>>
>>>> The page ditry bit is cleared, but the dirty bit of the pte still exists,
>>>> so if mmap continues to write, it will result in data loss.
>>> Curious, did you observe this data loss? What filesystem? If yes, it seems
>>> serious enough to
>>> CC stable and determine a Fixes: tag?
>> Yes, there is data loss.
>> I'm using a btrfs environment, but not the following patch
> And the kernel is otherwise upstream? Which version?
> Anyway we better let btrfs guys know (+CC) even if the fix is in MM code.

Kernel verion is 4.4.
I think this is a bug that has been around for a long time.

I think the problem is not limited to btrfs, as long as other fs
have not implemented the migrationpage, they will encounter
the problem. (Eg ecryptfs, fat, nfs...)

>> btrfs: implement migratepage callback for data pages
>> https://git.kernel.org/pub/scm/linux/kernel
>> /git/torvalds/linux.git/commit/?h=v5.8-rc5&
>> id=f8e6608180a31cc72a23b74969da428da236dbd1
> That's a new commit, so if this is really affecting upstream btrfs pre-5.8 we
> should either backport that commit, or your fix (after review).
>
>>>> We fix the by first remove the migration entry and then clearing
>>>> the dirty bits of the page, which also clears the pte's dirty bits.
>>>>
>>>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>>>> ---
>>>>    mm/migrate.c | 8 ++++----
>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index f37729673558..5c407434b9ba 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -875,10 +875,6 @@ static int writeout(struct address_space *mapping, struct page *page)
>>>>    		/* No write method for the address space */
>>>>    		return -EINVAL;
>>>>    
>>>> -	if (!clear_page_dirty_for_io(page))
>>>> -		/* Someone else already triggered a write */
>>>> -		return -EAGAIN;
>>>> -
>>>>    	/*
>>>>    	 * A dirty page may imply that the underlying filesystem has
>>>>    	 * the page on some queue. So the page must be clean for
>>>> @@ -889,6 +885,10 @@ static int writeout(struct address_space *mapping, struct page *page)
>>>>    	 */
>>>>    	remove_migration_ptes(page, page, false);
>>>>    
>>>> +	if (!clear_page_dirty_for_io(page))
>>>> +		/* Someone else already triggered a write */
>>>> +		return -EAGAIN;
>>>> +
>>>>    	rc = mapping->a_ops->writepage(page, &wbc);
>>>>    
>>>>    	if (rc != AOP_WRITEPAGE_ACTIVATE)
>>>>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-14 10:19       ` Kirill A. Shutemov
@ 2020-07-15  2:45         ` Robbie Ko
  2020-07-15  8:11           ` Kirill A. Shutemov
  0 siblings, 1 reply; 10+ messages in thread
From: Robbie Ko @ 2020-07-15  2:45 UTC (permalink / raw)
  To: Kirill A. Shutemov, Vlastimil Babka
  Cc: linux-mm, LKML, linux-btrfs, Roman Gushchin, David Sterba,
	Kirill A. Shutemov


Kirill A. Shutemov 於 2020/7/14 下午6:19 寫道:
> On Tue, Jul 14, 2020 at 11:46:12AM +0200, Vlastimil Babka wrote:
>> On 7/13/20 3:57 AM, Robbie Ko wrote:
>>> Vlastimil Babka 於 2020/7/10 下午11:31 寫道:
>>>> On 7/9/20 4:48 AM, robbieko wrote:
>>>>> From: Robbie Ko <robbieko@synology.com>
>>>>>
>>>>> When a migrate page occurs, we first create a migration entry
>>>>> to replace the original pte, and then go to fallback_migrate_page
>>>>> to execute a writeout if the migratepage is not supported.
>>>>>
>>>>> In the writeout, we will clear the dirty bit of the page and use
>>>>> page_mkclean to clear the dirty bit along with the corresponding pte,
>>>>> but page_mkclean does not support migration entry.
> I don't follow the scenario.
>
> When we establish migration entries with try_to_unmap(), it transfers
> dirty bit from PTE to the page.

Sorry, I mean is _PAGE_RW with pte_write

When we establish migration entries with try_to_unmap(),
we create a migration entry, and if pte_write we set it to SWP_MIGRATION_WRITE,
which will replace the migration entry with the original pte.

When migratepage,  we go to fallback_migrate_page to execute a writeout
if the migratepage is not supported.

In the writeout, we call clear_page_dirty_for_io to  clear the dirty bit of the page
and use page_mkclean to clear pte _PAGE_RW with pte_wrprotect in page_mkclean_one.

However, page_mkclean_one does not support migration entries, so the
migration entry is still SWP_MIGRATION_WRITE.

In writeout, then we call remove_migration_ptes to remove the migration entry,
because it is still SWP_MIGRATION_WRITE so set _PAGE_RW to pte via pte_mkwrite.

Therefore, subsequent mmap wirte will not trigger page_mkwrite to cause data loss.




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-15  2:45         ` Robbie Ko
@ 2020-07-15  8:11           ` Kirill A. Shutemov
  2020-07-16 10:15             ` Robbie Ko
  0 siblings, 1 reply; 10+ messages in thread
From: Kirill A. Shutemov @ 2020-07-15  8:11 UTC (permalink / raw)
  To: Robbie Ko
  Cc: Vlastimil Babka, linux-mm, LKML, linux-btrfs, Roman Gushchin,
	David Sterba, Kirill A. Shutemov

On Wed, Jul 15, 2020 at 10:45:39AM +0800, Robbie Ko wrote:
> 
> Kirill A. Shutemov 於 2020/7/14 下午6:19 寫道:
> > On Tue, Jul 14, 2020 at 11:46:12AM +0200, Vlastimil Babka wrote:
> > > On 7/13/20 3:57 AM, Robbie Ko wrote:
> > > > Vlastimil Babka 於 2020/7/10 下午11:31 寫道:
> > > > > On 7/9/20 4:48 AM, robbieko wrote:
> > > > > > From: Robbie Ko <robbieko@synology.com>
> > > > > > 
> > > > > > When a migrate page occurs, we first create a migration entry
> > > > > > to replace the original pte, and then go to fallback_migrate_page
> > > > > > to execute a writeout if the migratepage is not supported.
> > > > > > 
> > > > > > In the writeout, we will clear the dirty bit of the page and use
> > > > > > page_mkclean to clear the dirty bit along with the corresponding pte,
> > > > > > but page_mkclean does not support migration entry.
> > I don't follow the scenario.
> > 
> > When we establish migration entries with try_to_unmap(), it transfers
> > dirty bit from PTE to the page.
> 
> Sorry, I mean is _PAGE_RW with pte_write
> 
> When we establish migration entries with try_to_unmap(),
> we create a migration entry, and if pte_write we set it to SWP_MIGRATION_WRITE,
> which will replace the migration entry with the original pte.
> 
> When migratepage,  we go to fallback_migrate_page to execute a writeout
> if the migratepage is not supported.
> 
> In the writeout, we call clear_page_dirty_for_io to  clear the dirty bit of the page
> and use page_mkclean to clear pte _PAGE_RW with pte_wrprotect in page_mkclean_one.
> 
> However, page_mkclean_one does not support migration entries, so the
> migration entry is still SWP_MIGRATION_WRITE.
> 
> In writeout, then we call remove_migration_ptes to remove the migration entry,
> because it is still SWP_MIGRATION_WRITE so set _PAGE_RW to pte via pte_mkwrite.
> 
> Therefore, subsequent mmap wirte will not trigger page_mkwrite to cause data loss.

Hm, okay.

Folks, is there any good reason why try_to_unmap(TTU_MIGRATION) should not
clear PTE (make the PTE none) for file page?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-15  8:11           ` Kirill A. Shutemov
@ 2020-07-16 10:15             ` Robbie Ko
  2020-07-17 17:41               ` Chris Mason
  0 siblings, 1 reply; 10+ messages in thread
From: Robbie Ko @ 2020-07-16 10:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Vlastimil Babka, linux-mm, LKML, linux-btrfs, Roman Gushchin,
	David Sterba, Kirill A. Shutemov


Kirill A. Shutemov 於 2020/7/15 下午4:11 寫道:
> On Wed, Jul 15, 2020 at 10:45:39AM +0800, Robbie Ko wrote:
>> Kirill A. Shutemov 於 2020/7/14 下午6:19 寫道:
>>> On Tue, Jul 14, 2020 at 11:46:12AM +0200, Vlastimil Babka wrote:
>>>> On 7/13/20 3:57 AM, Robbie Ko wrote:
>>>>> Vlastimil Babka 於 2020/7/10 下午11:31 寫道:
>>>>>> On 7/9/20 4:48 AM, robbieko wrote:
>>>>>>> From: Robbie Ko <robbieko@synology.com>
>>>>>>>
>>>>>>> When a migrate page occurs, we first create a migration entry
>>>>>>> to replace the original pte, and then go to fallback_migrate_page
>>>>>>> to execute a writeout if the migratepage is not supported.
>>>>>>>
>>>>>>> In the writeout, we will clear the dirty bit of the page and use
>>>>>>> page_mkclean to clear the dirty bit along with the corresponding pte,
>>>>>>> but page_mkclean does not support migration entry.
>>> I don't follow the scenario.
>>>
>>> When we establish migration entries with try_to_unmap(), it transfers
>>> dirty bit from PTE to the page.
>> Sorry, I mean is _PAGE_RW with pte_write
>>
>> When we establish migration entries with try_to_unmap(),
>> we create a migration entry, and if pte_write we set it to SWP_MIGRATION_WRITE,
>> which will replace the migration entry with the original pte.
>>
>> When migratepage,  we go to fallback_migrate_page to execute a writeout
>> if the migratepage is not supported.
>>
>> In the writeout, we call clear_page_dirty_for_io to  clear the dirty bit of the page
>> and use page_mkclean to clear pte _PAGE_RW with pte_wrprotect in page_mkclean_one.
>>
>> However, page_mkclean_one does not support migration entries, so the
>> migration entry is still SWP_MIGRATION_WRITE.
>>
>> In writeout, then we call remove_migration_ptes to remove the migration entry,
>> because it is still SWP_MIGRATION_WRITE so set _PAGE_RW to pte via pte_mkwrite.
>>
>> Therefore, subsequent mmap wirte will not trigger page_mkwrite to cause data loss.
> Hm, okay.
>
> Folks, is there any good reason why try_to_unmap(TTU_MIGRATION) should not
> clear PTE (make the PTE none) for file page?
>
This, I'm not sure.
But I think that for the fs that support migratepage, when migratepage 
is finished,
the page should still be dirty, and the pte should still have _PAGE_RW,
when the next mmap write occurs, we don't need to trigger the 
page_mkwrite again.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page
  2020-07-16 10:15             ` Robbie Ko
@ 2020-07-17 17:41               ` Chris Mason
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Mason @ 2020-07-17 17:41 UTC (permalink / raw)
  To: Robbie Ko
  Cc: Kirill A. Shutemov, Vlastimil Babka, linux-mm, LKML, linux-btrfs,
	Roman Gushchin, David Sterba, Kirill A. Shutemov

On 16 Jul 2020, at 6:15, Robbie Ko wrote:

> Kirill A. Shutemov 於 2020/7/15 下午4:11 寫道:
>> On Wed, Jul 15, 2020 at 10:45:39AM +0800, Robbie Ko wrote:
>>> Kirill A. Shutemov 於 2020/7/14 下午6:19 寫道:
>>>> On Tue, Jul 14, 2020 at 11:46:12AM +0200, Vlastimil Babka wrote:
>>>>> On 7/13/20 3:57 AM, Robbie Ko wrote:
>>>>>> Vlastimil Babka 於 2020/7/10 下午11:31 寫道:
>>>>>>> On 7/9/20 4:48 AM, robbieko wrote:
>>>>>>>> From: Robbie Ko <robbieko@synology.com>
>>>>>>>>
>>>>>>>> When a migrate page occurs, we first create a migration entry
>>>>>>>> to replace the original pte, and then go to 
>>>>>>>> fallback_migrate_page
>>>>>>>> to execute a writeout if the migratepage is not supported.
>>>>>>>>
>>>>>>>> In the writeout, we will clear the dirty bit of the page and 
>>>>>>>> use
>>>>>>>> page_mkclean to clear the dirty bit along with the 
>>>>>>>> corresponding pte,
>>>>>>>> but page_mkclean does not support migration entry.
>>>> I don't follow the scenario.
>>>>
>>>> When we establish migration entries with try_to_unmap(), it 
>>>> transfers
>>>> dirty bit from PTE to the page.
>>> Sorry, I mean is _PAGE_RW with pte_write
>>>
>>> When we establish migration entries with try_to_unmap(),
>>> we create a migration entry, and if pte_write we set it to 
>>> SWP_MIGRATION_WRITE,
>>> which will replace the migration entry with the original pte.
>>>
>>> When migratepage,  we go to fallback_migrate_page to execute a 
>>> writeout
>>> if the migratepage is not supported.
>>>
>>> In the writeout, we call clear_page_dirty_for_io to  clear the dirty 
>>> bit of the page
>>> and use page_mkclean to clear pte _PAGE_RW with pte_wrprotect in 
>>> page_mkclean_one.
>>>
>>> However, page_mkclean_one does not support migration entries, so the
>>> migration entry is still SWP_MIGRATION_WRITE.
>>>
>>> In writeout, then we call remove_migration_ptes to remove the 
>>> migration entry,
>>> because it is still SWP_MIGRATION_WRITE so set _PAGE_RW to pte via 
>>> pte_mkwrite.
>>>
>>> Therefore, subsequent mmap wirte will not trigger page_mkwrite to 
>>> cause data loss.
>> Hm, okay.
>>
>> Folks, is there any good reason why try_to_unmap(TTU_MIGRATION) 
>> should not
>> clear PTE (make the PTE none) for file page?
>>
> This, I'm not sure.
> But I think that for the fs that support migratepage, when migratepage 
> is finished,
> the page should still be dirty, and the pte should still have 
> _PAGE_RW,
> when the next mmap write occurs, we don't need to trigger the 
> page_mkwrite again.

I don’t know the page migration code well, but you’ll need this one 
as well on the 4.4 kernel you mentioned:

commit 25f3c5021985e885292980d04a1423fd83c967bb
Author: Chris Mason <clm@fb.com>
Date:   Tue Jan 21 11:51:42 2020 -0500

     Btrfs: keep pages dirty when using btrfs_writepage_fixup_worker

And this one as well:

commit 7703bdd8d23e6ef057af3253958a793ec6066b28
Author: Chris Mason <clm@fb.com>
Date:   Wed Jun 20 07:56:11 2018 -0700

     Btrfs: don't clean dirty pages during buffered writes

With those two in place, we haven’t found lost data from the migration 
code, but we did see the fallback migration helper dirtying pages 
without going through page_mkwrite, which triggers the suboptimal btrfs 
fixup worker code path.  This isn’t a yea or nay on the patch, just 
additional info.

-chris


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2020-07-17 17:41 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-09  2:48 [PATCH] mm : fix pte _PAGE_DIRTY bit when fallback migrate page robbieko
2020-07-10 15:31 ` Vlastimil Babka
2020-07-13  1:57   ` Robbie Ko
2020-07-14  9:46     ` Vlastimil Babka
2020-07-14 10:19       ` Kirill A. Shutemov
2020-07-15  2:45         ` Robbie Ko
2020-07-15  8:11           ` Kirill A. Shutemov
2020-07-16 10:15             ` Robbie Ko
2020-07-17 17:41               ` Chris Mason
2020-07-15  2:05       ` Robbie Ko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).