* [PATCH v2 0/3] A few fixup patches for mm
@ 2022-04-21 12:53 Miaohe Lin
  2022-04-21 12:53 ` [PATCH v2 1/3] mm/swapfile: unuse_pte can map random data if swap read fails Miaohe Lin
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Miaohe Lin @ 2022-04-21 12:53 UTC (permalink / raw)
  To: akpm
  Cc: willy, vbabka, dhowells, neilb, david, apopple, surenb, minchan,
	peterx, sfr, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

Hi everyone,
This series contains a few patches to avoid mapping random data if a swap
read fails and to fix lost swap bits in unuse_pte(). It also frees hwpoison
and swapin error entries in madvise_free_pte_range(). More details can be
found in the respective changelogs. Thanks!

---
v2:
  make the terminology consistent and collect Acked-by tag per David
  fix lost swap bits in unuse_pte per Peter
  free hwpoison and swapin error entry per Alistair
  Many thanks Alistair, David and Peter for review!
---
Miaohe Lin (3):
  mm/swapfile: unuse_pte can map random data if swap read fails
  mm/swapfile: Fix lost swap bits in unuse_pte()
  mm/madvise: free hwpoison and swapin error entry in
    madvise_free_pte_range

 include/linux/swap.h    |  7 ++++++-
 include/linux/swapops.h | 10 ++++++++++
 mm/madvise.c            | 13 ++++++++-----
 mm/memory.c             |  5 ++++-
 mm/swapfile.c           | 23 ++++++++++++++++++++---
 5 files changed, 48 insertions(+), 10 deletions(-)

-- 
2.23.0



* [PATCH v2 1/3] mm/swapfile: unuse_pte can map random data if swap read fails
  2022-04-21 12:53 [PATCH v2 0/3] A few fixup patches for mm Miaohe Lin
@ 2022-04-21 12:53 ` Miaohe Lin
  2022-04-21 12:53 ` [PATCH v2 2/3] mm/swapfile: Fix lost swap bits in unuse_pte() Miaohe Lin
  2022-04-21 12:53 ` [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range Miaohe Lin
  2 siblings, 0 replies; 12+ messages in thread
From: Miaohe Lin @ 2022-04-21 12:53 UTC (permalink / raw)
  To: akpm
  Cc: willy, vbabka, dhowells, neilb, david, apopple, surenb, minchan,
	peterx, sfr, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

There is a bug in unuse_pte(): when the swap page happens to be unreadable,
a page filled with random data is mapped into the user address space.  Fix
this by installing a special swap entry in the page table that indicates
the swap read failed.  This way the swapcache page can be freed, and the
user won't end up with a permanently mounted swap because a sector is bad.
If the page is accessed later, the user process will be killed so that
corrupted data is never consumed.  On the other hand, if the page is never
accessed, the user won't even notice.
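
The slot accounting behind the new entry type can be checked with a small
userspace model. This is only a sketch, not kernel code: the macro names
follow the patch, but the numeric counts, MODEL_TYPE_SHIFT and
MODEL_OFFSET_MASK are assumptions chosen for illustration (the real values
depend on the kernel configuration and architecture).

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative counts only; the real values depend on the kernel config. */
#define MAX_SWAPFILES_SHIFT	5
#define SWP_DEVICE_NUM		4
#define SWP_MIGRATION_NUM	3
#define SWP_HWPOISON_NUM	1
#define SWP_PTE_MARKER_NUM	1
#define SWP_SWAPIN_ERROR_NUM	1

/* Each special entry kind takes type slots away from real swap files. */
#define MAX_SWAPFILES \
	((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - \
	SWP_PTE_MARKER_NUM - SWP_SWAPIN_ERROR_NUM)

/* The swapin error type sits above all other reserved types. */
#define SWP_SWAPIN_ERROR	(MAX_SWAPFILES + SWP_HWPOISON_NUM + \
				 SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
				 SWP_PTE_MARKER_NUM)

/* Hypothetical layout: the type lives in the bits above MODEL_TYPE_SHIFT. */
#define MODEL_TYPE_SHIFT	58
#define MODEL_OFFSET_MASK	((UINT64_C(1) << MODEL_TYPE_SHIFT) - 1)

typedef struct { uint64_t val; } swp_entry_t;

static inline swp_entry_t swp_entry(unsigned int type, uint64_t offset)
{
	swp_entry_t e;

	/* The type must fit in MAX_SWAPFILES_SHIFT bits. */
	assert(type < (1u << MAX_SWAPFILES_SHIFT));
	e.val = ((uint64_t)type << MODEL_TYPE_SHIFT) |
		(offset & MODEL_OFFSET_MASK);
	return e;
}

static inline unsigned int swp_type(swp_entry_t e)
{
	return (unsigned int)(e.val >> MODEL_TYPE_SHIFT);
}

static inline uint64_t swp_offset(swp_entry_t e)
{
	return e.val & MODEL_OFFSET_MASK;
}

/* Types at or above MAX_SWAPFILES do not refer to a real swap file. */
static inline int non_swap_entry(swp_entry_t e)
{
	return swp_type(e) >= MAX_SWAPFILES;
}

static inline int is_swapin_error_entry(swp_entry_t e)
{
	return swp_type(e) == SWP_SWAPIN_ERROR;
}
```

With these assumed counts, adding SWP_SWAPIN_ERROR_NUM costs exactly one
swap file slot, and the new type still fits in MAX_SWAPFILES_SHIFT bits.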

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 include/linux/swap.h    |  7 ++++++-
 include/linux/swapops.h | 10 ++++++++++
 mm/memory.c             |  5 ++++-
 mm/swapfile.c           | 11 +++++++++++
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 5553189d0215..b82c196d8867 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -55,6 +55,10 @@ static inline int current_is_kswapd(void)
  * actions on faults.
  */
 
+#define SWP_SWAPIN_ERROR_NUM 1
+#define SWP_SWAPIN_ERROR     (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
+			     SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
+			     SWP_PTE_MARKER_NUM)
 /*
  * PTE markers are used to persist information onto PTEs that are mapped with
  * file-backed memories.  As its name "PTE" hints, it should only be applied to
@@ -120,7 +124,8 @@ static inline int current_is_kswapd(void)
 
 #define MAX_SWAPFILES \
 	((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
-	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)
+	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - \
+	SWP_PTE_MARKER_NUM - SWP_SWAPIN_ERROR_NUM)
 
 /*
  * Magic header for a swap area. The first part of the union is
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index a291f210e7f8..9d989ed049a6 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -108,6 +108,16 @@ static inline void *swp_to_radix_entry(swp_entry_t entry)
 	return xa_mk_value(entry.val);
 }
 
+static inline swp_entry_t make_swapin_error_entry(struct page *page)
+{
+	return swp_entry(SWP_SWAPIN_ERROR, page_to_pfn(page));
+}
+
+static inline int is_swapin_error_entry(swp_entry_t entry)
+{
+	return swp_type(entry) == SWP_SWAPIN_ERROR;
+}
+
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
 static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset)
 {
diff --git a/mm/memory.c b/mm/memory.c
index f4161fb07ffa..626f63858e0c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1488,7 +1488,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			/* Only drop the uffd-wp marker if explicitly requested */
 			if (!zap_drop_file_uffd_wp(details))
 				continue;
-		} else if (is_hwpoison_entry(entry)) {
+		} else if (is_hwpoison_entry(entry) ||
+			   is_swapin_error_entry(entry)) {
 			if (!should_zap_cows(details))
 				continue;
 		} else {
@@ -3728,6 +3729,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
+		} else if (is_swapin_error_entry(entry)) {
+			ret = VM_FAULT_SIGBUS;
 		} else if (is_pte_marker_entry(entry)) {
 			ret = handle_pte_marker(vmf);
 		} else {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 9398e915b36b..95b63f69f388 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1797,6 +1797,17 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		goto out;
 	}
 
+	if (unlikely(!PageUptodate(page))) {
+		pte_t pteval;
+
+		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
+		pteval = swp_entry_to_pte(make_swapin_error_entry(page));
+		set_pte_at(vma->vm_mm, addr, pte, pteval);
+		swap_free(entry);
+		ret = 0;
+		goto out;
+	}
+
 	/* See do_swap_page() */
 	BUG_ON(!PageAnon(page) && PageMappedToDisk(page));
 	BUG_ON(PageAnon(page) && PageAnonExclusive(page));
-- 
2.23.0



* [PATCH v2 2/3] mm/swapfile: Fix lost swap bits in unuse_pte()
  2022-04-21 12:53 [PATCH v2 0/3] A few fixup patches for mm Miaohe Lin
  2022-04-21 12:53 ` [PATCH v2 1/3] mm/swapfile: unuse_pte can map random data if swap read fails Miaohe Lin
@ 2022-04-21 12:53 ` Miaohe Lin
  2022-04-21 13:13   ` David Hildenbrand
  2022-04-21 12:53 ` [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range Miaohe Lin
  2 siblings, 1 reply; 12+ messages in thread
From: Miaohe Lin @ 2022-04-21 12:53 UTC (permalink / raw)
  To: akpm
  Cc: willy, vbabka, dhowells, neilb, david, apopple, surenb, minchan,
	peterx, sfr, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

This is observed by code review only but not any real report.

When we turn off swapping we could have lost the bits stored in the swap
ptes. The new rmap-exclusive bit is fine since that turned into a page
flag, but not for soft-dirty and uffd-wp. Add them.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/swapfile.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 95b63f69f388..332ccfc76142 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1783,7 +1783,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	struct page *swapcache;
 	spinlock_t *ptl;
-	pte_t *pte;
+	pte_t *pte, new_pte;
 	int ret = 1;
 
 	swapcache = page;
@@ -1832,8 +1832,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		page_add_new_anon_rmap(page, vma, addr);
 		lru_cache_add_inactive_or_unevictable(page, vma);
 	}
-	set_pte_at(vma->vm_mm, addr, pte,
-		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
+	new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
+	if (pte_swp_soft_dirty(*pte))
+		new_pte = pte_mksoft_dirty(new_pte);
+	if (pte_swp_uffd_wp(*pte)) {
+		new_pte = pte_mkuffd_wp(new_pte);
+		new_pte = pte_wrprotect(new_pte);
+	}
+	set_pte_at(vma->vm_mm, addr, pte, new_pte);
 	swap_free(entry);
 out:
 	pte_unmap_unlock(pte, ptl);
-- 
2.23.0



* [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  2022-04-21 12:53 [PATCH v2 0/3] A few fixup patches for mm Miaohe Lin
  2022-04-21 12:53 ` [PATCH v2 1/3] mm/swapfile: unuse_pte can map random data if swap read fails Miaohe Lin
  2022-04-21 12:53 ` [PATCH v2 2/3] mm/swapfile: Fix lost swap bits in unuse_pte() Miaohe Lin
@ 2022-04-21 12:53 ` Miaohe Lin
  2022-04-21 13:25   ` David Hildenbrand
  2022-04-21 14:28   ` Peter Xu
  2 siblings, 2 replies; 12+ messages in thread
From: Miaohe Lin @ 2022-04-21 12:53 UTC (permalink / raw)
  To: akpm
  Cc: willy, vbabka, dhowells, neilb, david, apopple, surenb, minchan,
	peterx, sfr, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

Once the MADV_FREE operation has succeeded, callers can expect they might
get zero-fill pages if accessing the memory again. Therefore it should be
safe to delete the hwpoison entry and swapin error entry. There is no
reason to kill the process if it has called MADV_FREE on the range.

Suggested-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/madvise.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 4d6592488b51..5f4537511532 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			swp_entry_t entry;
 
 			entry = pte_to_swp_entry(ptent);
-			if (non_swap_entry(entry))
-				continue;
-			nr_swap--;
-			free_swap_and_cache(entry);
-			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			if (!non_swap_entry(entry)) {
+				nr_swap--;
+				free_swap_and_cache(entry);
+				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			} else if (is_hwpoison_entry(entry) ||
+				   is_swapin_error_entry(entry)) {
+				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			}
 			continue;
 		}
 
-- 
2.23.0



* Re: [PATCH v2 2/3] mm/swapfile: Fix lost swap bits in unuse_pte()
  2022-04-21 12:53 ` [PATCH v2 2/3] mm/swapfile: Fix lost swap bits in unuse_pte() Miaohe Lin
@ 2022-04-21 13:13   ` David Hildenbrand
  2022-04-21 13:50     ` Miaohe Lin
  0 siblings, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2022-04-21 13:13 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: willy, vbabka, dhowells, neilb, apopple, surenb, minchan, peterx,
	sfr, naoya.horiguchi, linux-mm, linux-kernel

On 21.04.22 14:53, Miaohe Lin wrote:
> This is observed by code review only but not any real report.
> 
> When we turn off swapping we could have lost the bits stored in the swap
> ptes. The new rmap-exclusive bit is fine since that turned into a page
> flag, but not for soft-dirty and uffd-wp. Add them.
> 
> Suggested-by: Peter Xu <peterx@redhat.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/swapfile.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 95b63f69f388..332ccfc76142 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1783,7 +1783,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  {
>  	struct page *swapcache;
>  	spinlock_t *ptl;
> -	pte_t *pte;
> +	pte_t *pte, new_pte;
>  	int ret = 1;
>  
>  	swapcache = page;
> @@ -1832,8 +1832,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  		page_add_new_anon_rmap(page, vma, addr);
>  		lru_cache_add_inactive_or_unevictable(page, vma);
>  	}
> -	set_pte_at(vma->vm_mm, addr, pte,
> -		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
> +	new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
> +	if (pte_swp_soft_dirty(*pte))
> +		new_pte = pte_mksoft_dirty(new_pte);
> +	if (pte_swp_uffd_wp(*pte)) {
> +		new_pte = pte_mkuffd_wp(new_pte);
> +		new_pte = pte_wrprotect(new_pte);

The wrprotect shouldn't be necessary, we don't do a pte_mkwrite(). Note
that in do_swap_page() we might have done a
maybe_mkwrite(pte_mkdirty(pte)), which is why the pte_wrprotect() is
required there.
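
The point can be illustrated with a toy userspace model. This is a sketch
under stated assumptions, not the kernel code: the bit layout, the M_*
names and model_unuse_pte() are hypothetical, and the model assumes (as
the comment above implies) that the protection bits used to build the new
PTE never carry write permission.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout; real layouts are per-architecture. */
#define M_WRITE			(1u << 0)
#define M_YOUNG			(1u << 1)
#define M_SOFT_DIRTY		(1u << 2)
#define M_UFFD_WP		(1u << 3)
#define M_SWP_SOFT_DIRTY	(1u << 0)
#define M_SWP_UFFD_WP		(1u << 1)

typedef uint32_t model_pte_t;

/*
 * Rebuild a present PTE from a swap PTE, carrying over the soft-dirty
 * and uffd-wp bits that would otherwise be lost.
 */
static inline model_pte_t model_unuse_pte(uint32_t swp_pte, model_pte_t prot)
{
	model_pte_t new_pte;

	/* Assumption of this model: no write permission is ever passed in. */
	assert(!(prot & M_WRITE));

	new_pte = prot & ~M_YOUNG;		/* models pte_mkold() */
	if (swp_pte & M_SWP_SOFT_DIRTY)
		new_pte |= M_SOFT_DIRTY;	/* models pte_mksoft_dirty() */
	if (swp_pte & M_SWP_UFFD_WP)
		new_pte |= M_UFFD_WP;		/* models pte_mkuffd_wp() */
	return new_pte;
}

/* Clearing the write bit; a no-op when the bit was never set. */
static inline model_pte_t model_wrprotect(model_pte_t pte)
{
	return pte & ~M_WRITE;
}
```

Because nothing in this path sets the write bit, applying
model_wrprotect() to the result changes nothing in the model.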

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  2022-04-21 12:53 ` [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range Miaohe Lin
@ 2022-04-21 13:25   ` David Hildenbrand
  2022-04-21 13:44     ` Miaohe Lin
  2022-04-21 14:28   ` Peter Xu
  1 sibling, 1 reply; 12+ messages in thread
From: David Hildenbrand @ 2022-04-21 13:25 UTC (permalink / raw)
  To: Miaohe Lin, akpm
  Cc: willy, vbabka, dhowells, neilb, apopple, surenb, minchan, peterx,
	sfr, naoya.horiguchi, linux-mm, linux-kernel

On 21.04.22 14:53, Miaohe Lin wrote:
> Once the MADV_FREE operation has succeeded, callers can expect they might
> get zero-fill pages if accessing the memory again. Therefore it should be
> safe to delete the hwpoison entry and swapin error entry. There is no
> reason to kill the process if it has called MADV_FREE on the range.
> 
> Suggested-by: Alistair Popple <apopple@nvidia.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/madvise.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d6592488b51..5f4537511532 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  			swp_entry_t entry;
>  
>  			entry = pte_to_swp_entry(ptent);
> -			if (non_swap_entry(entry))
> -				continue;
> -			nr_swap--;
> -			free_swap_and_cache(entry);
> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			if (!non_swap_entry(entry)) {
> +				nr_swap--;
> +				free_swap_and_cache(entry);
> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			} else if (is_hwpoison_entry(entry) ||
> +				   is_swapin_error_entry(entry)) {
> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			}
>  			continue;
>  		}
>  

Reading the man page that should be fine, but might not be required.

"[...] the kernel can free the pages at any time. Once pages in the
range have been freed, the caller will see zero-fill-on-demand pages
upon subsequent page references."
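
From the caller's side, the quoted contract can be exercised with a small
userspace program. This is a minimal sketch, assuming Linux with
MADV_FREE support; madv_free_demo() is a hypothetical helper, and it
only shows the userspace view, not the hwpoison or swapin error paths
touched by this patch.

```c
#define _DEFAULT_SOURCE 1
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 0 if every byte read back after MADV_FREE is either the old
 * value or zero, 1 if MADV_FREE is unavailable here, -1 on other failure.
 */
int madv_free_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	size_t len, i;
	int ret = 0;

	if (page <= 0)
		return -1;
	len = (size_t)page * 4;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Dirty the pages so there is something to free. */
	memset(p, 0xAA, len);

#ifdef MADV_FREE
	if (madvise(p, len, MADV_FREE) != 0)
		ret = (errno == EINVAL) ? 1 : -1;
#else
	ret = 1;
#endif

	/*
	 * The kernel may free the pages at any time, so each page reads
	 * back either as the old contents or as zero-fill.
	 */
	for (i = 0; ret == 0 && i < len; i++)
		if (p[i] != 0xAA && p[i] != 0)
			ret = -1;

	munmap(p, len);
	return ret;
}
```

Either outcome per page is valid, which is why the caller cannot rely on
the old contents once the call has succeeded.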


LGTM

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Thanks,

David / dhildenb



* Re: [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  2022-04-21 13:25   ` David Hildenbrand
@ 2022-04-21 13:44     ` Miaohe Lin
  0 siblings, 0 replies; 12+ messages in thread
From: Miaohe Lin @ 2022-04-21 13:44 UTC (permalink / raw)
  To: David Hildenbrand, akpm
  Cc: willy, vbabka, dhowells, neilb, apopple, surenb, minchan, peterx,
	sfr, naoya.horiguchi, linux-mm, linux-kernel

On 2022/4/21 21:25, David Hildenbrand wrote:
> On 21.04.22 14:53, Miaohe Lin wrote:
>> Once the MADV_FREE operation has succeeded, callers can expect they might
>> get zero-fill pages if accessing the memory again. Therefore it should be
>> safe to delete the hwpoison entry and swapin error entry. There is no
>> reason to kill the process if it has called MADV_FREE on the range.
>>
>> Suggested-by: Alistair Popple <apopple@nvidia.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/madvise.c | 13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 4d6592488b51..5f4537511532 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>>  			swp_entry_t entry;
>>  
>>  			entry = pte_to_swp_entry(ptent);
>> -			if (non_swap_entry(entry))
>> -				continue;
>> -			nr_swap--;
>> -			free_swap_and_cache(entry);
>> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>> +			if (!non_swap_entry(entry)) {
>> +				nr_swap--;
>> +				free_swap_and_cache(entry);
>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>> +			} else if (is_hwpoison_entry(entry) ||
>> +				   is_swapin_error_entry(entry)) {
>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>> +			}
>>  			continue;
>>  		}
>>  
> 
> Reading the man page that should be fine, but might not be required.
> 
> "[...] the kernel can free the pages at any time. Once pages in the
> range have been freed, the caller will see zero-fill-on-demand pages
> upon subsequent page references."

Yes, this part is not mentioned in the man page.

> 
> 
> LGTM
> 
> Acked-by: David Hildenbrand <david@redhat.com>
> 

Many thanks for your quick response and review!


* Re: [PATCH v2 2/3] mm/swapfile: Fix lost swap bits in unuse_pte()
  2022-04-21 13:13   ` David Hildenbrand
@ 2022-04-21 13:50     ` Miaohe Lin
  0 siblings, 0 replies; 12+ messages in thread
From: Miaohe Lin @ 2022-04-21 13:50 UTC (permalink / raw)
  To: David Hildenbrand, akpm
  Cc: willy, vbabka, dhowells, neilb, apopple, surenb, minchan, peterx,
	sfr, naoya.horiguchi, linux-mm, linux-kernel

On 2022/4/21 21:13, David Hildenbrand wrote:
> On 21.04.22 14:53, Miaohe Lin wrote:
>> This is observed by code review only but not any real report.
>>
>> When we turn off swapping we could have lost the bits stored in the swap
>> ptes. The new rmap-exclusive bit is fine since that turned into a page
>> flag, but not for soft-dirty and uffd-wp. Add them.
>>
>> Suggested-by: Peter Xu <peterx@redhat.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/swapfile.c | 12 +++++++++---
>>  1 file changed, 9 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 95b63f69f388..332ccfc76142 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -1783,7 +1783,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>>  {
>>  	struct page *swapcache;
>>  	spinlock_t *ptl;
>> -	pte_t *pte;
>> +	pte_t *pte, new_pte;
>>  	int ret = 1;
>>  
>>  	swapcache = page;
>> @@ -1832,8 +1832,14 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>>  		page_add_new_anon_rmap(page, vma, addr);
>>  		lru_cache_add_inactive_or_unevictable(page, vma);
>>  	}
>> -	set_pte_at(vma->vm_mm, addr, pte,
>> -		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
>> +	new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
>> +	if (pte_swp_soft_dirty(*pte))
>> +		new_pte = pte_mksoft_dirty(new_pte);
>> +	if (pte_swp_uffd_wp(*pte)) {
>> +		new_pte = pte_mkuffd_wp(new_pte);
>> +		new_pte = pte_wrprotect(new_pte);
> 
> The wrprotect shouldn't be necessary, we don't do a pte_mkwrite(). Note
> that in do_swap_page() we might have done a
> maybe_mkwrite(pte_mkdirty(pte)), which is why the pte_wrprotect() is
> required there.

You're so smart. I happened to be referring to the code in do_swap_page. ;)
Now I see why pte_wrprotect() is only required there. Will remove it in the
next version when there is enough feedback. Many thanks!

> 



* Re: [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  2022-04-21 12:53 ` [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range Miaohe Lin
  2022-04-21 13:25   ` David Hildenbrand
@ 2022-04-21 14:28   ` Peter Xu
  2022-04-22  2:47     ` Miaohe Lin
  1 sibling, 1 reply; 12+ messages in thread
From: Peter Xu @ 2022-04-21 14:28 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, willy, vbabka, dhowells, neilb, david, apopple, surenb,
	minchan, sfr, naoya.horiguchi, linux-mm, linux-kernel

On Thu, Apr 21, 2022 at 08:53:48PM +0800, Miaohe Lin wrote:
> Once the MADV_FREE operation has succeeded, callers can expect they might
> get zero-fill pages if accessing the memory again. Therefore it should be
> safe to delete the hwpoison entry and swapin error entry. There is no
> reason to kill the process if it has called MADV_FREE on the range.
> 
> Suggested-by: Alistair Popple <apopple@nvidia.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/madvise.c | 13 ++++++++-----
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d6592488b51..5f4537511532 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>  			swp_entry_t entry;
>  
>  			entry = pte_to_swp_entry(ptent);
> -			if (non_swap_entry(entry))
> -				continue;
> -			nr_swap--;
> -			free_swap_and_cache(entry);
> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);

Nitpick: IMHO you don't need to invert non_swap_entry() then it'll generate
a smaller diff, just add the new code above "continue".

> +			if (!non_swap_entry(entry)) {
> +				nr_swap--;
> +				free_swap_and_cache(entry);
> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> +			} else if (is_hwpoison_entry(entry) ||
> +				   is_swapin_error_entry(entry)) {
> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);

Since it's been discussed and you're reposting a new version anyway, why
not start with either reusing hwpoison or pte markers?  Or do you think it
should be for future to drop the new swap entry again?

Thanks,

-- 
Peter Xu



* Re: [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  2022-04-21 14:28   ` Peter Xu
@ 2022-04-22  2:47     ` Miaohe Lin
  2022-04-22  2:52       ` Peter Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Miaohe Lin @ 2022-04-22  2:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: akpm, willy, vbabka, dhowells, neilb, david, apopple, surenb,
	minchan, sfr, naoya.horiguchi, linux-mm, linux-kernel

On 2022/4/21 22:28, Peter Xu wrote:
> On Thu, Apr 21, 2022 at 08:53:48PM +0800, Miaohe Lin wrote:
>> Once the MADV_FREE operation has succeeded, callers can expect they might
>> get zero-fill pages if accessing the memory again. Therefore it should be
>> safe to delete the hwpoison entry and swapin error entry. There is no
>> reason to kill the process if it has called MADV_FREE on the range.
>>
>> Suggested-by: Alistair Popple <apopple@nvidia.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/madvise.c | 13 ++++++++-----
>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 4d6592488b51..5f4537511532 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>>  			swp_entry_t entry;
>>  
>>  			entry = pte_to_swp_entry(ptent);
>> -			if (non_swap_entry(entry))
>> -				continue;
>> -			nr_swap--;
>> -			free_swap_and_cache(entry);
>> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> 
> Nitpick: IMHO you don't need to invert non_swap_entry() then it'll generate
> a smaller diff, just add the new code above "continue".

I tried it this way, but that led to long line splitting, so I rewrote the code like this.
If you prefer to just add the new code above "continue", I will do it in the next version.

> 
>> +			if (!non_swap_entry(entry)) {
>> +				nr_swap--;
>> +				free_swap_and_cache(entry);
>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>> +			} else if (is_hwpoison_entry(entry) ||
>> +				   is_swapin_error_entry(entry)) {
>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> 
> Since it's been discussed and you're reposting a new version anyway, why
> not start with either reusing hwpoison or pte markers?  Or do you think it
> should be for future to drop the new swap entry again?
> 

IMHO if we reuse hwpoison markers, there are some places where we need to distinguish them and do
different processing (and maybe also comment them well), which will make the code more complicated and
somewhat hard to follow. The "swapin error marker" here is the most straightforward. And if pte markers
support the "swapin error case" in the future, I think it's fine to switch to them then.
Does this make sense to you?

Thanks a lot!

> Thanks,
> 



* Re: [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  2022-04-22  2:47     ` Miaohe Lin
@ 2022-04-22  2:52       ` Peter Xu
  2022-04-22  3:15         ` Miaohe Lin
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Xu @ 2022-04-22  2:52 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, willy, vbabka, dhowells, neilb, david, apopple, surenb,
	minchan, sfr, naoya.horiguchi, linux-mm, linux-kernel

On Fri, Apr 22, 2022 at 10:47:32AM +0800, Miaohe Lin wrote:
> On 2022/4/21 22:28, Peter Xu wrote:
> > On Thu, Apr 21, 2022 at 08:53:48PM +0800, Miaohe Lin wrote:
> >> Once the MADV_FREE operation has succeeded, callers can expect they might
> >> get zero-fill pages if accessing the memory again. Therefore it should be
> >> safe to delete the hwpoison entry and swapin error entry. There is no
> >> reason to kill the process if it has called MADV_FREE on the range.
> >>
> >> Suggested-by: Alistair Popple <apopple@nvidia.com>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> >>  mm/madvise.c | 13 ++++++++-----
> >>  1 file changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/mm/madvise.c b/mm/madvise.c
> >> index 4d6592488b51..5f4537511532 100644
> >> --- a/mm/madvise.c
> >> +++ b/mm/madvise.c
> >> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> >>  			swp_entry_t entry;
> >>  
> >>  			entry = pte_to_swp_entry(ptent);
> >> -			if (non_swap_entry(entry))
> >> -				continue;
> >> -			nr_swap--;
> >> -			free_swap_and_cache(entry);
> >> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> > 
> > Nitpick: IMHO you don't need to invert non_swap_entry() then it'll generate
> > a smaller diff, just add the new code above "continue".
> 
> I tried it this way, but that led to long line splitting, so I rewrote the code like this.
> If you prefer to just add the new code above "continue", I will do it in the next version.

No worry then, feel free to keep it as is.

> 
> > 
> >> +			if (!non_swap_entry(entry)) {
> >> +				nr_swap--;
> >> +				free_swap_and_cache(entry);
> >> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> >> +			} else if (is_hwpoison_entry(entry) ||
> >> +				   is_swapin_error_entry(entry)) {
> >> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
> > 
> > Since it's been discussed and you're reposting a new version anyway, why
> > not start with either reusing hwpoison or pte markers?  Or do you think it
> > should be for future to drop the new swap entry again?
> > 
> 
> IMHO if we reuse hwpoison markers, there are some places where we need to distinguish them and do
> different processing (and maybe also comment them well), which will make the code more complicated and
> somewhat hard to follow. The "swapin error marker" here is the most straightforward. And if pte markers
> support the "swapin error case" in the future, I think it's fine to switch to them then.
> Does this make sense to you?

Yeah it's fine.  If the pte marker things can finally land as expected,
maybe I can try it out as the 2nd user of it. :)

-- 
Peter Xu



* Re: [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  2022-04-22  2:52       ` Peter Xu
@ 2022-04-22  3:15         ` Miaohe Lin
  0 siblings, 0 replies; 12+ messages in thread
From: Miaohe Lin @ 2022-04-22  3:15 UTC (permalink / raw)
  To: Peter Xu
  Cc: akpm, willy, vbabka, dhowells, neilb, david, apopple, surenb,
	minchan, sfr, naoya.horiguchi, linux-mm, linux-kernel

On 2022/4/22 10:52, Peter Xu wrote:
> On Fri, Apr 22, 2022 at 10:47:32AM +0800, Miaohe Lin wrote:
>> On 2022/4/21 22:28, Peter Xu wrote:
>>> On Thu, Apr 21, 2022 at 08:53:48PM +0800, Miaohe Lin wrote:
>>>> Once the MADV_FREE operation has succeeded, callers can expect they might
>>>> get zero-fill pages if accessing the memory again. Therefore it should be
>>>> safe to delete the hwpoison entry and swapin error entry. There is no
>>>> reason to kill the process if it has called MADV_FREE on the range.
>>>>
>>>> Suggested-by: Alistair Popple <apopple@nvidia.com>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>> ---
>>>>  mm/madvise.c | 13 ++++++++-----
>>>>  1 file changed, 8 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>>> index 4d6592488b51..5f4537511532 100644
>>>> --- a/mm/madvise.c
>>>> +++ b/mm/madvise.c
>>>> @@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>>>>  			swp_entry_t entry;
>>>>  
>>>>  			entry = pte_to_swp_entry(ptent);
>>>> -			if (non_swap_entry(entry))
>>>> -				continue;
>>>> -			nr_swap--;
>>>> -			free_swap_and_cache(entry);
>>>> -			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>>>
>>> Nitpick: IMHO you don't need to invert non_swap_entry() then it'll generate
>>> a smaller diff, just add the new code above "continue".
>>
>> I tried it this way, but that led to long line splitting, so I rewrote the code like this.
>> If you prefer to just add the new code above "continue", I will do it in the next version.
> 
> No worry then, feel free to keep it as is

Will keep it. Thanks!

>>
>>>
>>>> +			if (!non_swap_entry(entry)) {
>>>> +				nr_swap--;
>>>> +				free_swap_and_cache(entry);
>>>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>>>> +			} else if (is_hwpoison_entry(entry) ||
>>>> +				   is_swapin_error_entry(entry)) {
>>>> +				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
>>>
>>> Since it's been discussed and you're reposting a new version anyway, why
>>> not start with either reusing hwpoison or pte markers?  Or do you think it
>>> should be for future to drop the new swap entry again?
>>>
>>
>> IMHO if we reuse hwpoison markers, there are some places where we need to distinguish them and do
>> different processing (and maybe also comment them well), which will make the code more complicated and
>> somewhat hard to follow. The "swapin error marker" here is the most straightforward. And if pte markers
>> support the "swapin error case" in the future, I think it's fine to switch to them then.
>> Does this make sense to you?
> 
> Yeah it's fine.  If the pte marker things can finally land as expected,
> maybe I can try it out as the 2nd user of it. :)

Sounds good to me. And if needed, I am glad to do it then. Thanks! ;)

> 


end of thread

Thread overview: 12+ messages
2022-04-21 12:53 [PATCH v2 0/3] A few fixup patches for mm Miaohe Lin
2022-04-21 12:53 ` [PATCH v2 1/3] mm/swapfile: unuse_pte can map random data if swap read fails Miaohe Lin
2022-04-21 12:53 ` [PATCH v2 2/3] mm/swapfile: Fix lost swap bits in unuse_pte() Miaohe Lin
2022-04-21 13:13   ` David Hildenbrand
2022-04-21 13:50     ` Miaohe Lin
2022-04-21 12:53 ` [PATCH v2 3/3] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range Miaohe Lin
2022-04-21 13:25   ` David Hildenbrand
2022-04-21 13:44     ` Miaohe Lin
2022-04-21 14:28   ` Peter Xu
2022-04-22  2:47     ` Miaohe Lin
2022-04-22  2:52       ` Peter Xu
2022-04-22  3:15         ` Miaohe Lin
