* [PATCH v4 0/5] A few fixup patches for mm
@ 2022-05-19 12:50 Miaohe Lin
  2022-05-19 12:50 ` [PATCH v4 1/5] mm/swapfile: unuse_pte can map random data if swap read fails Miaohe Lin
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-05-19 12:50 UTC (permalink / raw)
  To: akpm, hughd
  Cc: willy, vbabka, dhowells, neilb, apopple, david, surenb, peterx,
	rcampbell, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

Hi everyone,
This series contains a few patches to avoid mapping random data if a swap
read fails and to fix lost swap bits in unuse_pte. It also frees hwpoison
and swapin error entries in madvise_free_pte_range, among other things.
More details can be found in the respective changelogs. Thanks!

---
v4:
  collect Reviewed-by tags per Naoya and David. Thanks for the review!
  fix the infinite loop when shmem swapin fails at swapoff time. Thanks Naoya
for reporting the issue!
v3:
  collect Acked-by tag per David
  remove unneeded pte wrprotect per David
v2:
  make the terminology consistent and collect Acked-by tag per David
  fix lost swap bits in unuse_pte per Peter
  free hwpoison and swapin error entry per Alistair
  Many thanks Alistair, David and Peter for review!
---
Miaohe Lin (5):
  mm/swapfile: unuse_pte can map random data if swap read fails
  mm/swapfile: Fix lost swap bits in unuse_pte()
  mm/madvise: free hwpoison and swapin error entry in
    madvise_free_pte_range
  mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  mm: filter out swapin error entry in shmem mapping

 include/linux/swap.h    |  7 ++++++-
 include/linux/swapops.h | 10 ++++++++++
 mm/madvise.c            | 18 ++++++++++++------
 mm/memory.c             |  5 ++++-
 mm/shmem.c              | 39 +++++++++++++++++++++++++++++++++++++++
 mm/swap_state.c         |  3 +++
 mm/swapfile.c           | 21 ++++++++++++++++++---
 7 files changed, 92 insertions(+), 11 deletions(-)

-- 
2.23.0



* [PATCH v4 1/5] mm/swapfile: unuse_pte can map random data if swap read fails
  2022-05-19 12:50 [PATCH v4 0/5] A few fixup patches for mm Miaohe Lin
@ 2022-05-19 12:50 ` Miaohe Lin
  2022-05-19 12:50 ` [PATCH v4 2/5] mm/swapfile: Fix lost swap bits in unuse_pte() Miaohe Lin
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-05-19 12:50 UTC (permalink / raw)
  To: akpm, hughd
  Cc: willy, vbabka, dhowells, neilb, apopple, david, surenb, peterx,
	rcampbell, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

There is a bug in unuse_pte(): when the swap page happens to be unreadable,
a page filled with random data is mapped into the user address space.  In
case of error, a special swap entry indicating that the swap read failed is
installed in the page table instead.  That way the swapcache page can be
freed and the user won't end up with a permanently mounted swap just because
a sector is bad.  If the page is accessed later, the user process will be
killed so that corrupted data is never consumed.  On the other hand, if the
page is never accessed, the user won't even notice it.
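
A minimal, self-contained user-space sketch (not kernel code) of the
type/offset packing the new entry relies on follows; the shift and type
values are simplified placeholders, not the kernel's real constants from
include/linux/swapops.h.  The real entry is built with swp_entry() /
swp_entry_to_pte() and recognised again at fault time, as the hunks below
show.

#include <stdio.h>

#define SWP_TYPE_SHIFT   58                     /* placeholder, arch dependent */
#define SWP_OFFSET_MASK  ((1UL << SWP_TYPE_SHIFT) - 1)
#define SWP_SWAPIN_ERROR 31UL                   /* placeholder reserved type */

typedef struct { unsigned long val; } swp_entry_t;

static swp_entry_t swp_entry(unsigned long type, unsigned long offset)
{
	swp_entry_t e = { (type << SWP_TYPE_SHIFT) | (offset & SWP_OFFSET_MASK) };
	return e;
}

static unsigned long swp_type(swp_entry_t e)
{
	return e.val >> SWP_TYPE_SHIFT;
}

static int is_swapin_error_entry(swp_entry_t e)
{
	return swp_type(e) == SWP_SWAPIN_ERROR;
}

int main(void)
{
	swp_entry_t e = swp_entry(SWP_SWAPIN_ERROR, 0x1234); /* pfn as offset */

	printf("is_swapin_error_entry: %d\n", is_swapin_error_entry(e));
	return 0;
}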

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
---
 include/linux/swap.h    |  7 ++++++-
 include/linux/swapops.h | 10 ++++++++++
 mm/memory.c             |  5 ++++-
 mm/swapfile.c           | 11 +++++++++++
 4 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index f3ae17b43f20..0c0fed1b348f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -55,6 +55,10 @@ static inline int current_is_kswapd(void)
  * actions on faults.
  */
 
+#define SWP_SWAPIN_ERROR_NUM 1
+#define SWP_SWAPIN_ERROR     (MAX_SWAPFILES + SWP_HWPOISON_NUM + \
+			     SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \
+			     SWP_PTE_MARKER_NUM)
 /*
  * PTE markers are used to persist information onto PTEs that are mapped with
  * file-backed memories.  As its name "PTE" hints, it should only be applied to
@@ -120,7 +124,8 @@ static inline int current_is_kswapd(void)
 
 #define MAX_SWAPFILES \
 	((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \
-	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM)
+	SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - \
+	SWP_PTE_MARKER_NUM - SWP_SWAPIN_ERROR_NUM)
 
 /*
  * Magic header for a swap area. The first part of the union is
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index fe220df499f1..f24775b41880 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -108,6 +108,16 @@ static inline void *swp_to_radix_entry(swp_entry_t entry)
 	return xa_mk_value(entry.val);
 }
 
+static inline swp_entry_t make_swapin_error_entry(struct page *page)
+{
+	return swp_entry(SWP_SWAPIN_ERROR, page_to_pfn(page));
+}
+
+static inline int is_swapin_error_entry(swp_entry_t entry)
+{
+	return swp_type(entry) == SWP_SWAPIN_ERROR;
+}
+
 #if IS_ENABLED(CONFIG_DEVICE_PRIVATE)
 static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset)
 {
diff --git a/mm/memory.c b/mm/memory.c
index 8fc2bc4c9d21..76f371702e99 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1496,7 +1496,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			/* Only drop the uffd-wp marker if explicitly requested */
 			if (!zap_drop_file_uffd_wp(details))
 				continue;
-		} else if (is_hwpoison_entry(entry)) {
+		} else if (is_hwpoison_entry(entry) ||
+			   is_swapin_error_entry(entry)) {
 			if (!should_zap_cows(details))
 				continue;
 		} else {
@@ -3742,6 +3743,8 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
+		} else if (is_swapin_error_entry(entry)) {
+			ret = VM_FAULT_SIGBUS;
 		} else if (is_pte_marker_entry(entry)) {
 			ret = handle_pte_marker(vmf);
 		} else {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1c3d5b970ddb..226c7d3ba896 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1788,6 +1788,17 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		goto out;
 	}
 
+	if (unlikely(!PageUptodate(page))) {
+		pte_t pteval;
+
+		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
+		pteval = swp_entry_to_pte(make_swapin_error_entry(page));
+		set_pte_at(vma->vm_mm, addr, pte, pteval);
+		swap_free(entry);
+		ret = 0;
+		goto out;
+	}
+
 	/* See do_swap_page() */
 	BUG_ON(!PageAnon(page) && PageMappedToDisk(page));
 	BUG_ON(PageAnon(page) && PageAnonExclusive(page));
-- 
2.23.0



* [PATCH v4 2/5] mm/swapfile: Fix lost swap bits in unuse_pte()
  2022-05-19 12:50 [PATCH v4 0/5] A few fixup patches for mm Miaohe Lin
  2022-05-19 12:50 ` [PATCH v4 1/5] mm/swapfile: unuse_pte can map random data if swap read fails Miaohe Lin
@ 2022-05-19 12:50 ` Miaohe Lin
  2022-05-19 12:50 ` [PATCH v4 3/5] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range Miaohe Lin
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-05-19 12:50 UTC (permalink / raw)
  To: akpm, hughd
  Cc: willy, vbabka, dhowells, neilb, apopple, david, surenb, peterx,
	rcampbell, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

This was found by code review only, not by any real report.

When we turn off swapping we could lose the bits stored in the swap PTEs.
The new rmap-exclusive bit is fine, since that turned into a page flag, but
not soft-dirty and uffd-wp.  Carry them over to the new present PTE.

Suggested-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
---
 mm/swapfile.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 226c7d3ba896..0beeaffc7dee 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1774,7 +1774,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 {
 	struct page *swapcache;
 	spinlock_t *ptl;
-	pte_t *pte;
+	pte_t *pte, new_pte;
 	int ret = 1;
 
 	swapcache = page;
@@ -1823,8 +1823,12 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		page_add_new_anon_rmap(page, vma, addr);
 		lru_cache_add_inactive_or_unevictable(page, vma);
 	}
-	set_pte_at(vma->vm_mm, addr, pte,
-		   pte_mkold(mk_pte(page, vma->vm_page_prot)));
+	new_pte = pte_mkold(mk_pte(page, vma->vm_page_prot));
+	if (pte_swp_soft_dirty(*pte))
+		new_pte = pte_mksoft_dirty(new_pte);
+	if (pte_swp_uffd_wp(*pte))
+		new_pte = pte_mkuffd_wp(new_pte);
+	set_pte_at(vma->vm_mm, addr, pte, new_pte);
 	swap_free(entry);
 out:
 	pte_unmap_unlock(pte, ptl);
-- 
2.23.0



* [PATCH v4 3/5] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range
  2022-05-19 12:50 [PATCH v4 0/5] A few fixup patches for mm Miaohe Lin
  2022-05-19 12:50 ` [PATCH v4 1/5] mm/swapfile: unuse_pte can map random data if swap read fails Miaohe Lin
  2022-05-19 12:50 ` [PATCH v4 2/5] mm/swapfile: Fix lost swap bits in unuse_pte() Miaohe Lin
@ 2022-05-19 12:50 ` Miaohe Lin
  2022-05-19 12:50 ` [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time Miaohe Lin
  2022-05-19 12:50 ` [PATCH v4 5/5] mm: filter out swapin error entry in shmem mapping Miaohe Lin
  4 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-05-19 12:50 UTC (permalink / raw)
  To: akpm, hughd
  Cc: willy, vbabka, dhowells, neilb, apopple, david, surenb, peterx,
	rcampbell, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

Once the MADV_FREE operation has succeeded, callers can expect that they may
get zero-fill pages when accessing the memory again.  Therefore it should be
safe to delete the hwpoison and swapin error entries.  There is no reason to
kill the process if it has called MADV_FREE on the range.
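
As a side note, a short user-space sketch (a hypothetical test program, not
part of this patch) of the MADV_FREE contract relied on above: after a
successful MADV_FREE, re-reading the range may legitimately return zeroes,
so nothing useful is lost by also dropping the special entries.  Needs
MADV_FREE (Linux 4.5+).

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4096;
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	memset(p, 0xaa, len);			/* dirty the page */

	if (madvise(p, len, MADV_FREE))		/* contents may now be discarded lazily */
		perror("madvise(MADV_FREE)");

	/* Both 0xaa (not yet reclaimed) and 0x00 (reclaimed) are valid here. */
	printf("first byte after MADV_FREE: %#x\n", p[0]);

	munmap(p, len);
	return 0;
}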

Suggested-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
---
 mm/madvise.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index caa53806addd..a42165bc4735 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -624,11 +624,14 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			swp_entry_t entry;
 
 			entry = pte_to_swp_entry(ptent);
-			if (non_swap_entry(entry))
-				continue;
-			nr_swap--;
-			free_swap_and_cache(entry);
-			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			if (!non_swap_entry(entry)) {
+				nr_swap--;
+				free_swap_and_cache(entry);
+				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			} else if (is_hwpoison_entry(entry) ||
+				   is_swapin_error_entry(entry)) {
+				pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+			}
 			continue;
 		}
 
-- 
2.23.0



* [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-19 12:50 [PATCH v4 0/5] A few fixup patches for mm Miaohe Lin
                   ` (2 preceding siblings ...)
  2022-05-19 12:50 ` [PATCH v4 3/5] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range Miaohe Lin
@ 2022-05-19 12:50 ` Miaohe Lin
  2022-05-20  6:34   ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-25  4:32   ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-19 12:50 ` [PATCH v4 5/5] mm: filter out swapin error entry in shmem mapping Miaohe Lin
  4 siblings, 2 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-05-19 12:50 UTC (permalink / raw)
  To: akpm, hughd
  Cc: willy, vbabka, dhowells, neilb, apopple, david, surenb, peterx,
	rcampbell, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

When shmem swapin fails at swapoff time, there is an infinite loop in the
while loop in shmem_unuse_inode().  That is because the swapin error is
deliberately ignored now, so info->swapped never reaches 0 and we can't
escape the loop in shmem_unuse().

In order to fix the issue, a swapin_error entry is stored in the mapping
when a swapin error occurs.  The swapcache page can then be freed and the
user won't end up with a permanently mounted swap just because a sector is
bad.  If the page is accessed later, the user process will be killed so that
corrupted data is never consumed.  On the other hand, if the page is never
accessed, the user won't even notice it.
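
To make the failure mode concrete, here is a tiny stand-alone simulation
(all names are made up for illustration; this is not the kernel code) of why
silently skipping the bad entry makes swapoff spin: the per-inode count of
swapped-out entries can never reach zero.

#include <stdio.h>
#include <stdbool.h>

#define NENTRIES 3

static bool read_fails[NENTRIES] = { false, true, false }; /* entry 1 always errors */
static bool swapped_in[NENTRIES];
static int swapped = NENTRIES;		/* stand-in for info->swapped */

static void one_unuse_pass(void)
{
	for (int i = 0; i < NENTRIES; i++) {
		if (swapped_in[i])
			continue;
		if (read_fails[i])
			continue;	/* old behaviour: the error is silently ignored */
		swapped_in[i] = true;	/* page read back in, entry dropped */
		swapped--;
	}
}

int main(void)
{
	int passes = 0;

	while (swapped > 0 && passes < 10) {	/* capped so the demo terminates */
		one_unuse_pass();
		passes++;
	}
	printf("after %d passes swapped = %d (the real loop would spin forever)\n",
	       passes, swapped);
	return 0;
}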

Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/shmem.c | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index d3c7970e0179..d55dd972023a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1175,6 +1175,10 @@ static int shmem_find_swap_entries(struct address_space *mapping,
 			continue;
 
 		entry = radix_to_swp_entry(folio);
+		/*
+		 * swapin error entries can be found in the mapping. But they're
+		 * deliberately ignored here as we've done everything we can do.
+		 */
 		if (swp_type(entry) != type)
 			continue;
 
@@ -1672,6 +1676,36 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
 	return error;
 }
 
+static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
+					 struct folio *folio, swp_entry_t swap)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct shmem_inode_info *info = SHMEM_I(inode);
+	swp_entry_t swapin_error;
+	void *old;
+
+	swapin_error = make_swapin_error_entry(&folio->page);
+	old = xa_cmpxchg_irq(&mapping->i_pages, index,
+			     swp_to_radix_entry(swap),
+			     swp_to_radix_entry(swapin_error), 0);
+	if (old != swp_to_radix_entry(swap))
+		return;
+
+	folio_wait_writeback(folio);
+	delete_from_swap_cache(&folio->page);
+	spin_lock_irq(&info->lock);
+	/*
+	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks won't
+	 * be 0 when inode is released and thus trigger WARN_ON(inode->i_blocks) in
+	 * shmem_evict_inode.
+	 */
+	info->alloced--;
+	info->swapped--;
+	shmem_recalc_inode(inode);
+	spin_unlock_irq(&info->lock);
+	swap_free(swap);
+}
+
 /*
  * Swap in the page pointed to by *pagep.
  * Caller has to make sure that *pagep contains a valid swapped page.
@@ -1695,6 +1729,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	swap = radix_to_swp_entry(*foliop);
 	*foliop = NULL;
 
+	if (is_swapin_error_entry(swap))
+		return -EIO;
+
 	/* Look it up and read it in.. */
 	page = lookup_swap_cache(swap, NULL, 0);
 	if (!page) {
@@ -1762,6 +1799,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 failed:
 	if (!shmem_confirm_swap(mapping, index, swap))
 		error = -EEXIST;
+	if (error == -EIO)
+		shmem_set_folio_swapin_error(inode, index, folio, swap);
 unlock:
 	if (folio) {
 		folio_unlock(folio);
-- 
2.23.0



* [PATCH v4 5/5] mm: filter out swapin error entry in shmem mapping
  2022-05-19 12:50 [PATCH v4 0/5] A few fixup patches for mm Miaohe Lin
                   ` (3 preceding siblings ...)
  2022-05-19 12:50 ` [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time Miaohe Lin
@ 2022-05-19 12:50 ` Miaohe Lin
  2022-05-25  4:53   ` HORIGUCHI NAOYA(堀口 直也)
  4 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-05-19 12:50 UTC (permalink / raw)
  To: akpm, hughd
  Cc: willy, vbabka, dhowells, neilb, apopple, david, surenb, peterx,
	rcampbell, naoya.horiguchi, linux-mm, linux-kernel, linmiaohe

There might be swapin error entries in the shmem mapping.  Filter them out
to avoid "Bad swap file entry" complaints; their type lies above
MAX_SWAPFILES, so the existing non_swap_entry() check covers them.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/madvise.c    | 5 ++++-
 mm/swap_state.c | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index a42165bc4735..31582b6ff551 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -248,10 +248,13 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 
 		if (!xa_is_value(page))
 			continue;
+		swap = radix_to_swp_entry(page);
+		/* There might be swapin error entries in shmem mapping. */
+		if (non_swap_entry(swap))
+			continue;
 		xas_pause(&xas);
 		rcu_read_unlock();
 
-		swap = radix_to_swp_entry(page);
 		page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
 					     NULL, 0, false, &splug);
 		if (page)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index b9e4ed2e90bf..778d57d2d92d 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -410,6 +410,9 @@ struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
 		return NULL;
 
 	swp = radix_to_swp_entry(page);
+	/* There might be swapin error entries in shmem mapping. */
+	if (non_swap_entry(swp))
+		return NULL;
 	/* Prevent swapoff from happening to us */
 	si = get_swap_device(swp);
 	if (!si)
-- 
2.23.0



* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-19 12:50 ` [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time Miaohe Lin
@ 2022-05-20  6:34   ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-20  8:17     ` Miaohe Lin
  2022-05-21  9:34     ` Miaohe Lin
  2022-05-25  4:32   ` HORIGUCHI NAOYA(堀口 直也)
  1 sibling, 2 replies; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-20  6:34 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
> When swap in shmem error at swapoff time, there would be a infinite loop
> in the while loop in shmem_unuse_inode(). It's because swapin error is
> deliberately ignored now and thus info->swapped will never reach 0. So
> we can't escape the loop in shmem_unuse().
> 
> In order to fix the issue, swapin_error entry is stored in the mapping
> when swapin error occurs. So the swapcache page can be freed and the
> user won't end up with a permanently mounted swap because a sector is
> bad. If the page is accessed later, the user process will be killed
> so that corrupted data is never consumed. On the other hand, if the
> page is never accessed, the user won't even notice it.
> 
> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>

Hi Miaohe,

Thank you for the update.  I might be missing something, but I still see the
same problem (I checked it on mm-everything-2022-05-19-00-03 + this patchset).

This patch has the effect of changing the return value of shmem_swapin_folio()
from -EIO (without this patch) to -EEXIST (with this patch).
But shmem_unuse_swap_entries() checks neither, so there is no change from the
caller's point of view.  Maybe breaking out on any error (rather than only on
ENOMEM) in the for loop in shmem_unuse_swap_entries() would solve the issue?
I briefly checked with the below change; then swapoff returns with failure.

@@ -1222,7 +1222,7 @@ static int shmem_unuse_swap_entries(struct inode *inode,
                        folio_put(folio);
                        ret++;
                }
-               if (error == -ENOMEM)
+               if (error < 0)
                        break;
                error = 0;
        }

> ---
>  mm/shmem.c | 39 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 39 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d3c7970e0179..d55dd972023a 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1175,6 +1175,10 @@ static int shmem_find_swap_entries(struct address_space *mapping,
>  			continue;
>  
>  		entry = radix_to_swp_entry(folio);
> +		/*
> +		 * swapin error entries can be found in the mapping. But they're
> +		 * deliberately ignored here as we've done everything we can do.
> +		 */
>  		if (swp_type(entry) != type)
>  			continue;
>  
> @@ -1672,6 +1676,36 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>  	return error;
>  }
>  
> +static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
> +					 struct folio *folio, swp_entry_t swap)
> +{
> +	struct address_space *mapping = inode->i_mapping;
> +	struct shmem_inode_info *info = SHMEM_I(inode);
> +	swp_entry_t swapin_error;
> +	void *old;
> +
> +	swapin_error = make_swapin_error_entry(&folio->page);
> +	old = xa_cmpxchg_irq(&mapping->i_pages, index,
> +			     swp_to_radix_entry(swap),
> +			     swp_to_radix_entry(swapin_error), 0);
> +	if (old != swp_to_radix_entry(swap))
> +		return;
> +
> +	folio_wait_writeback(folio);
> +	delete_from_swap_cache(&folio->page);
> +	spin_lock_irq(&info->lock);
> +	/*
> +	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks won't
> +	 * be 0 when inode is released and thus trigger WARN_ON(inode->i_blocks) in
> +	 * shmem_evict_inode.
> +	 */
> +	info->alloced--;
> +	info->swapped--;
> +	shmem_recalc_inode(inode);
> +	spin_unlock_irq(&info->lock);
> +	swap_free(swap);
> +}
> +
>  /*
>   * Swap in the page pointed to by *pagep.
>   * Caller has to make sure that *pagep contains a valid swapped page.

(off-topic a little) BTW, the comment on shmem_swapin_folio() still mentions
*pagep, but maybe it can be updated to *foliop.

Thanks,
Naoya Horiguchi

> @@ -1695,6 +1729,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>  	swap = radix_to_swp_entry(*foliop);
>  	*foliop = NULL;
>  
> +	if (is_swapin_error_entry(swap))
> +		return -EIO;
> +
>  	/* Look it up and read it in.. */
>  	page = lookup_swap_cache(swap, NULL, 0);
>  	if (!page) {
> @@ -1762,6 +1799,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>  failed:
>  	if (!shmem_confirm_swap(mapping, index, swap))
>  		error = -EEXIST;
> +	if (error == -EIO)
> +		shmem_set_folio_swapin_error(inode, index, folio, swap);


* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-20  6:34   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-05-20  8:17     ` Miaohe Lin
  2022-05-22 23:53       ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-21  9:34     ` Miaohe Lin
  1 sibling, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-05-20  8:17 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On 2022/5/20 14:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
>> When swap in shmem error at swapoff time, there would be a infinite loop
>> in the while loop in shmem_unuse_inode(). It's because swapin error is
>> deliberately ignored now and thus info->swapped will never reach 0. So
>> we can't escape the loop in shmem_unuse().
>>
>> In order to fix the issue, swapin_error entry is stored in the mapping
>> when swapin error occurs. So the swapcache page can be freed and the
>> user won't end up with a permanently mounted swap because a sector is
>> bad. If the page is accessed later, the user process will be killed
>> so that corrupted data is never consumed. On the other hand, if the
>> page is never accessed, the user won't even notice it.
>>
>> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> 
> Hi Miaohe,
> 
> Thank you for the update.  I might miss something, but I still see the same
> problem (I checked it on mm-everything-2022-05-19-00-03 + this patchset).

I was testing this patch on my 5.10 kernel. I reproduced the problem in my env and
fixed it there. It seems there might be some critical difference, even though I checked
for one by reviewing the code... Sorry. :(

> 
> This patch has the effect to change the return value of shmem_swapin_folio(),
> -EIO (without this patch) to -EEXIST (with this patch).

In fact, I didn't change the return value from -EIO to -EEXIST:

@@ -1762,6 +1799,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 failed:
 	if (!shmem_confirm_swap(mapping, index, swap))
 		error = -EEXIST;
+	if (error == -EIO)
+		shmem_set_folio_swapin_error(inode, index, folio, swap)

> But shmem_unuse_swap_entries() checks neither, so no change from caller's view point.
> Maybe breaking in errors (rather than ENOMEM) in for loop in shmem_unuse_swap_entries()
> solves the issue?  I briefly checked with the below change, then swapoff can return
> with failure.
> 
> @@ -1222,7 +1222,7 @@ static int shmem_unuse_swap_entries(struct inode *inode,
>                         folio_put(folio);
>                         ret++;
>                 }
> -               if (error == -ENOMEM)
> +               if (error < 0)
>                         break;
>                 error = 0;
>         }

Yes, this is the simplest and most straightforward way to fix the issue. But it has the side
effect that the user will end up with a permanently mounted swap just because a sector is bad.
That might be somewhat unacceptable?

> 
>> ---
>>  mm/shmem.c | 39 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 39 insertions(+)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index d3c7970e0179..d55dd972023a 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1175,6 +1175,10 @@ static int shmem_find_swap_entries(struct address_space *mapping,
>>  			continue;
>>  
>>  		entry = radix_to_swp_entry(folio);
>> +		/*
>> +		 * swapin error entries can be found in the mapping. But they're
>> +		 * deliberately ignored here as we've done everything we can do.
>> +		 */
>>  		if (swp_type(entry) != type)
>>  			continue;
>>  
>> @@ -1672,6 +1676,36 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>>  	return error;
>>  }
>>  
>> +static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
>> +					 struct folio *folio, swp_entry_t swap)
>> +{
>> +	struct address_space *mapping = inode->i_mapping;
>> +	struct shmem_inode_info *info = SHMEM_I(inode);
>> +	swp_entry_t swapin_error;
>> +	void *old;
>> +
>> +	swapin_error = make_swapin_error_entry(&folio->page);
>> +	old = xa_cmpxchg_irq(&mapping->i_pages, index,
>> +			     swp_to_radix_entry(swap),
>> +			     swp_to_radix_entry(swapin_error), 0);
>> +	if (old != swp_to_radix_entry(swap))
>> +		return;
>> +
>> +	folio_wait_writeback(folio);
>> +	delete_from_swap_cache(&folio->page);
>> +	spin_lock_irq(&info->lock);
>> +	/*
>> +	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks won't
>> +	 * be 0 when inode is released and thus trigger WARN_ON(inode->i_blocks) in
>> +	 * shmem_evict_inode.
>> +	 */
>> +	info->alloced--;
>> +	info->swapped--;
>> +	shmem_recalc_inode(inode);
>> +	spin_unlock_irq(&info->lock);
>> +	swap_free(swap);
>> +}
>> +
>>  /*
>>   * Swap in the page pointed to by *pagep.
>>   * Caller has to make sure that *pagep contains a valid swapped page.
> 
> (off-topic a little) BTW, the comment on shmem_swapin_folio() still mentions
> *pagep, but maybe it can be updated to *foliop.

Will do it.

> 
> Thanks,
> Naoya Horiguchi

Many thanks for comment and test ! :)

> 
>> @@ -1695,6 +1729,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>  	swap = radix_to_swp_entry(*foliop);
>>  	*foliop = NULL;
>>  
>> +	if (is_swapin_error_entry(swap))
>> +		return -EIO;
>> +
>>  	/* Look it up and read it in.. */
>>  	page = lookup_swap_cache(swap, NULL, 0);
>>  	if (!page) {
>> @@ -1762,6 +1799,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>  failed:
>>  	if (!shmem_confirm_swap(mapping, index, swap))
>>  		error = -EEXIST;
>> +	if (error == -EIO)
>> +		shmem_set_folio_swapin_error(inode, index, folio, swap);



* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-20  6:34   ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-20  8:17     ` Miaohe Lin
@ 2022-05-21  9:34     ` Miaohe Lin
  2022-05-24 22:10       ` Andrew Morton
  1 sibling, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-05-21  9:34 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On 2022/5/20 14:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
>> When swap in shmem error at swapoff time, there would be a infinite loop
>> in the while loop in shmem_unuse_inode(). It's because swapin error is
>> deliberately ignored now and thus info->swapped will never reach 0. So
>> we can't escape the loop in shmem_unuse().
>>
>> In order to fix the issue, swapin_error entry is stored in the mapping
>> when swapin error occurs. So the swapcache page can be freed and the
>> user won't end up with a permanently mounted swap because a sector is
>> bad. If the page is accessed later, the user process will be killed
>> so that corrupted data is never consumed. On the other hand, if the
>> page is never accessed, the user won't even notice it.
>>
>> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> 
> Hi Miaohe,
> 
> Thank you for the update.  I might miss something, but I still see the same
> problem (I checked it on mm-everything-2022-05-19-00-03 + this patchset).

Hi Naoya,
I reproduced the issue in the linux-next-next-20220520 version. And I found that even if
I *do not inject the swapin error*, the infinite loop still occurs. After investigating the
code for a long while, I found the root cause:

diff --git a/mm/shmem.c b/mm/shmem.c
index d55dd972023a..6d23ed4d23cb 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1182,7 +1182,7 @@ static int shmem_find_swap_entries(struct address_space *mapping,
                if (swp_type(entry) != type)
                        continue;

-               indices[ret] = xas.xa_index;
+               indices[ret++] = xas.xa_index;
                if (!folio_batch_add(fbatch, folio))
                        break;

The original code does not increment the ret value when a folio is found. I will send a patch
to fix this next week. Thanks! :)

BTW: With the above change, the infinite loop doesn't occur when a swapin error is injected.
I will take a closer look next week.

Thanks!

> 
> This patch has the effect to change the return value of shmem_swapin_folio(),
> -EIO (without this patch) to -EEXIST (with this patch).
> But shmem_unuse_swap_entries() checks neither, so no change from caller's view point.
> Maybe breaking in errors (rather than ENOMEM) in for loop in shmem_unuse_swap_entries()
> solves the issue?  I briefly checked with the below change, then swapoff can return
> with failure.
> 
> @@ -1222,7 +1222,7 @@ static int shmem_unuse_swap_entries(struct inode *inode,
>                         folio_put(folio);
>                         ret++;
>                 }
> -               if (error == -ENOMEM)
> +               if (error < 0)
>                         break;
>                 error = 0;
>         }
> 
>> ---
>>  mm/shmem.c | 39 +++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 39 insertions(+)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index d3c7970e0179..d55dd972023a 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1175,6 +1175,10 @@ static int shmem_find_swap_entries(struct address_space *mapping,
>>  			continue;
>>  
>>  		entry = radix_to_swp_entry(folio);
>> +		/*
>> +		 * swapin error entries can be found in the mapping. But they're
>> +		 * deliberately ignored here as we've done everything we can do.
>> +		 */
>>  		if (swp_type(entry) != type)
>>  			continue;
>>  
>> @@ -1672,6 +1676,36 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>>  	return error;
>>  }
>>  
>> +static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
>> +					 struct folio *folio, swp_entry_t swap)
>> +{
>> +	struct address_space *mapping = inode->i_mapping;
>> +	struct shmem_inode_info *info = SHMEM_I(inode);
>> +	swp_entry_t swapin_error;
>> +	void *old;
>> +
>> +	swapin_error = make_swapin_error_entry(&folio->page);
>> +	old = xa_cmpxchg_irq(&mapping->i_pages, index,
>> +			     swp_to_radix_entry(swap),
>> +			     swp_to_radix_entry(swapin_error), 0);
>> +	if (old != swp_to_radix_entry(swap))
>> +		return;
>> +
>> +	folio_wait_writeback(folio);
>> +	delete_from_swap_cache(&folio->page);
>> +	spin_lock_irq(&info->lock);
>> +	/*
>> +	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks won't
>> +	 * be 0 when inode is released and thus trigger WARN_ON(inode->i_blocks) in
>> +	 * shmem_evict_inode.
>> +	 */
>> +	info->alloced--;
>> +	info->swapped--;
>> +	shmem_recalc_inode(inode);
>> +	spin_unlock_irq(&info->lock);
>> +	swap_free(swap);
>> +}
>> +
>>  /*
>>   * Swap in the page pointed to by *pagep.
>>   * Caller has to make sure that *pagep contains a valid swapped page.
> 
> (off-topic a little) BTW, the comment on shmem_swapin_folio() still mentions
> *pagep, but maybe it can be updated to *foliop.
> 
> Thanks,
> Naoya Horiguchi
> 
>> @@ -1695,6 +1729,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>  	swap = radix_to_swp_entry(*foliop);
>>  	*foliop = NULL;
>>  
>> +	if (is_swapin_error_entry(swap))
>> +		return -EIO;
>> +
>>  	/* Look it up and read it in.. */
>>  	page = lookup_swap_cache(swap, NULL, 0);
>>  	if (!page) {
>> @@ -1762,6 +1799,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>  failed:
>>  	if (!shmem_confirm_swap(mapping, index, swap))
>>  		error = -EEXIST;
>> +	if (error == -EIO)
>> +		shmem_set_folio_swapin_error(inode, index, folio, swap);



* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-20  8:17     ` Miaohe Lin
@ 2022-05-22 23:53       ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-23  3:01         ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-22 23:53 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On Fri, May 20, 2022 at 04:17:45PM +0800, Miaohe Lin wrote:
> On 2022/5/20 14:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
> >> When swap in shmem error at swapoff time, there would be a infinite loop
> >> in the while loop in shmem_unuse_inode(). It's because swapin error is
> >> deliberately ignored now and thus info->swapped will never reach 0. So
> >> we can't escape the loop in shmem_unuse().
> >>
> >> In order to fix the issue, swapin_error entry is stored in the mapping
> >> when swapin error occurs. So the swapcache page can be freed and the
> >> user won't end up with a permanently mounted swap because a sector is
> >> bad. If the page is accessed later, the user process will be killed
> >> so that corrupted data is never consumed. On the other hand, if the
> >> page is never accessed, the user won't even notice it.
> >>
> >> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > 
> > Hi Miaohe,
> > 
> > Thank you for the update.  I might miss something, but I still see the same
> > problem (I checked it on mm-everything-2022-05-19-00-03 + this patchset).
> 
> I was testing this patch on my 5.10 kernel. I reproduced the problem in my env and
> fixed it. It seems there might be some critical difference though I checked that by
> reviewing the code... Sorry. :(
> 
> > 
> > This patch has the effect to change the return value of shmem_swapin_folio(),
> > -EIO (without this patch) to -EEXIST (with this patch).
> 
> In fact, I didn't change the return value from -EIO to -EEXIST:
> 
> @@ -1762,6 +1799,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>  failed:
>  	if (!shmem_confirm_swap(mapping, index, swap))
>  		error = -EEXIST;
> +	if (error == -EIO)
> +		shmem_set_folio_swapin_error(inode, index, folio, swap)
> 
> > But shmem_unuse_swap_entries() checks neither, so no change from caller's view point.
> > Maybe breaking in errors (rather than ENOMEM) in for loop in shmem_unuse_swap_entries()
> > solves the issue?  I briefly checked with the below change, then swapoff can return
> > with failure.
> > 
> > @@ -1222,7 +1222,7 @@ static int shmem_unuse_swap_entries(struct inode *inode,
> >                         folio_put(folio);
> >                         ret++;
> >                 }
> > -               if (error == -ENOMEM)
> > +               if (error < 0)
> >                         break;
> >                 error = 0;
> >         }
> 
> Yes, this is the simplest and straightforward way to fix the issue. But it has the side effect
> that user will end up with a permanently mounted swap just because a sector is bad. That might
> be somewhat unacceptable?

Ah, you're right, swapoff should return with success instead of with
failure.  I tried the fix in your other email, and that makes swapoff
return with success, so your fix looks better than mine.

Thanks,
Naoya Horiguchi


* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-22 23:53       ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-05-23  3:01         ` Miaohe Lin
  2022-05-23 11:23           ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-05-23  3:01 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On 2022/5/23 7:53, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Fri, May 20, 2022 at 04:17:45PM +0800, Miaohe Lin wrote:
>> On 2022/5/20 14:34, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
>>>> When swap in shmem error at swapoff time, there would be a infinite loop
>>>> in the while loop in shmem_unuse_inode(). It's because swapin error is
>>>> deliberately ignored now and thus info->swapped will never reach 0. So
>>>> we can't escape the loop in shmem_unuse().
>>>>
>>>> In order to fix the issue, swapin_error entry is stored in the mapping
>>>> when swapin error occurs. So the swapcache page can be freed and the
>>>> user won't end up with a permanently mounted swap because a sector is
>>>> bad. If the page is accessed later, the user process will be killed
>>>> so that corrupted data is never consumed. On the other hand, if the
>>>> page is never accessed, the user won't even notice it.
>>>>
>>>> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>
>>> Hi Miaohe,
>>>
>>> Thank you for the update.  I might miss something, but I still see the same
>>> problem (I checked it on mm-everything-2022-05-19-00-03 + this patchset).
>>
>> I was testing this patch on my 5.10 kernel. I reproduced the problem in my env and
>> fixed it. It seems there might be some critical difference though I checked that by
>> reviewing the code... Sorry. :(
>>
>>>
>>> This patch has the effect to change the return value of shmem_swapin_folio(),
>>> -EIO (without this patch) to -EEXIST (with this patch).
>>
>> In fact, I didn't change the return value from -EIO to -EEXIST:
>>
>> @@ -1762,6 +1799,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>  failed:
>>  	if (!shmem_confirm_swap(mapping, index, swap))
>>  		error = -EEXIST;
>> +	if (error == -EIO)
>> +		shmem_set_folio_swapin_error(inode, index, folio, swap)
>>
>>> But shmem_unuse_swap_entries() checks neither, so no change from caller's view point.
>>> Maybe breaking in errors (rather than ENOMEM) in for loop in shmem_unuse_swap_entries()
>>> solves the issue?  I briefly checked with the below change, then swapoff can return
>>> with failure.
>>>
>>> @@ -1222,7 +1222,7 @@ static int shmem_unuse_swap_entries(struct inode *inode,
>>>                         folio_put(folio);
>>>                         ret++;
>>>                 }
>>> -               if (error == -ENOMEM)
>>> +               if (error < 0)
>>>                         break;
>>>                 error = 0;
>>>         }
>>
>> Yes, this is the simplest and straightforward way to fix the issue. But it has the side effect
>> that user will end up with a permanently mounted swap just because a sector is bad. That might
>> be somewhat unacceptable?
> 
> Ah, you're right, swapoff should return with success instead of with
> failure.  I tried the fix in your another email, and that makes swapoff
> return with success, so your fix looks better than mine.

I reproduced the infinite loop when a swapin error occurs at swapoff time in my linux-next-next-20220520 env,
and I found this patch does solve the issue once the fix from my other email is applied.

BTW: When I use dm-dust to inject the swapin IO error, I don't see a non-uptodate folio in shmem_swapin_folio
and swapoff succeeds. There might be some issues around that module (so I resorted to another way to inject
the swapin error), but the patch itself works anyway. ;)

> 
> Thanks,

Thanks a lot!

> Naoya Horiguchi
> 



* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-23  3:01         ` Miaohe Lin
@ 2022-05-23 11:23           ` Miaohe Lin
  2022-05-24  6:44             ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-05-23 11:23 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也), akpm
  Cc: hughd, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, rcampbell, linux-mm, linux-kernel

On 2022/5/23 11:01, Miaohe Lin wrote:
> On 2022/5/23 7:53, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Fri, May 20, 2022 at 04:17:45PM +0800, Miaohe Lin wrote:
>>> On 2022/5/20 14:34, HORIGUCHI NAOYA(堀口 直也) wrote:
>>>> On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
>>>>> When swap in shmem error at swapoff time, there would be a infinite loop
>>>>> in the while loop in shmem_unuse_inode(). It's because swapin error is
>>>>> deliberately ignored now and thus info->swapped will never reach 0. So
>>>>> we can't escape the loop in shmem_unuse().
>>>>>
>>>>> In order to fix the issue, swapin_error entry is stored in the mapping
>>>>> when swapin error occurs. So the swapcache page can be freed and the
>>>>> user won't end up with a permanently mounted swap because a sector is
>>>>> bad. If the page is accessed later, the user process will be killed
>>>>> so that corrupted data is never consumed. On the other hand, if the
>>>>> page is never accessed, the user won't even notice it.
>>>>>
>>>>> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>>
>>>> Hi Miaohe,
>>>>
>>>> Thank you for the update.  I might miss something, but I still see the same
>>>> problem (I checked it on mm-everything-2022-05-19-00-03 + this patchset).
>>>
>>> I was testing this patch on my 5.10 kernel. I reproduced the problem in my env and
>>> fixed it. It seems there might be some critical difference though I checked that by
>>> reviewing the code... Sorry. :(
>>>
>>>>
>>>> This patch has the effect to change the return value of shmem_swapin_folio(),
>>>> -EIO (without this patch) to -EEXIST (with this patch).
>>>
>>> In fact, I didn't change the return value from -EIO to -EEXIST:
>>>
>>> @@ -1762,6 +1799,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>>  failed:
>>>  	if (!shmem_confirm_swap(mapping, index, swap))
>>>  		error = -EEXIST;
>>> +	if (error == -EIO)
>>> +		shmem_set_folio_swapin_error(inode, index, folio, swap)
>>>
>>>> But shmem_unuse_swap_entries() checks neither, so no change from caller's view point.
>>>> Maybe breaking in errors (rather than ENOMEM) in for loop in shmem_unuse_swap_entries()
>>>> solves the issue?  I briefly checked with the below change, then swapoff can return
>>>> with failure.
>>>>
>>>> @@ -1222,7 +1222,7 @@ static int shmem_unuse_swap_entries(struct inode *inode,
>>>>                         folio_put(folio);
>>>>                         ret++;
>>>>                 }
>>>> -               if (error == -ENOMEM)
>>>> +               if (error < 0)
>>>>                         break;
>>>>                 error = 0;
>>>>         }
>>>
>>> Yes, this is the simplest and straightforward way to fix the issue. But it has the side effect
>>> that user will end up with a permanently mounted swap just because a sector is bad. That might
>>> be somewhat unacceptable?
>>
>> Ah, you're right, swapoff should return with success instead of with
>> failure.  I tried the fix in your another email, and that makes swapoff
>> return with success, so your fix looks better than mine.
> 
> I reproduced the deadloop issues when swapin error occurs at swapoff time in my linux-next-next-20220520 env,
> and I found this patch could solve the issue now with the fix in my another email.
> 
> BTW: When I use dm-dust to inject the swapin IO error, I don't see non-uptodate folio when shmem_swapin_folio
> and swapoff succeeds. There might be some issues around that module (so I resort to the another way to inject
> the swapin error), but the patch itself works anyway. ;)

Sorry, the reason I don't see a non-uptodate folio in shmem_swapin_folio is that all the shmem pages are still
in the swapcache. They're not read from disk, so there is no real IO error. :) When they're actually freed, the
infinite loop occurs.

I am thinking about extending MADV_PAGEOUT to also free the swapcache page. A page that stays in the
swapcache doesn't save any system memory anyway, and this could help test the swapin behavior. But I'm not
sure whether it's needed.
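
For what it's worth, a rough user-space sketch of the test flow discussed
here (a hypothetical helper, not the actual reproducer): a shared anonymous
mapping is shmem-backed, MADV_PAGEOUT pushes it out to swap, and the read
error itself has to be injected on the swap device separately (dm-dust or
otherwise) before running swapoff.  MADV_PAGEOUT needs Linux 5.4+ and a libc
that defines it.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4096;
	/* a shared anonymous mapping is backed by shmem, like a tmpfs page */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;

	memset(p, 0x5a, len);			/* dirty the page */

	if (madvise(p, len, MADV_PAGEOUT))	/* ask for it to be written to swap */
		perror("madvise(MADV_PAGEOUT)");

	puts("paged out (if swap is enabled); now inject the error and run swapoff");
	pause();				/* keep the mapping alive during the test */
	return 0;
}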

Thanks! ;)

> 
>>
>> Thanks,
> 
> Thanks a lot!
> 
>> Naoya Horiguchi
>>
> 



* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-23 11:23           ` Miaohe Lin
@ 2022-05-24  6:44             ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-24 10:56               ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-24  6:44 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On Mon, May 23, 2022 at 07:23:53PM +0800, Miaohe Lin wrote:
...
> > 
> > I reproduced the deadloop issues when swapin error occurs at swapoff time in my linux-next-next-20220520 env,
> > and I found this patch could solve the issue now with the fix in my another email.
> > 
> > BTW: When I use dm-dust to inject the swapin IO error, I don't see non-uptodate folio when shmem_swapin_folio
> > and swapoff succeeds. There might be some issues around that module (so I resort to the another way to inject
> > the swapin error), but the patch itself works anyway. ;)
> 
> Sorry, the reason I don't see non-uptodate folio when shmem_swapin_folio is that all the shmem pages are still
> in the swapcache. They're not read from disk so there is no really IO error. :) When they're indeed freed, the
> deadloop issue occurs.
> 
> I am thinking about extending the function of MADV_PAGEOUT to free the swapcache page. The page resides in the
> swapcache does not save the system memory anyway. And this could help test the swapin behavior. But I'm not
> sure whether it's needed.

The extension making MADV_PAGEOUT free swapcache pages makes sense to me,
so I'll support it if the original implementer agrees with the change.

Thanks,
Naoya Horiguchi


* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-24  6:44             ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-05-24 10:56               ` Miaohe Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-05-24 10:56 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On 2022/5/24 14:44, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Mon, May 23, 2022 at 07:23:53PM +0800, Miaohe Lin wrote:
> ...
>>>
>>> I reproduced the deadloop issues when swapin error occurs at swapoff time in my linux-next-next-20220520 env,
>>> and I found this patch could solve the issue now with the fix in my another email.
>>>
>>> BTW: When I use dm-dust to inject the swapin IO error, I don't see non-uptodate folio when shmem_swapin_folio
>>> and swapoff succeeds. There might be some issues around that module (so I resort to the another way to inject
>>> the swapin error), but the patch itself works anyway. ;)
>>
>> Sorry, the reason I don't see non-uptodate folio when shmem_swapin_folio is that all the shmem pages are still
>> in the swapcache. They're not read from disk so there is no really IO error. :) When they're indeed freed, the
>> deadloop issue occurs.
>>
>> I am thinking about extending the function of MADV_PAGEOUT to free the swapcache page. The page resides in the
>> swapcache does not save the system memory anyway. And this could help test the swapin behavior. But I'm not
>> sure whether it's needed.
> 
> The extension make MADV_PAGEOUT free swapcaches makes sense to me,
> so I'll support it if the original implementer agrees the change.

I'd like to try doing it when I have time. :) Thanks!

> 
> Thanks,
> Naoya Horiguchi
> 



* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-21  9:34     ` Miaohe Lin
@ 2022-05-24 22:10       ` Andrew Morton
  2022-05-25  1:42         ` Miaohe Lin
  0 siblings, 1 reply; 20+ messages in thread
From: Andrew Morton @ 2022-05-24 22:10 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: HORIGUCHI NAOYA, hughd, willy, vbabka, dhowells, neilb, apopple,
	david, surenb, peterx, rcampbell, linux-mm, linux-kernel

On Sat, 21 May 2022 17:34:28 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> On 2022/5/20 14:34, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
> >> When swap in shmem error at swapoff time, there would be a infinite loop
> >> in the while loop in shmem_unuse_inode(). It's because swapin error is
> >> deliberately ignored now and thus info->swapped will never reach 0. So
> >> we can't escape the loop in shmem_unuse().
> >>
> >> In order to fix the issue, swapin_error entry is stored in the mapping
> >> when swapin error occurs. So the swapcache page can be freed and the
> >> user won't end up with a permanently mounted swap because a sector is
> >> bad. If the page is accessed later, the user process will be killed
> >> so that corrupted data is never consumed. On the other hand, if the
> >> page is never accessed, the user won't even notice it.
> >>
> >> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> > 
> > Hi Miaohe,
> > 
> > Thank you for the update.  I might miss something, but I still see the same
> > problem (I checked it on mm-everything-2022-05-19-00-03 + this patchset).
> 
> Hi Naoya,
> I reproduce the issue in the linux-next-next-20220520 version. And I found even if
> I *do not inject the swapin error*, the deadloop still occurs. After investigating the
> code for a long while, I found the root cause:
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d55dd972023a..6d23ed4d23cb 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1182,7 +1182,7 @@ static int shmem_find_swap_entries(struct address_space *mapping,
>                 if (swp_type(entry) != type)
>                         continue;
> 
> -               indices[ret] = xas.xa_index;
> +               indices[ret++] = xas.xa_index;
>                 if (!folio_batch_add(fbatch, folio))
>                         break;
> 
> The origin code does not increment the ret value when a folio is found. I will send a patch
> to fix this next week. Thanks! :)

So I'm thinking that with Hugh's "mm/shmem: fix shmem folio swapoff
hang", this patchset is now looking OK, so

mm-swapfile-unuse_pte-can-map-random-data-if-swap-read-fails.patch
mm-swapfile-fix-lost-swap-bits-in-unuse_pte.patch
mm-madvise-free-hwpoison-and-swapin-error-entry-in-madvise_free_pte_range.patch
mm-shmem-fix-infinite-loop-when-swap-in-shmem-error-at-swapoff-time.patch
mm-filter-out-swapin-error-entry-in-shmem-mapping.patch
#

are OK for merging into mainline?


* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-24 22:10       ` Andrew Morton
@ 2022-05-25  1:42         ` Miaohe Lin
  0 siblings, 0 replies; 20+ messages in thread
From: Miaohe Lin @ 2022-05-25  1:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: HORIGUCHI NAOYA (堀口 直也),
	hughd, willy, vbabka, dhowells, neilb, apopple, david, surenb,
	peterx, rcampbell, linux-mm, linux-kernel

On 2022/5/25 6:10, Andrew Morton wrote:
> On Sat, 21 May 2022 17:34:28 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:
> 
>> On 2022/5/20 14:34, HORIGUCHI NAOYA(堀口 直也) wrote:
>>> On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
>>>> When swap in shmem error at swapoff time, there would be a infinite loop
>>>> in the while loop in shmem_unuse_inode(). It's because swapin error is
>>>> deliberately ignored now and thus info->swapped will never reach 0. So
>>>> we can't escape the loop in shmem_unuse().
>>>>
>>>> In order to fix the issue, swapin_error entry is stored in the mapping
>>>> when swapin error occurs. So the swapcache page can be freed and the
>>>> user won't end up with a permanently mounted swap because a sector is
>>>> bad. If the page is accessed later, the user process will be killed
>>>> so that corrupted data is never consumed. On the other hand, if the
>>>> page is never accessed, the user won't even notice it.
>>>>
>>>> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>>
>>> Hi Miaohe,
>>>
>>> Thank you for the update.  I might miss something, but I still see the same
>>> problem (I checked it on mm-everything-2022-05-19-00-03 + this patchset).
>>
>> Hi Naoya,
>> I reproduce the issue in the linux-next-next-20220520 version. And I found even if
>> I *do not inject the swapin error*, the deadloop still occurs. After investigating the
>> code for a long while, I found the root cause:
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index d55dd972023a..6d23ed4d23cb 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1182,7 +1182,7 @@ static int shmem_find_swap_entries(struct address_space *mapping,
>>                 if (swp_type(entry) != type)
>>                         continue;
>>
>> -               indices[ret] = xas.xa_index;
>> +               indices[ret++] = xas.xa_index;
>>                 if (!folio_batch_add(fbatch, folio))
>>                         break;
>>
>> The origin code does not increment the ret value when a folio is found. I will send a patch
>> to fix this next week. Thanks! :)
> 
> So I'm thinking that with Hugh's "mm/shmem: fix shmem folio swapoff
> hang", this patchset is now looking OK, so
> 
> mm-swapfile-unuse_pte-can-map-random-data-if-swap-read-fails.patch
> mm-swapfile-fix-lost-swap-bits-in-unuse_pte.patch
> mm-madvise-free-hwpoison-and-swapin-error-entry-in-madvise_free_pte_range.patch
> mm-shmem-fix-infinite-loop-when-swap-in-shmem-error-at-swapoff-time.patch
> mm-filter-out-swapin-error-entry-in-shmem-mapping.patch
> #
> 
> are OK for merging into mainline?

I think so. But it might be better to have an Acked-by or Reviewed-by tag for [PATCH v4 4/5] first. :)

Thanks!

> .
> 



* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-19 12:50 ` [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time Miaohe Lin
  2022-05-20  6:34   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-05-25  4:32   ` HORIGUCHI NAOYA(堀口 直也)
  2022-05-25  6:40     ` Miaohe Lin
  1 sibling, 1 reply; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-25  4:32 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
> When a swapin error occurs on a shmem page at swapoff time, there would be
> an infinite loop in the while loop in shmem_unuse_inode(). This is because
> the swapin error is deliberately ignored now and thus info->swapped will
> never reach 0, so we can't escape the loop in shmem_unuse().
> 
> In order to fix the issue, swapin_error entry is stored in the mapping
> when swapin error occurs. So the swapcache page can be freed and the
> user won't end up with a permanently mounted swap because a sector is
> bad. If the page is accessed later, the user process will be killed
> so that corrupted data is never consumed. On the other hand, if the
> page is never accessed, the user won't even notice it.
> 
> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---

...
> @@ -1672,6 +1676,36 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>  	return error;
>  }
>  
> +static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
> +					 struct folio *folio, swp_entry_t swap)
> +{
> +	struct address_space *mapping = inode->i_mapping;
> +	struct shmem_inode_info *info = SHMEM_I(inode);
> +	swp_entry_t swapin_error;
> +	void *old;
> +
> +	swapin_error = make_swapin_error_entry(&folio->page);
> +	old = xa_cmpxchg_irq(&mapping->i_pages, index,
> +			     swp_to_radix_entry(swap),
> +			     swp_to_radix_entry(swapin_error), 0);
> +	if (old != swp_to_radix_entry(swap))
> +		return;
> +
> +	folio_wait_writeback(folio);
> +	delete_from_swap_cache(&folio->page);
> +	spin_lock_irq(&info->lock);
> +	/*
> +	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks won't
> +	 * be 0 when inode is released and thus trigger WARN_ON(inode->i_blocks) in
> +	 * shmem_evict_inode.
> +	 */
> +	info->alloced--;
> +	info->swapped--;

I'm not familiar with folios yet and might be missing something basic,
but is it OK to decrement by one instead of by folio_nr_pages()?

Thanks,
Naoya Horiguchi

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 5/5] mm: filter out swapin error entry in shmem mapping
  2022-05-19 12:50 ` [PATCH v4 5/5] mm: filter out swapin error entry in shmem mapping Miaohe Lin
@ 2022-05-25  4:53   ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-25  4:53 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On Thu, May 19, 2022 at 08:50:30PM +0800, Miaohe Lin wrote:
> There might be swapin error entries in shmem mapping. Filter them out to
> avoid "Bad swap file entry" complaint.
> 
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/madvise.c    | 5 ++++-
>  mm/swap_state.c | 3 +++
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index a42165bc4735..31582b6ff551 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -248,10 +248,13 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
>  
>  		if (!xa_is_value(page))
>  			continue;
> +		swap = radix_to_swp_entry(page);
> +		/* There might be swapin error entries in shmem mapping. */
> +		if (non_swap_entry(swap))
> +			continue;

The inline comment mentions swapin error entries, but other types of
non-swap entries are also skipped by this check, which is helpful too.
So I'm fine with the change, thank you.
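
For reference, my understanding is that the check boils down to something
like the sketch below (an approximation of what include/linux/swapops.h
does around this time, not the exact kernel code): special entries are
encoded with a swap type at or above MAX_SWAPFILES, so a single comparison
filters migration, device-private, hwpoison and swapin error entries alike.

/* Simplified sketch; kernel context assumed (swp_entry_t, swp_type(),
 * MAX_SWAPFILES come from the swap headers).
 */
static inline int non_swap_entry_sketch(swp_entry_t entry)
{
	/*
	 * "Real" swap entries use types below MAX_SWAPFILES; migration,
	 * device-private, hwpoison and swapin error entries are encoded
	 * with types at or above it, so they all make this return true.
	 */
	return swp_type(entry) >= MAX_SWAPFILES;
}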

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

>  		xas_pause(&xas);
>  		rcu_read_unlock();
>  
> -		swap = radix_to_swp_entry(page);
>  		page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
>  					     NULL, 0, false, &splug);
>  		if (page)
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index b9e4ed2e90bf..778d57d2d92d 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -410,6 +410,9 @@ struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
>  		return NULL;
>  
>  	swp = radix_to_swp_entry(page);
> +	/* There might be swapin error entries in shmem mapping. */
> +	if (non_swap_entry(swp))
> +		return NULL;
>  	/* Prevent swapoff from happening to us */
>  	si = get_swap_device(swp);
>  	if (!si)
> -- 
> 2.23.0

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-25  4:32   ` HORIGUCHI NAOYA(堀口 直也)
@ 2022-05-25  6:40     ` Miaohe Lin
  2022-05-26  6:08       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 1 reply; 20+ messages in thread
From: Miaohe Lin @ 2022-05-25  6:40 UTC (permalink / raw)
  To: HORIGUCHI NAOYA(堀口 直也)
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On 2022/5/25 12:32, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
>> When a swapin error occurs on a shmem page at swapoff time, there would be
>> an infinite loop in the while loop in shmem_unuse_inode(). This is because
>> the swapin error is deliberately ignored now and thus info->swapped will
>> never reach 0, so we can't escape the loop in shmem_unuse().
>>
>> In order to fix the issue, swapin_error entry is stored in the mapping
>> when swapin error occurs. So the swapcache page can be freed and the
>> user won't end up with a permanently mounted swap because a sector is
>> bad. If the page is accessed later, the user process will be killed
>> so that corrupted data is never consumed. On the other hand, if the
>> page is never accessed, the user won't even notice it.
>>
>> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
> 
> ...
>> @@ -1672,6 +1676,36 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
>>  	return error;
>>  }
>>  
>> +static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
>> +					 struct folio *folio, swp_entry_t swap)
>> +{
>> +	struct address_space *mapping = inode->i_mapping;
>> +	struct shmem_inode_info *info = SHMEM_I(inode);
>> +	swp_entry_t swapin_error;
>> +	void *old;
>> +
>> +	swapin_error = make_swapin_error_entry(&folio->page);
>> +	old = xa_cmpxchg_irq(&mapping->i_pages, index,
>> +			     swp_to_radix_entry(swap),
>> +			     swp_to_radix_entry(swapin_error), 0);
>> +	if (old != swp_to_radix_entry(swap))
>> +		return;
>> +
>> +	folio_wait_writeback(folio);
>> +	delete_from_swap_cache(&folio->page);
>> +	spin_lock_irq(&info->lock);
>> +	/*
>> +	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks won't
>> +	 * be 0 when inode is released and thus trigger WARN_ON(inode->i_blocks) in
>> +	 * shmem_evict_inode.
>> +	 */
>> +	info->alloced--;
>> +	info->swapped--;
> 
> I'm not familiar with folios yet and might be missing something basic,
> but is it OK to decrement by one instead of by folio_nr_pages()?

info->swapped is also decremented by one in shmem_swapin_folio(). In fact, no huge page
swapin is supported yet (this is also true for the non-shmem case). So I think info->swapped--
should be OK. Or am I missing something?
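
If large folio swapin ever gets supported, the accounting would presumably
need to become folio-aware, something like the hypothetical sketch below
(the helper name and structure are mine and not part of this patchset;
folio_nr_pages() is the real helper):

/*
 * Hypothetical sketch only, not part of this patchset: if large folio
 * swapin were supported, the swapin-error accounting would subtract the
 * folio size instead of 1.
 */
static void shmem_swapin_error_accounting(struct shmem_inode_info *info,
					  struct folio *folio)
{
	long nr = folio_nr_pages(folio);	/* currently always 1 here */

	spin_lock_irq(&info->lock);
	info->alloced -= nr;
	info->swapped -= nr;
	spin_unlock_irq(&info->lock);
}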

> 
> Thanks,
> Naoya Horiguchi

Many thanks for review and comment! It's really helpful! :)

> 


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time
  2022-05-25  6:40     ` Miaohe Lin
@ 2022-05-26  6:08       ` HORIGUCHI NAOYA(堀口 直也)
  0 siblings, 0 replies; 20+ messages in thread
From: HORIGUCHI NAOYA(堀口 直也) @ 2022-05-26  6:08 UTC (permalink / raw)
  To: Miaohe Lin
  Cc: akpm, hughd, willy, vbabka, dhowells, neilb, apopple, david,
	surenb, peterx, rcampbell, linux-mm, linux-kernel

On Wed, May 25, 2022 at 02:40:40PM +0800, Miaohe Lin wrote:
> On 2022/5/25 12:32, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Thu, May 19, 2022 at 08:50:29PM +0800, Miaohe Lin wrote:
> >> When a swapin error occurs on a shmem page at swapoff time, there would be
> >> an infinite loop in the while loop in shmem_unuse_inode(). This is because
> >> the swapin error is deliberately ignored now and thus info->swapped will
> >> never reach 0, so we can't escape the loop in shmem_unuse().
> >>
> >> In order to fix the issue, swapin_error entry is stored in the mapping
> >> when swapin error occurs. So the swapcache page can be freed and the
> >> user won't end up with a permanently mounted swap because a sector is
> >> bad. If the page is accessed later, the user process will be killed
> >> so that corrupted data is never consumed. On the other hand, if the
> >> page is never accessed, the user won't even notice it.
> >>
> >> Reported-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> >> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> >> ---
> > 
> > ...
> >> @@ -1672,6 +1676,36 @@ static int shmem_replace_page(struct page **pagep, gfp_t gfp,
> >>  	return error;
> >>  }
> >>  
> >> +static void shmem_set_folio_swapin_error(struct inode *inode, pgoff_t index,
> >> +					 struct folio *folio, swp_entry_t swap)
> >> +{
> >> +	struct address_space *mapping = inode->i_mapping;
> >> +	struct shmem_inode_info *info = SHMEM_I(inode);
> >> +	swp_entry_t swapin_error;
> >> +	void *old;
> >> +
> >> +	swapin_error = make_swapin_error_entry(&folio->page);
> >> +	old = xa_cmpxchg_irq(&mapping->i_pages, index,
> >> +			     swp_to_radix_entry(swap),
> >> +			     swp_to_radix_entry(swapin_error), 0);
> >> +	if (old != swp_to_radix_entry(swap))
> >> +		return;
> >> +
> >> +	folio_wait_writeback(folio);
> >> +	delete_from_swap_cache(&folio->page);
> >> +	spin_lock_irq(&info->lock);
> >> +	/*
> >> +	 * Don't treat swapin error folio as alloced. Otherwise inode->i_blocks won't
> >> +	 * be 0 when inode is released and thus trigger WARN_ON(inode->i_blocks) in
> >> +	 * shmem_evict_inode.
> >> +	 */
> >> +	info->alloced--;
> >> +	info->swapped--;
> > 
> > I'm not familiar with folios yet and might be missing something basic,
> > but is it OK to decrement by one instead of by folio_nr_pages()?
> 
> info->swapped is also decremented by one in shmem_swapin_folio(). In fact, no huge page
> swapin is supported yet (this is also true for the non-shmem case). So I think info->swapped--
> should be OK. Or am I missing something?

OK, thanks for clarification.

Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2022-05-26  6:08 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-19 12:50 [PATCH v4 0/5] A few fixup patches for mm Miaohe Lin
2022-05-19 12:50 ` [PATCH v4 1/5] mm/swapfile: unuse_pte can map random data if swap read fails Miaohe Lin
2022-05-19 12:50 ` [PATCH v4 2/5] mm/swapfile: Fix lost swap bits in unuse_pte() Miaohe Lin
2022-05-19 12:50 ` [PATCH v4 3/5] mm/madvise: free hwpoison and swapin error entry in madvise_free_pte_range Miaohe Lin
2022-05-19 12:50 ` [PATCH v4 4/5] mm/shmem: fix infinite loop when swap in shmem error at swapoff time Miaohe Lin
2022-05-20  6:34   ` HORIGUCHI NAOYA(堀口 直也)
2022-05-20  8:17     ` Miaohe Lin
2022-05-22 23:53       ` HORIGUCHI NAOYA(堀口 直也)
2022-05-23  3:01         ` Miaohe Lin
2022-05-23 11:23           ` Miaohe Lin
2022-05-24  6:44             ` HORIGUCHI NAOYA(堀口 直也)
2022-05-24 10:56               ` Miaohe Lin
2022-05-21  9:34     ` Miaohe Lin
2022-05-24 22:10       ` Andrew Morton
2022-05-25  1:42         ` Miaohe Lin
2022-05-25  4:32   ` HORIGUCHI NAOYA(堀口 直也)
2022-05-25  6:40     ` Miaohe Lin
2022-05-26  6:08       ` HORIGUCHI NAOYA(堀口 直也)
2022-05-19 12:50 ` [PATCH v4 5/5] mm: filter out swapin error entry in shmem mapping Miaohe Lin
2022-05-25  4:53   ` HORIGUCHI NAOYA(堀口 直也)
