From: chengming.zhou@linux.dev
To: hannes@cmpxchg.org, yosryahmed@google.com, nphamcs@gmail.com,
	chengming.zhou@linux.dev, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Zhongkun He <hezhongkun.hzk@bytedance.com>
Subject: [RFC PATCH] mm: add folio in swapcache if swapin from zswap
Date: Sat, 23 Mar 2024 00:39:39 +0800
Message-ID: <20240322163939.17846-1-chengming.zhou@linux.dev>

From: Chengming Zhou <chengming.zhou@linux.dev>

There is a report of data corruption caused by double swapin, which is
only possible in the skip swapcache path on SWP_SYNCHRONOUS_IO backends.

The root cause is that zswap is not like other "normal" swap backends:
it does not keep a copy of the data after the first swapin. So if the
folio from the first swapin can't be installed in the page table
successfully, we just free it directly. Then, on the second swapin, we
find nothing in zswap and read wrong data from the swapfile, which is
the data corruption that was reported.

We can fix it by always adding the folio to the swapcache if we know the
pinned swap entry can be found in zswap, so the folio won't get freed even
if it can't be installed successfully during the first swapin.

We also have to check whether the swap entry is in zswap after the entry
is pinned; only then can we be sure that the check result is stable.

Reported-by: Zhongkun He <hezhongkun.hzk@bytedance.com>
Closes: https://lore.kernel.org/all/CACSyD1N+dUvsu8=zV9P691B9bVq33erwOXNTmEaUbi9DrDeJzw@mail.gmail.com
Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev>
---
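As an illustration only, the double-swapin scenario from the changelog
can be modeled in userspace. This is a toy sketch, not kernel code: the
names toy_slot, toy_read, toy_swapin and simulate_double_swapin are
hypothetical, and the model assumes only what the changelog states,
namely that zswap drops its copy after the first load and that the
swapfile holds wrong data for this entry.

```c
#include <assert.h>
#include <stdbool.h>

enum { GOOD_DATA = 42, STALE_DATA = -1 };

/* Toy model of a single swap slot (hypothetical, for illustration). */
struct toy_slot {
	bool in_zswap;       /* zswap still holds the only valid copy */
	int  zswap_data;     /* contents held by zswap */
	int  swapfile_data;  /* contents of the backing swapfile (wrong) */
	bool in_swapcache;   /* a folio for this slot sits in the swapcache */
	int  swapcache_data; /* contents of that cached folio */
};

/* Read the slot: zswap first, then fall back to the swapfile. */
static int toy_read(struct toy_slot *s)
{
	if (s->in_zswap) {
		s->in_zswap = false; /* zswap keeps no copy after a load */
		return s->zswap_data;
	}
	return s->swapfile_data;
}

/*
 * One swapin attempt. With the fix enabled, a slot that is found in
 * zswap is added to the swapcache before it is read, so its data
 * survives even if the caller later fails to install the folio.
 */
static int toy_swapin(struct toy_slot *s, bool with_fix)
{
	bool keep;
	int data;

	if (s->in_swapcache)
		return s->swapcache_data;

	keep = with_fix && s->in_zswap;
	data = toy_read(s);
	if (keep) {
		s->in_swapcache = true;
		s->swapcache_data = data;
	}
	return data;
}

/* First swapin fails to install the folio, second one succeeds. */
static int simulate_double_swapin(bool with_fix)
{
	struct toy_slot s = {
		.in_zswap = true,
		.zswap_data = GOOD_DATA,
		.swapfile_data = STALE_DATA,
	};

	(void)toy_swapin(&s, with_fix); /* result dropped: install failed */
	return toy_swapin(&s, with_fix);
}
```

In this model, simulate_double_swapin(false) ends up returning the
swapfile contents on the second attempt, while
simulate_double_swapin(true) returns the data that zswap originally
held, because the swapcache kept the folio alive.
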
 include/linux/zswap.h |  6 ++++++
 mm/memory.c           | 28 ++++++++++++++++++++++++----
 mm/zswap.c            | 10 ++++++++++
 3 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/include/linux/zswap.h b/include/linux/zswap.h
index 2a85b941db97..180d0b1f0886 100644
--- a/include/linux/zswap.h
+++ b/include/linux/zswap.h
@@ -36,6 +36,7 @@ void zswap_memcg_offline_cleanup(struct mem_cgroup *memcg);
 void zswap_lruvec_state_init(struct lruvec *lruvec);
 void zswap_folio_swapin(struct folio *folio);
 bool is_zswap_enabled(void);
+bool zswap_find(swp_entry_t swp);
 #else
 
 struct zswap_lruvec_state {};
@@ -65,6 +66,11 @@ static inline bool is_zswap_enabled(void)
 	return false;
 }
 
+static inline bool zswap_find(swp_entry_t swp)
+{
+	return false;
+}
+
 #endif
 
 #endif /* _LINUX_ZSWAP_H */
diff --git a/mm/memory.c b/mm/memory.c
index 4f2caf1c3c4d..a564b2b8faca 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4031,18 +4031,38 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 					ret = VM_FAULT_OOM;
 					goto out_page;
 				}
+
+				/*
+				 * We have to add the folio into swapcache if
+				 * this pinned swap entry is found in zswap,
+				 * which won't keep copy of data after swapin.
+				 * Or data will just get lost if later folio
+				 * can't be installed successfully in pagetable.
+				 */
+				if (zswap_find(entry)) {
+					if (add_to_swap_cache(folio, entry,
+							GFP_KERNEL, &shadow)) {
+						ret = VM_FAULT_OOM;
+						goto out_page;
+					}
+					swapcache = folio;
+					need_clear_cache = false;
+				} else {
+					shadow = get_shadow_from_swap_cache(entry);
+					/* To provide entry to swap_read_folio() */
+					folio->swap = entry;
+				}
+
 				mem_cgroup_swapin_uncharge_swap(entry);
 
-				shadow = get_shadow_from_swap_cache(entry);
 				if (shadow)
 					workingset_refault(folio, shadow);
 
 				folio_add_lru(folio);
 
-				/* To provide entry to swap_read_folio() */
-				folio->swap = entry;
 				swap_read_folio(folio, true, NULL);
-				folio->private = NULL;
+				if (need_clear_cache)
+					folio->private = NULL;
 			}
 		} else {
 			page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
diff --git a/mm/zswap.c b/mm/zswap.c
index c4979c76d58e..84a904a788a3 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1601,6 +1601,16 @@ void zswap_invalidate(swp_entry_t swp)
 		zswap_entry_free(entry);
 }
 
+bool zswap_find(swp_entry_t swp)
+{
+	pgoff_t offset = swp_offset(swp);
+	struct xarray *tree = swap_zswap_tree(swp);
+	struct zswap_entry *entry;
+
+	entry = xa_find(tree, &offset, offset, XA_PRESENT);
+	return entry != NULL;
+}
+
 int zswap_swapon(int type, unsigned long nr_pages)
 {
 	struct xarray *trees, *tree;
-- 
2.20.1

