* [PATCH 00/21 V4] Repair SWAP-over_NFS
@ 2022-02-07  4:46 NeilBrown
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

This 4th version of the series addresses review comments, particularly
tidying up "NFS: swap IO handling is slightly different for O_DIRECT IO",
and collects reviewed-by tags etc.

I've also moved 3 NFS patches which depend on the MM patches to the end
in case that helps maintainers land the patches in a consistent order.
Those three patches might go through the NFS tree after the next merge
window.

Original intro follows.
Thanks,
NeilBrown

swap-over-NFS currently has a variety of problems.

swap writes call generic_write_checks(), which always fails on a swap
file, so writing to swap fails completely.
Even without this, various deadlocks are possible - largely due to
improvements in NFS memory allocation (using NOFS instead of ATOMIC)
which weren't tested against swap-out.

NFS is the only filesystem that has supported fs-based swap IO, and it
hasn't worked for several releases, so now is a convenient time to clean
up the swap-via-filesystem interfaces - we cannot break anything!

So the first few patches here clean up and improve various parts of the
swap-via-filesystem code.  ->swap_activate() is given a cleaner
interface, a new ->swap_rw is introduced instead of burdening
->direct_IO, etc.
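The split between a general-purpose ->direct_IO and a dedicated ->swap_rw can be illustrated with a userspace sketch (the structs and dispatch below are simplified stand-ins for the kernel's address_space_operations, not the actual kernel code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for address_space_operations and a swap request. */
struct swap_req { char *buf; size_t len; int is_write; };

struct aops {
	/* General-purpose direct IO: may handle holes, allocation, etc. */
	int (*direct_IO)(struct swap_req *req);
	/* Dedicated swap IO: only what swap-in/swap-out actually needs. */
	int (*swap_rw)(struct swap_req *req);
};

static int my_swap_rw(struct swap_req *req)
{
	/* A real implementation would issue the read or write here. */
	memset(req->buf, req->is_write ? 'W' : 'R', req->len);
	return 0;
}

static const struct aops my_aops = { .swap_rw = my_swap_rw };

/* Swap core: require ->swap_rw rather than falling back to ->direct_IO,
 * so a filesystem without it is simply refused at swapon time. */
static int swap_do_io(const struct aops *ops, struct swap_req *req)
{
	if (!ops->swap_rw)
		return -1;
	return ops->swap_rw(req);
}
```

The point of the design is that swap never has to route through an operation whose contract includes concerns (hole filling, space allocation) that swap can never trigger.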

Current swap-to-filesystem code only ever submits single-page reads and
writes.  These patches change that to allow multi-page IO when adjacent
requests are submitted.  Writes are also changed to be async rather than
sync.  This substantially speeds up write throughput for swap-over-NFS.
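The coalescing idea can be shown with a rough userspace illustration (simplified; the kernel code works with kiocbs and iov_iters, not arrays): runs of adjacent page indices are merged into a single larger IO.

```c
#include <assert.h>
#include <stddef.h>

/* Count how many IOs are needed if runs of adjacent page indices
 * (already in submission order, as swap-out produces them) are each
 * merged into one multi-page IO. */
static size_t count_merged_ios(const unsigned long *pages, size_t n)
{
	size_t ios = 0;

	for (size_t i = 0; i < n; i++) {
		/* Start a new IO unless this page extends the previous run. */
		if (i == 0 || pages[i] != pages[i - 1] + 1)
			ios++;
	}
	return ios;
}
```

For example, pages {10, 11, 12, 40, 41, 99} become three IOs instead of six single-page requests, which is where the throughput improvement over NFS comes from.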

---

NeilBrown (21):
 MM - in merge window
      MM: create new mm/swap.h header file.
      MM: drop swap_set_page_dirty
      MM: move responsibility for setting SWP_FS_OPS to ->swap_activate
      MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space
      MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space
      MM: perform async writes to SWP_FS_OPS swap-space using ->swap_rw
      DOC: update documentation for swap_activate and swap_rw
      MM: submit multipage reads for SWP_FS_OPS swap-space
      MM: submit multipage write for SWP_FS_OPS swap-space
      VFS: Add FMODE_CAN_ODIRECT file flag

 NFS - in merge window
      NFS: remove IS_SWAPFILE hack
      SUNRPC/call_alloc: async tasks mustn't block waiting for memory
      SUNRPC/auth: async tasks mustn't block waiting for memory
      SUNRPC/xprt: async tasks mustn't block waiting for memory
      SUNRPC: remove scheduling boost for "SWAPPER" tasks.
      NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS
      SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC
      NFSv4: keep state manager thread active if swap is enabled

 NFS - after merge window
      NFS: rename nfs_direct_IO and use as ->swap_rw
      NFS: swap IO handling is slightly different for O_DIRECT IO
      NFS: swap-out must always use STABLE writes.


 Documentation/filesystems/locking.rst |  18 +-
 Documentation/filesystems/vfs.rst     |  17 +-
 drivers/block/loop.c                  |   4 +-
 fs/cifs/file.c                        |   7 +-
 fs/fcntl.c                            |   9 +-
 fs/nfs/direct.c                       |  67 +++++---
 fs/nfs/file.c                         |  39 +++--
 fs/nfs/nfs4_fs.h                      |   1 +
 fs/nfs/nfs4proc.c                     |  20 +++
 fs/nfs/nfs4state.c                    |  39 ++++-
 fs/nfs/read.c                         |   4 -
 fs/nfs/write.c                        |   2 +
 fs/open.c                             |   9 +-
 fs/overlayfs/file.c                   |  13 +-
 include/linux/fs.h                    |   4 +
 include/linux/nfs_fs.h                |  15 +-
 include/linux/nfs_xdr.h               |   2 +
 include/linux/sunrpc/auth.h           |   1 +
 include/linux/sunrpc/sched.h          |   1 -
 include/linux/swap.h                  |   7 +-
 include/linux/writeback.h             |   7 +
 include/trace/events/sunrpc.h         |   1 -
 mm/madvise.c                          |   8 +-
 mm/memory.c                           |   2 +-
 mm/page_io.c                          | 237 +++++++++++++++++++-------
 mm/swap.h                             |  30 +++-
 mm/swap_state.c                       |  22 ++-
 mm/swapfile.c                         |  13 +-
 mm/vmscan.c                           |  38 +++--
 net/sunrpc/auth.c                     |   8 +-
 net/sunrpc/auth_gss/auth_gss.c        |   6 +-
 net/sunrpc/auth_unix.c                |  10 +-
 net/sunrpc/clnt.c                     |   7 +-
 net/sunrpc/sched.c                    |  29 ++--
 net/sunrpc/xprt.c                     |  19 +--
 net/sunrpc/xprtrdma/transport.c       |  10 +-
 net/sunrpc/xprtsock.c                 |   8 +
 37 files changed, 519 insertions(+), 215 deletions(-)

--
Signature


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/21] MM: create new mm/swap.h header file.
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

Many functions declared in include/linux/swap.h are only used within mm/.

Create a new "mm/swap.h" and move some of these declarations there.
Remove the redundant 'extern' from the function declarations.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 include/linux/swap.h |  121 -----------------------------------------------
 mm/madvise.c         |    1 
 mm/memcontrol.c      |    1 
 mm/memory.c          |    1 
 mm/mincore.c         |    1 
 mm/page_alloc.c      |    1 
 mm/page_io.c         |    1 
 mm/shmem.c           |    1 
 mm/swap.h            |  129 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/swap_state.c      |    1 
 mm/swapfile.c        |    1 
 mm/util.c            |    1 
 mm/vmscan.c          |    1 
 13 files changed, 140 insertions(+), 121 deletions(-)
 create mode 100644 mm/swap.h

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1d38d9475c4d..3f54a8941c9d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -419,62 +419,19 @@ extern void kswapd_stop(int nid);
 
 #ifdef CONFIG_SWAP
 
-#include <linux/blk_types.h> /* for bio_end_io_t */
-
-/* linux/mm/page_io.c */
-extern int swap_readpage(struct page *page, bool do_poll);
-extern int swap_writepage(struct page *page, struct writeback_control *wbc);
-extern void end_swap_bio_write(struct bio *bio);
-extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
-	bio_end_io_t end_write_func);
 extern int swap_set_page_dirty(struct page *page);
-
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block);
 int generic_swapfile_activate(struct swap_info_struct *, struct file *,
 		sector_t *);
 
-/* linux/mm/swap_state.c */
-/* One swap address space for each 64M swap space */
-#define SWAP_ADDRESS_SPACE_SHIFT	14
-#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
-extern struct address_space *swapper_spaces[];
-#define swap_address_space(entry)			    \
-	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
-		>> SWAP_ADDRESS_SPACE_SHIFT])
 static inline unsigned long total_swapcache_pages(void)
 {
 	return global_node_page_state(NR_SWAPCACHE);
 }
 
-extern void show_swap_cache_info(void);
-extern int add_to_swap(struct page *page);
-extern void *get_shadow_from_swap_cache(swp_entry_t entry);
-extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
-			gfp_t gfp, void **shadowp);
-extern void __delete_from_swap_cache(struct page *page,
-			swp_entry_t entry, void *shadow);
-extern void delete_from_swap_cache(struct page *);
-extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
-				unsigned long end);
-extern void free_swap_cache(struct page *);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
-extern struct page *lookup_swap_cache(swp_entry_t entry,
-				      struct vm_area_struct *vma,
-				      unsigned long addr);
-struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
-extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
-			struct vm_area_struct *vma, unsigned long addr,
-			bool do_poll);
-extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
-			struct vm_area_struct *vma, unsigned long addr,
-			bool *new_page_allocated);
-extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
-				struct vm_fault *vmf);
-extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
-				struct vm_fault *vmf);
-
 /* linux/mm/swapfile.c */
 extern atomic_long_t nr_swap_pages;
 extern long total_swap_pages;
@@ -528,12 +485,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
 }
 
 #else /* CONFIG_SWAP */
-
-static inline int swap_readpage(struct page *page, bool do_poll)
-{
-	return 0;
-}
-
 static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
 {
 	return NULL;
@@ -548,11 +499,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
 {
 }
 
-static inline struct address_space *swap_address_space(swp_entry_t entry)
-{
-	return NULL;
-}
-
 #define get_nr_swap_pages()			0L
 #define total_swap_pages			0L
 #define total_swapcache_pages()			0UL
@@ -567,14 +513,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
 #define free_pages_and_swap_cache(pages, nr) \
 	release_pages((pages), (nr));
 
-static inline void free_swap_cache(struct page *page)
-{
-}
-
-static inline void show_swap_cache_info(void)
-{
-}
-
 /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
 #define free_swap_and_cache(e) is_pfn_swap_entry(e)
 
@@ -600,65 +538,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp)
 {
 }
 
-static inline struct page *swap_cluster_readahead(swp_entry_t entry,
-				gfp_t gfp_mask, struct vm_fault *vmf)
-{
-	return NULL;
-}
-
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
-			struct vm_fault *vmf)
-{
-	return NULL;
-}
-
-static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
-{
-	return 0;
-}
-
-static inline struct page *lookup_swap_cache(swp_entry_t swp,
-					     struct vm_area_struct *vma,
-					     unsigned long addr)
-{
-	return NULL;
-}
-
-static inline
-struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
-{
-	return find_get_page(mapping, index);
-}
-
-static inline int add_to_swap(struct page *page)
-{
-	return 0;
-}
-
-static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
-{
-	return NULL;
-}
-
-static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
-					gfp_t gfp_mask, void **shadowp)
-{
-	return -1;
-}
-
-static inline void __delete_from_swap_cache(struct page *page,
-					swp_entry_t entry, void *shadow)
-{
-}
-
-static inline void delete_from_swap_cache(struct page *page)
-{
-}
-
-static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
-				unsigned long end)
-{
-}
 
 static inline int page_swapcount(struct page *page)
 {
diff --git a/mm/madvise.c b/mm/madvise.c
index 5604064df464..1ee4b7583379 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -35,6 +35,7 @@
 #include <asm/tlb.h>
 
 #include "internal.h"
+#include "swap.h"
 
 struct madvise_walk_private {
 	struct mmu_gather *tlb;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09d342c7cbd0..9b7c8181a207 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -66,6 +66,7 @@
 #include <net/sock.h>
 #include <net/ip.h>
 #include "slab.h"
+#include "swap.h"
 
 #include <linux/uaccess.h>
 
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..d25372340107 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -86,6 +86,7 @@
 
 #include "pgalloc-track.h"
 #include "internal.h"
+#include "swap.h"
 
 #if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST)
 #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..f4f627325e12 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -20,6 +20,7 @@
 #include <linux/pgtable.h>
 
 #include <linux/uaccess.h>
+#include "swap.h"
 
 static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31..221aa3c10b78 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -81,6 +81,7 @@
 #include "internal.h"
 #include "shuffle.h"
 #include "page_reporting.h"
+#include "swap.h"
 
 /* Free Page Internal flags: for internal, non-pcp variants of free_pages(). */
 typedef int __bitwise fpi_t;
diff --git a/mm/page_io.c b/mm/page_io.c
index 0bf8e40f4e57..f8c26092e869 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -26,6 +26,7 @@
 #include <linux/uio.h>
 #include <linux/sched/task.h>
 #include <linux/delayacct.h>
+#include "swap.h"
 
 void end_swap_bio_write(struct bio *bio)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index a09b29ec2b45..c8b8819fe2e6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -38,6 +38,7 @@
 #include <linux/hugetlb.h>
 #include <linux/fs_parser.h>
 #include <linux/swapfile.h>
+#include "swap.h"
 
 static struct vfsmount *shm_mnt;
 
diff --git a/mm/swap.h b/mm/swap.h
new file mode 100644
index 000000000000..13e72a5023aa
--- /dev/null
+++ b/mm/swap.h
@@ -0,0 +1,129 @@
+
+#ifdef CONFIG_SWAP
+#include <linux/blk_types.h> /* for bio_end_io_t */
+
+/* linux/mm/page_io.c */
+int swap_readpage(struct page *page, bool do_poll);
+int swap_writepage(struct page *page, struct writeback_control *wbc);
+void end_swap_bio_write(struct bio *bio);
+int __swap_writepage(struct page *page, struct writeback_control *wbc,
+		     bio_end_io_t end_write_func);
+
+/* linux/mm/swap_state.c */
+/* One swap address space for each 64M swap space */
+#define SWAP_ADDRESS_SPACE_SHIFT	14
+#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
+extern struct address_space *swapper_spaces[];
+#define swap_address_space(entry)			    \
+	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
+		>> SWAP_ADDRESS_SPACE_SHIFT])
+
+void show_swap_cache_info(void);
+int add_to_swap(struct page *page);
+void *get_shadow_from_swap_cache(swp_entry_t entry);
+int add_to_swap_cache(struct page *page, swp_entry_t entry,
+		      gfp_t gfp, void **shadowp);
+void __delete_from_swap_cache(struct page *page,
+			      swp_entry_t entry, void *shadow);
+void delete_from_swap_cache(struct page *);
+void clear_shadow_from_swap_cache(int type, unsigned long begin,
+				  unsigned long end);
+void free_swap_cache(struct page *);
+struct page *lookup_swap_cache(swp_entry_t entry,
+			       struct vm_area_struct *vma,
+			       unsigned long addr);
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
+
+struct page *read_swap_cache_async(swp_entry_t, gfp_t,
+				   struct vm_area_struct *vma,
+				   unsigned long addr,
+				   bool do_poll);
+struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
+				     struct vm_area_struct *vma,
+				     unsigned long addr,
+				     bool *new_page_allocated);
+struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
+				    struct vm_fault *vmf);
+struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
+			      struct vm_fault *vmf);
+
+#else /* CONFIG_SWAP */
+static inline int swap_readpage(struct page *page, bool do_poll)
+{
+	return 0;
+}
+
+static inline struct address_space *swap_address_space(swp_entry_t entry)
+{
+	return NULL;
+}
+
+static inline void free_swap_cache(struct page *page)
+{
+}
+
+static inline void show_swap_cache_info(void)
+{
+}
+
+static inline struct page *swap_cluster_readahead(swp_entry_t entry,
+				gfp_t gfp_mask, struct vm_fault *vmf)
+{
+	return NULL;
+}
+
+static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
+			struct vm_fault *vmf)
+{
+	return NULL;
+}
+
+static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
+{
+	return 0;
+}
+
+static inline struct page *lookup_swap_cache(swp_entry_t swp,
+					     struct vm_area_struct *vma,
+					     unsigned long addr)
+{
+	return NULL;
+}
+
+static inline
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
+{
+	return find_get_page(mapping, index);
+}
+
+static inline int add_to_swap(struct page *page)
+{
+	return 0;
+}
+
+static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+	return NULL;
+}
+
+static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
+					gfp_t gfp_mask, void **shadowp)
+{
+	return -1;
+}
+
+static inline void __delete_from_swap_cache(struct page *page,
+					swp_entry_t entry, void *shadow)
+{
+}
+
+static inline void delete_from_swap_cache(struct page *page)
+{
+}
+
+static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
+				unsigned long end)
+{
+}
+
+#endif /* CONFIG_SWAP */
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 8d4104242100..bb38453425c7 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -23,6 +23,7 @@
 #include <linux/huge_mm.h>
 #include <linux/shmem_fs.h>
 #include "internal.h"
+#include "swap.h"
 
 /*
  * swapper_space is a fiction, retained to simplify the path through
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bf0df7aa7158..71c7a31dd291 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -44,6 +44,7 @@
 #include <asm/tlbflush.h>
 #include <linux/swapops.h>
 #include <linux/swap_cgroup.h>
+#include "swap.h"
 
 static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
 				 unsigned char);
diff --git a/mm/util.c b/mm/util.c
index 7e43369064c8..619697e3d935 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -27,6 +27,7 @@
 #include <linux/uaccess.h>
 
 #include "internal.h"
+#include "swap.h"
 
 /**
  * kfree_const - conditionally free memory
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 090bfb605ecf..5c734ffc6057 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,7 @@
 #include <linux/balloon_compaction.h>
 
 #include "internal.h"
+#include "swap.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>




* [PATCH 02/21] MM: drop swap_set_page_dirty
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

Pages that are written to swap are owned by the MM subsystem - not any
filesystem.

When such a page is passed to a filesystem to be written out to a
swap-file, the filesystem handles the data, but the page itself does not
belong to the filesystem.  So calling the filesystem's set_page_dirty
address_space operation makes no sense.  This is for pages in the given
address space, and a page to be written to swap does not exist in the
given address space.

So drop swap_set_page_dirty() which calls the address-space's
set_page_dirty, and always use __set_page_dirty_no_writeback, which is
appropriate for pages being swapped out.

Fixes-no-auto-backport: 62c230bc1790 ("mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 include/linux/swap.h |    1 -
 mm/page_io.c         |   14 --------------
 mm/swap_state.c      |    2 +-
 3 files changed, 1 insertion(+), 16 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 3f54a8941c9d..a43929f7033e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -419,7 +419,6 @@ extern void kswapd_stop(int nid);
 
 #ifdef CONFIG_SWAP
 
-extern int swap_set_page_dirty(struct page *page);
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block);
 int generic_swapfile_activate(struct swap_info_struct *, struct file *,
diff --git a/mm/page_io.c b/mm/page_io.c
index f8c26092e869..34b12d6f94d7 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -438,17 +438,3 @@ int swap_readpage(struct page *page, bool synchronous)
 	delayacct_swapin_end();
 	return ret;
 }
-
-int swap_set_page_dirty(struct page *page)
-{
-	struct swap_info_struct *sis = page_swap_info(page);
-
-	if (data_race(sis->flags & SWP_FS_OPS)) {
-		struct address_space *mapping = sis->swap_file->f_mapping;
-
-		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
-		return mapping->a_ops->set_page_dirty(page);
-	} else {
-		return __set_page_dirty_no_writeback(page);
-	}
-}
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bb38453425c7..514b86b05488 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -31,7 +31,7 @@
  */
 static const struct address_space_operations swap_aops = {
 	.writepage	= swap_writepage,
-	.set_page_dirty	= swap_set_page_dirty,
+	.set_page_dirty	= __set_page_dirty_no_writeback,
 #ifdef CONFIG_MIGRATION
 	.migratepage	= migrate_page,
 #endif




* [PATCH 03/21] MM: move responsibility for setting SWP_FS_OPS to ->swap_activate
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

If a filesystem wishes to handle all swap IO itself (via ->direct_IO
and ->readpage), rather than just providing device addresses for
submit_bio(), SWP_FS_OPS must be set.
Currently the protocol for setting this is to have ->swap_activate
return zero.  In that case SWP_FS_OPS is set, and add_swap_extent()
is called for the entire file.

This is a little clumsy as different return values for ->swap_activate
have quite different meanings, and it makes it hard to search for which
filesystems require SWP_FS_OPS to be set.

So remove the special meaning of a zero return, and require the
filesystem to set SWP_FS_OPS if it so desires, and to always call
add_swap_extent() as required.

Currently only NFS and CIFS return zero from ->swap_activate().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/cifs/file.c       |    3 ++-
 fs/nfs/file.c        |   13 +++++++++++--
 include/linux/swap.h |    6 ++++++
 mm/swapfile.c        |   10 +++-------
 4 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index e7af802dcfa6..fe49f1cab018 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4917,7 +4917,8 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
 	 * from reading or writing the file
 	 */
 
-	return 0;
+	sis->flags |= SWP_FS_OPS;
+	return add_swap_extent(sis, 0, sis->max, 0);
 }
 
 static void cifs_swap_deactivate(struct file *file)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 76d76acbc594..d5aa55c7edb0 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -488,6 +488,7 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 {
 	unsigned long blocks;
 	long long isize;
+	int ret;
 	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
 	struct inode *inode = file->f_mapping->host;
 
@@ -500,9 +501,17 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 		return -EINVAL;
 	}
 
+	ret = rpc_clnt_swap_activate(clnt);
+	if (ret)
+		return ret;
+	ret = add_swap_extent(sis, 0, sis->max, 0);
+	if (ret < 0) {
+		rpc_clnt_swap_deactivate(clnt);
+		return ret;
+	}
 	*span = sis->pages;
-
-	return rpc_clnt_swap_activate(clnt);
+	sis->flags |= SWP_FS_OPS;
+	return ret;
 }
 
 static void nfs_swap_deactivate(struct file *file)
diff --git a/include/linux/swap.h b/include/linux/swap.h
index a43929f7033e..b57cff3c5ac2 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -573,6 +573,12 @@ static inline swp_entry_t get_swap_page(struct page *page)
 	return entry;
 }
 
+static inline int add_swap_extent(struct swap_info_struct *sis,
+				  unsigned long start_page,
+				  unsigned long nr_pages, sector_t start_block)
+{
+	return -EINVAL;
+}
 #endif /* CONFIG_SWAP */
 
 #ifdef CONFIG_THP_SWAP
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 71c7a31dd291..ed6028aea8bf 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2347,13 +2347,9 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 
 	if (mapping->a_ops->swap_activate) {
 		ret = mapping->a_ops->swap_activate(sis, swap_file, span);
-		if (ret >= 0)
-			sis->flags |= SWP_ACTIVATED;
-		if (!ret) {
-			sis->flags |= SWP_FS_OPS;
-			ret = add_swap_extent(sis, 0, sis->max, 0);
-			*span = sis->pages;
-		}
+		if (ret < 0)
+			return ret;
+		sis->flags |= SWP_ACTIVATED;
 		return ret;
 	}
 




* [PATCH 04/21] MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

If swap-out is using filesystem operations (SWP_FS_OPS), then it is not
safe to enter the FS for reclaim.
So only downgrade the requirement for swap pages to __GFP_IO after
checking that SWP_FS_OPS is not being used.

This makes the calculation of "may_enter_fs" slightly more complex, so
move it into a separate function.  with that done, there is little value
in maintaining the bool variable any more.  So replace the
may_enter_fs variable with a may_enter_fs() function.  This removes any
risk for the variable becoming out-of-date.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 mm/swap.h   |    8 ++++++++
 mm/vmscan.c |   29 ++++++++++++++++++++---------
 2 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/mm/swap.h b/mm/swap.h
index 13e72a5023aa..5c676e55f288 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -47,6 +47,10 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
 struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
 			      struct vm_fault *vmf);
 
+static inline unsigned int page_swap_flags(struct page *page)
+{
+	return page_swap_info(page)->flags;
+}
 #else /* CONFIG_SWAP */
 static inline int swap_readpage(struct page *page, bool do_poll)
 {
@@ -126,4 +130,8 @@ static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
 {
 }
 
+static inline unsigned int page_swap_flags(struct page *page)
+{
+	return 0;
+}
 #endif /* CONFIG_SWAP */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5c734ffc6057..ad5026d06aa8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1506,6 +1506,22 @@ static unsigned int demote_page_list(struct list_head *demote_pages,
 	return nr_succeeded;
 }
 
+static bool may_enter_fs(struct page *page, gfp_t gfp_mask)
+{
+	if (gfp_mask & __GFP_FS)
+		return true;
+	if (!PageSwapCache(page) || !(gfp_mask & __GFP_IO))
+		return false;
+	/*
+	 * We can "enter_fs" for swap-cache with only __GFP_IO
+	 * providing this isn't SWP_FS_OPS.
+	 * ->flags can be updated non-atomically (scan_swap_map_slots),
+	 * but that will never affect SWP_FS_OPS, so the data_race
+	 * is safe.
+	 */
+	return !data_race(page_swap_flags(page) & SWP_FS_OPS);
+}
+
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
@@ -1531,7 +1547,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 		struct address_space *mapping;
 		struct page *page;
 		enum page_references references = PAGEREF_RECLAIM;
-		bool dirty, writeback, may_enter_fs;
+		bool dirty, writeback;
 		unsigned int nr_pages;
 
 		cond_resched();
@@ -1555,9 +1571,6 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 		if (!sc->may_unmap && page_mapped(page))
 			goto keep_locked;
 
-		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
-			(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
-
 		/*
 		 * The number of dirty pages determines if a node is marked
 		 * reclaim_congested. kswapd will stall and start writing
@@ -1602,7 +1615,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 		 *    not to fs). In this case mark the page for immediate
 		 *    reclaim and continue scanning.
 		 *
-		 *    Require may_enter_fs because we would wait on fs, which
+		 *    Require may_enter_fs() because we would wait on fs, which
 		 *    may not have submitted IO yet. And the loop driver might
 		 *    enter reclaim, and deadlock if it waits on a page for
 		 *    which it is needed to do the write (loop masks off
@@ -1634,7 +1647,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 
 			/* Case 2 above */
 			} else if (writeback_throttling_sane(sc) ||
-			    !PageReclaim(page) || !may_enter_fs) {
+			    !PageReclaim(page) || !may_enter_fs(page, sc->gfp_mask)) {
 				/*
 				 * This is slightly racy - end_page_writeback()
 				 * might have just cleared PageReclaim, then
@@ -1724,8 +1737,6 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 						goto activate_locked_split;
 				}
 
-				may_enter_fs = true;
-
 				/* Adding to swap updated mapping */
 				mapping = page_mapping(page);
 			}
@@ -1795,7 +1806,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 
 			if (references == PAGEREF_RECLAIM_CLEAN)
 				goto keep_locked;
-			if (!may_enter_fs)
+			if (!may_enter_fs(page, sc->gfp_mask))
 				goto keep_locked;
 			if (!sc->may_writepage)
 				goto keep_locked;




* [PATCH 05/21] MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

swap currently uses ->readpage to read swap pages.  This can only
request one page at a time from the filesystem, which is not the most
efficient approach.

swap uses ->direct_IO for writes.  While this is adequate, it is an
inappropriate overloading: ->direct_IO may need to allocate space for
holes or handle other details that are not relevant for swap.

So this patch introduces a new address_space operation: ->swap_rw.
In this patch it is used for reads, and a subsequent patch will switch
writes to use it.

No filesystem yet supports ->swap_rw, but that is not a problem because
no filesystem actually works with filesystem-based swap.
Only two filesystems set SWP_FS_OPS:
- cifs sets the flag, but ->direct_IO always fails so swap cannot work.
- nfs sets the flag, but ->direct_IO calls generic_write_checks()
  which has failed on swap files for several releases.

To ensure that a NULL ->swap_rw isn't called, ->swap_activate() for both
NFS and cifs is changed to fail if ->swap_rw is not set.  These checks
can be removed if/when the function is added.

Future patches will restore swap-over-NFS functionality.

To submit an async read with ->swap_rw() we need to allocate a structure
to hold the kiocb and other details.  swap_readpage() cannot handle
transient failure, so we create a mempool to provide the structures.
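The submit/complete contract that swap_readpage_fs() relies on (run the completion callback yourself unless the backend queued the request and returned -EIOCBQUEUED) can be sketched in userspace C. All names here (sim_iocb, sim_swap_rw, the SIM_* constants) are hypothetical stand-ins for the kernel's kiocb machinery, not real kernel APIs:

```c
#include <assert.h>
#include <stddef.h>

#define SIM_EIOCBQUEUED (-529)	/* illustrative value, not the kernel's */
#define SIM_PAGE_SIZE 4096

struct sim_iocb {
	void (*complete)(struct sim_iocb *iocb, long ret);
	int completed;	/* set by the completion callback */
	long result;	/* byte count or error recorded at completion */
};

static void sim_complete(struct sim_iocb *iocb, long ret)
{
	iocb->completed = 1;
	iocb->result = ret;
}

/* Simulated backend: either queues the request (promising to call
 * ->complete() later; here we call it inline for the demo) or
 * completes synchronously and returns the byte count. */
static long sim_swap_rw(struct sim_iocb *iocb, int async_capable)
{
	if (async_capable) {
		sim_complete(iocb, SIM_PAGE_SIZE);
		return SIM_EIOCBQUEUED;
	}
	return SIM_PAGE_SIZE;
}

/* Caller-side pattern mirroring swap_readpage_fs(): run the completion
 * only when the backend did not queue, so it runs exactly once. */
static long sim_submit(struct sim_iocb *iocb, int async_capable)
{
	long ret = sim_swap_rw(iocb, async_capable);

	if (ret != SIM_EIOCBQUEUED)
		iocb->complete(iocb, ret);
	return ret;
}
```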

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/cifs/file.c     |    4 +++
 fs/nfs/file.c      |    4 +++
 include/linux/fs.h |    1 +
 mm/page_io.c       |   68 +++++++++++++++++++++++++++++++++++++++++++++++-----
 mm/swap.h          |    1 +
 mm/swapfile.c      |    5 ++++
 6 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index fe49f1cab018..6ae5b404b04b 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -4889,6 +4889,10 @@ static int cifs_swap_activate(struct swap_info_struct *sis,
 
 	cifs_dbg(FYI, "swap activate\n");
 
+	if (!swap_file->f_mapping->a_ops->swap_rw)
+		/* Cannot support swap */
+		return -EINVAL;
+
 	spin_lock(&inode->i_lock);
 	blocks = inode->i_blocks;
 	isize = inode->i_size;
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index d5aa55c7edb0..3dbef2c31567 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -492,6 +492,10 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
 	struct inode *inode = file->f_mapping->host;
 
+	if (!file->f_mapping->a_ops->swap_rw)
+		/* Cannot support swap */
+		return -EINVAL;
+
 	spin_lock(&inode->i_lock);
 	blocks = inode->i_blocks;
 	isize = inode->i_size;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e2d892b201b0..57e3b387cb17 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -409,6 +409,7 @@ struct address_space_operations {
 	int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
 				sector_t *span);
 	void (*swap_deactivate)(struct file *file);
+	int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
 };
 
 extern const struct address_space_operations empty_aops;
diff --git a/mm/page_io.c b/mm/page_io.c
index 34b12d6f94d7..e90a3231f225 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -284,6 +284,25 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
 #define bio_associate_blkg_from_page(bio, page)		do { } while (0)
 #endif /* CONFIG_MEMCG && CONFIG_BLK_CGROUP */
 
+struct swap_iocb {
+	struct kiocb		iocb;
+	struct bio_vec		bvec;
+};
+static mempool_t *sio_pool;
+
+int sio_pool_init(void)
+{
+	if (!sio_pool) {
+		mempool_t *pool = mempool_create_kmalloc_pool(
+			SWAP_CLUSTER_MAX, sizeof(struct swap_iocb));
+		if (cmpxchg(&sio_pool, NULL, pool))
+			mempool_destroy(pool);
+	}
+	if (!sio_pool)
+		return -ENOMEM;
+	return 0;
+}
+
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		bio_end_io_t end_write_func)
 {
@@ -355,6 +374,48 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	return 0;
 }
 
+static void sio_read_complete(struct kiocb *iocb, long ret)
+{
+	struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
+	struct page *page = sio->bvec.bv_page;
+
+	if (ret != 0 && ret != PAGE_SIZE) {
+		SetPageError(page);
+		ClearPageUptodate(page);
+		pr_alert_ratelimited("Read-error on swap-device\n");
+	} else {
+		SetPageUptodate(page);
+		count_vm_event(PSWPIN);
+	}
+	unlock_page(page);
+	mempool_free(sio, sio_pool);
+}
+
+static int swap_readpage_fs(struct page *page)
+{
+	struct swap_info_struct *sis = page_swap_info(page);
+	struct file *swap_file = sis->swap_file;
+	struct address_space *mapping = swap_file->f_mapping;
+	struct iov_iter from;
+	struct swap_iocb *sio;
+	loff_t pos = page_file_offset(page);
+	int ret;
+
+	sio = mempool_alloc(sio_pool, GFP_KERNEL);
+	init_sync_kiocb(&sio->iocb, swap_file);
+	sio->iocb.ki_pos = pos;
+	sio->iocb.ki_complete = sio_read_complete;
+	sio->bvec.bv_page = page;
+	sio->bvec.bv_len = PAGE_SIZE;
+	sio->bvec.bv_offset = 0;
+
+	iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE);
+	ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
+	if (ret != -EIOCBQUEUED)
+		sio_read_complete(&sio->iocb, ret);
+	return ret;
+}
+
 int swap_readpage(struct page *page, bool synchronous)
 {
 	struct bio *bio;
@@ -381,12 +442,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	}
 
 	if (data_race(sis->flags & SWP_FS_OPS)) {
-		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
-
-		ret = mapping->a_ops->readpage(swap_file, page);
-		if (!ret)
-			count_vm_event(PSWPIN);
+		ret = swap_readpage_fs(page);
 		goto out;
 	}
 
diff --git a/mm/swap.h b/mm/swap.h
index 5c676e55f288..e8ee995cf8d8 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -3,6 +3,7 @@
 #include <linux/blk_types.h> /* for bio_end_io_t */
 
 /* linux/mm/page_io.c */
+int sio_pool_init(void);
 int swap_readpage(struct page *page, bool do_poll);
 int swap_writepage(struct page *page, struct writeback_control *wbc);
 void end_swap_bio_write(struct bio *bio);
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ed6028aea8bf..c800c17bf0c8 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2350,6 +2350,11 @@ static int setup_swap_extents(struct swap_info_struct *sis, sector_t *span)
 		if (ret < 0)
 			return ret;
 		sis->flags |= SWP_ACTIVATED;
+		if ((sis->flags & SWP_FS_OPS) &&
+		    sio_pool_init() != 0) {
+			destroy_swap_extents(sis);
+			return -ENOMEM;
+		}
 		return ret;
 	}
 




* [PATCH 06/21] MM: perform async writes to SWP_FS_OPS swap-space using ->swap_rw
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (4 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 10/21] VFS: Add FMODE_CAN_ODIRECT file flag NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 01/21] MM: create new mm/swap.h header file NeilBrown
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

This patch switches swap-out to SWP_FS_OPS swap-spaces to use ->swap_rw
and makes the writes asynchronous, like they are for other swap spaces.

To make it async we need to allocate the kiocb struct from a mempool.
This may block, but never for longer than waiting for the write to
complete would.  At most it will wait for some previous swap IO to
complete.
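The mempool guarantee the patch depends on can be modelled in a single-threaded userspace sketch: allocation never fails outright, it can only wait until a previously allocated element is freed back. The real mempool_alloc() sleeps at that point; this toy (tiny_pool, an invented name) returns NULL where the kernel would block, so the bounded-wait property is visible in a test:

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4

struct tiny_pool {
	void *elem[POOL_SIZE];
	int navail;
};

/* Carve POOL_SIZE objects of objsz bytes out of caller-provided storage. */
static void tiny_pool_init(struct tiny_pool *p, void *storage, size_t objsz)
{
	char *base = storage;

	for (p->navail = 0; p->navail < POOL_SIZE; p->navail++)
		p->elem[p->navail] = base + p->navail * objsz;
}

/* Returns an element, or NULL where mempool_alloc() would sleep until
 * some in-flight IO completes and frees its element - it never fails. */
static void *tiny_pool_alloc(struct tiny_pool *p)
{
	if (p->navail == 0)
		return NULL;	/* kernel: wait here, never fail */
	return p->elem[--p->navail];
}

static void tiny_pool_free(struct tiny_pool *p, void *obj)
{
	assert(p->navail < POOL_SIZE);
	p->elem[p->navail++] = obj;
}
```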

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 mm/page_io.c |   93 +++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 53 insertions(+), 40 deletions(-)

diff --git a/mm/page_io.c b/mm/page_io.c
index e90a3231f225..f391846ea82a 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -303,6 +303,57 @@ int sio_pool_init(void)
 	return 0;
 }
 
+static void sio_write_complete(struct kiocb *iocb, long ret)
+{
+	struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
+	struct page *page = sio->bvec.bv_page;
+
+	if (ret != PAGE_SIZE) {
+		/*
+		 * In the case of swap-over-nfs, this can be a
+		 * temporary failure if the system has limited
+		 * memory for allocating transmit buffers.
+		 * Mark the page dirty and avoid
+		 * folio_rotate_reclaimable but rate-limit the
+		 * messages but do not flag PageError like
+		 * the normal direct-to-bio case as it could
+		 * be temporary.
+		 */
+		set_page_dirty(page);
+		ClearPageReclaim(page);
+		pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n",
+				   ret, page_file_offset(page));
+	} else
+		count_vm_event(PSWPOUT);
+	end_page_writeback(page);
+	mempool_free(sio, sio_pool);
+}
+
+static int swap_writepage_fs(struct page *page, struct writeback_control *wbc)
+{
+	struct swap_iocb *sio;
+	struct swap_info_struct *sis = page_swap_info(page);
+	struct file *swap_file = sis->swap_file;
+	struct address_space *mapping = swap_file->f_mapping;
+	struct iov_iter from;
+	int ret;
+
+	set_page_writeback(page);
+	unlock_page(page);
+	sio = mempool_alloc(sio_pool, GFP_NOIO);
+	init_sync_kiocb(&sio->iocb, swap_file);
+	sio->iocb.ki_complete = sio_write_complete;
+	sio->iocb.ki_pos = page_file_offset(page);
+	sio->bvec.bv_page = page;
+	sio->bvec.bv_len = PAGE_SIZE;
+	sio->bvec.bv_offset = 0;
+	iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE);
+	ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
+	if (ret != -EIOCBQUEUED)
+		sio_write_complete(&sio->iocb, ret);
+	return ret;
+}
+
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
 		bio_end_io_t end_write_func)
 {
@@ -311,46 +362,8 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	struct swap_info_struct *sis = page_swap_info(page);
 
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
-	if (data_race(sis->flags & SWP_FS_OPS)) {
-		struct kiocb kiocb;
-		struct file *swap_file = sis->swap_file;
-		struct address_space *mapping = swap_file->f_mapping;
-		struct bio_vec bv = {
-			.bv_page = page,
-			.bv_len  = PAGE_SIZE,
-			.bv_offset = 0
-		};
-		struct iov_iter from;
-
-		iov_iter_bvec(&from, WRITE, &bv, 1, PAGE_SIZE);
-		init_sync_kiocb(&kiocb, swap_file);
-		kiocb.ki_pos = page_file_offset(page);
-
-		set_page_writeback(page);
-		unlock_page(page);
-		ret = mapping->a_ops->direct_IO(&kiocb, &from);
-		if (ret == PAGE_SIZE) {
-			count_vm_event(PSWPOUT);
-			ret = 0;
-		} else {
-			/*
-			 * In the case of swap-over-nfs, this can be a
-			 * temporary failure if the system has limited
-			 * memory for allocating transmit buffers.
-			 * Mark the page dirty and avoid
-			 * folio_rotate_reclaimable but rate-limit the
-			 * messages but do not flag PageError like
-			 * the normal direct-to-bio case as it could
-			 * be temporary.
-			 */
-			set_page_dirty(page);
-			ClearPageReclaim(page);
-			pr_err_ratelimited("Write error on dio swapfile (%llu)\n",
-					   page_file_offset(page));
-		}
-		end_page_writeback(page);
-		return ret;
-	}
+	if (data_race(sis->flags & SWP_FS_OPS))
+		return swap_writepage_fs(page, wbc);
 
 	ret = bdev_write_page(sis->bdev, swap_page_sector(page), page, wbc);
 	if (!ret) {




* [PATCH 07/21] DOC: update documentation for swap_activate and swap_rw
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
  2022-02-07  4:46 ` [PATCH 03/21] MM: move responsibility for setting SWP_FS_OPS to ->swap_activate NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 18/21] NFSv4: keep state manager thread active if swap is enabled NeilBrown
                   ` (19 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

The documentation for ->swap_activate() has been out-of-date for a long
time.  This patch updates it to match recent changes, and adds
documentation for the associated ->swap_rw().

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 Documentation/filesystems/locking.rst |   18 ++++++++++++------
 Documentation/filesystems/vfs.rst     |   17 ++++++++++++-----
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/Documentation/filesystems/locking.rst b/Documentation/filesystems/locking.rst
index 3f9b1497ebb8..fbb10378d5ee 100644
--- a/Documentation/filesystems/locking.rst
+++ b/Documentation/filesystems/locking.rst
@@ -260,8 +260,9 @@ prototypes::
 	int (*launder_page)(struct page *);
 	int (*is_partially_uptodate)(struct page *, unsigned long, unsigned long);
 	int (*error_remove_page)(struct address_space *, struct page *);
-	int (*swap_activate)(struct file *);
+	int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span)
 	int (*swap_deactivate)(struct file *);
+	int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
 
 locking rules:
 	All except set_page_dirty and freepage may block
@@ -290,6 +291,7 @@ is_partially_uptodate:	yes
 error_remove_page:	yes
 swap_activate:		no
 swap_deactivate:	no
+swap_rw:		yes, unlocks
 ======================	======================== =========	===============
 
 ->write_begin(), ->write_end() and ->readpage() may be called from
@@ -392,15 +394,19 @@ cleaned, or an error value if not. Note that in order to prevent the page
 getting mapped back in and redirtied, it needs to be kept locked
 across the entire operation.
 
-->swap_activate will be called with a non-zero argument on
-files backing (non block device backed) swapfiles. A return value
-of zero indicates success, in which case this file can be used for
-backing swapspace. The swapspace operations will be proxied to the
-address space operations.
+->swap_activate() will be called to prepare the given file for swap.  It
+should perform any validation and preparation necessary to ensure that
+writes can be performed with minimal memory allocation.  It should call
+add_swap_extent(), or the helper iomap_swapfile_activate(), and return
+the number of extents added.  If IO should be submitted through
+->swap_rw(), it should set SWP_FS_OPS, otherwise IO will be submitted
+directly to the block device ``sis->bdev``.
 
 ->swap_deactivate() will be called in the sys_swapoff()
 path after ->swap_activate() returned success.
 
+->swap_rw will be called for swap IO if SWP_FS_OPS was set by ->swap_activate().
+
 file_lock_operations
 ====================
 
diff --git a/Documentation/filesystems/vfs.rst b/Documentation/filesystems/vfs.rst
index bf5c48066fac..779d23fc7954 100644
--- a/Documentation/filesystems/vfs.rst
+++ b/Documentation/filesystems/vfs.rst
@@ -751,8 +751,9 @@ cache in your filesystem.  The following members are defined:
 					      unsigned long);
 		void (*is_dirty_writeback) (struct page *, bool *, bool *);
 		int (*error_remove_page) (struct mapping *mapping, struct page *page);
-		int (*swap_activate)(struct file *);
+		int (*swap_activate)(struct swap_info_struct *sis, struct file *f, sector_t *span)
 		int (*swap_deactivate)(struct file *);
+		int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);
 	};
 
 ``writepage``
@@ -959,15 +960,21 @@ cache in your filesystem.  The following members are defined:
 	unless you have them locked or reference counts increased.
 
 ``swap_activate``
-	Called when swapon is used on a file to allocate space if
-	necessary and pin the block lookup information in memory.  A
-	return value of zero indicates success, in which case this file
-	can be used to back swapspace.
+
+	Called to prepare the given file for swap.  It should perform
+	any validation and preparation necessary to ensure that writes
+	can be performed with minimal memory allocation.  It should call
+	add_swap_extent(), or the helper iomap_swapfile_activate(), and
+	return the number of extents added.  If IO should be submitted
+	through ->swap_rw(), it should set SWP_FS_OPS, otherwise IO will
+	be submitted directly to the block device ``sis->bdev``.
 
 ``swap_deactivate``
 	Called during swapoff on files where swap_activate was
 	successful.
 
+``swap_rw``
+	Called to read or write swap pages when SWP_FS_OPS is set.
 
 The File Object
 ===============




* [PATCH 08/21] MM: submit multipage reads for SWP_FS_OPS swap-space
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (14 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 05/21] MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 20/21] NFS: swap IO handling is slightly different for O_DIRECT IO NeilBrown
                   ` (5 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

swap_readpage() is given one page at a time, but may be called
repeatedly in succession.
For block-device swap-space, the blk_plug functionality allows multiple
pages to be combined together at lower layers.
That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is
only active when CONFIG_BLOCK=y.  Consequently all swap reads over NFS
are single page reads.

With this patch we pass in a pointer-to-pointer in which swap_readpage()
can store state between calls - much like the effect of blk_plug.  After
calling swap_readpage() some number of times, the state will be passed
to swap_read_unplug(), which can submit the combined request.
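The accumulate-and-flush shape of swap_readpage_fs() can be modelled in userspace C: requests at consecutive offsets pile into one batch, and the batch is submitted when a request is non-adjacent, the batch is full, or the caller supplied no plug pointer. Everything below (struct batch, readpage, the flush counters, the crude allocator) is invented for illustration; it is not the kernel code:

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8
#define BPAGE_SIZE 4096L

static int flushes;		/* combined requests submitted so far */
static int flushed_pages;	/* pages in the most recent submission */

struct batch {
	long pos;	/* offset of the first page in the batch */
	int pages;
};

static struct batch batches[4];	/* crude allocator, enough for a demo */
static int batch_next;

static void batch_flush(struct batch *b)
{
	flushes++;
	flushed_pages = b->pages;
}

/* One page per call, like swap_readpage(); *plug carries the open
 * batch between calls, like the struct swap_iocb ** in the patch. */
static void readpage(long pos, struct batch **plug)
{
	struct batch *b = plug ? *plug : NULL;

	/* Non-adjacent request: submit what we have so far. */
	if (b && b->pos + b->pages * BPAGE_SIZE != pos) {
		batch_flush(b);
		b = NULL;
	}
	if (!b) {
		b = &batches[batch_next++ % 4];
		b->pos = pos;
		b->pages = 0;
	}
	b->pages++;
	if (b->pages == BATCH_MAX || !plug) {
		batch_flush(b);
		b = NULL;
	}
	if (plug)
		*plug = b;
}
```

A caller finishes by flushing whatever is still plugged, which is the role swap_read_unplug() plays in the patch.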

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 mm/madvise.c    |    8 +++-
 mm/memory.c     |    2 +
 mm/page_io.c    |  102 ++++++++++++++++++++++++++++++++++++-------------------
 mm/swap.h       |   17 ++++++++-
 mm/swap_state.c |   20 ++++++++---
 5 files changed, 102 insertions(+), 47 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 1ee4b7583379..2b1ab30af141 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -225,6 +225,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 	pte_t *orig_pte;
 	struct vm_area_struct *vma = walk->private;
 	unsigned long index;
+	struct swap_iocb *splug = NULL;
 
 	if (pmd_none_or_trans_huge_or_clear_bad(pmd))
 		return 0;
@@ -246,10 +247,11 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 			continue;
 
 		page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
-							vma, index, false);
+					     vma, index, false, &splug);
 		if (page)
 			put_page(page);
 	}
+	swap_read_unplug(splug);
 
 	return 0;
 }
@@ -265,6 +267,7 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 	XA_STATE(xas, &mapping->i_pages, linear_page_index(vma, start));
 	pgoff_t end_index = linear_page_index(vma, end + PAGE_SIZE - 1);
 	struct page *page;
+	struct swap_iocb *splug = NULL;
 
 	rcu_read_lock();
 	xas_for_each(&xas, page, end_index) {
@@ -277,13 +280,14 @@ static void force_shm_swapin_readahead(struct vm_area_struct *vma,
 
 		swap = radix_to_swp_entry(page);
 		page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
-							NULL, 0, false);
+					     NULL, 0, false, &splug);
 		if (page)
 			put_page(page);
 
 		rcu_read_lock();
 	}
 	rcu_read_unlock();
+	swap_read_unplug(splug);
 
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 }
diff --git a/mm/memory.c b/mm/memory.c
index d25372340107..8bd18c54eaa4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3559,7 +3559,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 				/* To provide entry to swap_readpage() */
 				set_page_private(page, entry.val);
-				swap_readpage(page, true);
+				swap_readpage(page, true, NULL);
 				set_page_private(page, 0);
 			}
 		} else {
diff --git a/mm/page_io.c b/mm/page_io.c
index f391846ea82a..fc82e8750e9b 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -286,7 +286,8 @@ static void bio_associate_blkg_from_page(struct bio *bio, struct page *page)
 
 struct swap_iocb {
 	struct kiocb		iocb;
-	struct bio_vec		bvec;
+	struct bio_vec		bvec[SWAP_CLUSTER_MAX];
+	int			pages;
 };
 static mempool_t *sio_pool;
 
@@ -306,7 +307,7 @@ int sio_pool_init(void)
 static void sio_write_complete(struct kiocb *iocb, long ret)
 {
 	struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
-	struct page *page = sio->bvec.bv_page;
+	struct page *page = sio->bvec[0].bv_page;
 
 	if (ret != PAGE_SIZE) {
 		/*
@@ -344,10 +345,10 @@ static int swap_writepage_fs(struct page *page, struct writeback_control *wbc)
 	init_sync_kiocb(&sio->iocb, swap_file);
 	sio->iocb.ki_complete = sio_write_complete;
 	sio->iocb.ki_pos = page_file_offset(page);
-	sio->bvec.bv_page = page;
-	sio->bvec.bv_len = PAGE_SIZE;
-	sio->bvec.bv_offset = 0;
-	iov_iter_bvec(&from, WRITE, &sio->bvec, 1, PAGE_SIZE);
+	sio->bvec[0].bv_page = page;
+	sio->bvec[0].bv_len = PAGE_SIZE;
+	sio->bvec[0].bv_offset = 0;
+	iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE);
 	ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
 	if (ret != -EIOCBQUEUED)
 		sio_write_complete(&sio->iocb, ret);
@@ -390,46 +391,64 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 static void sio_read_complete(struct kiocb *iocb, long ret)
 {
 	struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
-	struct page *page = sio->bvec.bv_page;
+	int p;
 
-	if (ret != 0 && ret != PAGE_SIZE) {
-		SetPageError(page);
-		ClearPageUptodate(page);
-		pr_alert_ratelimited("Read-error on swap-device\n");
+	if (ret == PAGE_SIZE * sio->pages) {
+		for (p = 0; p < sio->pages; p++) {
+			struct page *page = sio->bvec[p].bv_page;
+			SetPageUptodate(page);
+			unlock_page(page);
+		}
+		count_vm_events(PSWPIN, sio->pages);
 	} else {
-		SetPageUptodate(page);
-		count_vm_event(PSWPIN);
+		for (p = 0; p < sio->pages; p++) {
+			struct page *page = sio->bvec[p].bv_page;
+			SetPageError(page);
+			ClearPageUptodate(page);
+			unlock_page(page);
+		}
+		pr_alert_ratelimited("Read-error on swap-device\n");
 	}
-	unlock_page(page);
 	mempool_free(sio, sio_pool);
 }
 
-static int swap_readpage_fs(struct page *page)
+static void swap_readpage_fs(struct page *page,
+			     struct swap_iocb **plug)
 {
 	struct swap_info_struct *sis = page_swap_info(page);
-	struct file *swap_file = sis->swap_file;
-	struct address_space *mapping = swap_file->f_mapping;
-	struct iov_iter from;
-	struct swap_iocb *sio;
+	struct swap_iocb *sio = NULL;
 	loff_t pos = page_file_offset(page);
-	int ret;
-
-	sio = mempool_alloc(sio_pool, GFP_KERNEL);
-	init_sync_kiocb(&sio->iocb, swap_file);
-	sio->iocb.ki_pos = pos;
-	sio->iocb.ki_complete = sio_read_complete;
-	sio->bvec.bv_page = page;
-	sio->bvec.bv_len = PAGE_SIZE;
-	sio->bvec.bv_offset = 0;
 
-	iov_iter_bvec(&from, READ, &sio->bvec, 1, PAGE_SIZE);
-	ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
-	if (ret != -EIOCBQUEUED)
-		sio_read_complete(&sio->iocb, ret);
-	return ret;
+	if (plug)
+		sio = *plug;
+	if (sio) {
+		if (sio->iocb.ki_filp != sis->swap_file ||
+		    sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) {
+			swap_read_unplug(sio);
+			sio = NULL;
+		}
+	}
+	if (!sio) {
+		sio = mempool_alloc(sio_pool, GFP_KERNEL);
+		init_sync_kiocb(&sio->iocb, sis->swap_file);
+		sio->iocb.ki_pos = pos;
+		sio->iocb.ki_complete = sio_read_complete;
+		sio->pages = 0;
+	}
+	sio->bvec[sio->pages].bv_page = page;
+	sio->bvec[sio->pages].bv_len = PAGE_SIZE;
+	sio->bvec[sio->pages].bv_offset = 0;
+	sio->pages += 1;
+	if (sio->pages == ARRAY_SIZE(sio->bvec) || !plug) {
+		swap_read_unplug(sio);
+		sio = NULL;
+	}
+	if (plug)
+		*plug = sio;
 }
 
-int swap_readpage(struct page *page, bool synchronous)
+int swap_readpage(struct page *page, bool synchronous,
+		  struct swap_iocb **plug)
 {
 	struct bio *bio;
 	int ret = 0;
@@ -455,7 +474,7 @@ int swap_readpage(struct page *page, bool synchronous)
 	}
 
 	if (data_race(sis->flags & SWP_FS_OPS)) {
-		ret = swap_readpage_fs(page);
+		swap_readpage_fs(page, plug);
 		goto out;
 	}
 
@@ -507,3 +526,16 @@ int swap_readpage(struct page *page, bool synchronous)
 	delayacct_swapin_end();
 	return ret;
 }
+
+void __swap_read_unplug(struct swap_iocb *sio)
+{
+	struct iov_iter from;
+	struct address_space *mapping = sio->iocb.ki_filp->f_mapping;
+	int ret;
+
+	iov_iter_bvec(&from, READ, sio->bvec, sio->pages,
+		      PAGE_SIZE * sio->pages);
+	ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
+	if (ret != -EIOCBQUEUED)
+		sio_read_complete(&sio->iocb, ret);
+}
diff --git a/mm/swap.h b/mm/swap.h
index e8ee995cf8d8..7aab4c82e2d0 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -4,7 +4,15 @@
 
 /* linux/mm/page_io.c */
 int sio_pool_init(void);
-int swap_readpage(struct page *page, bool do_poll);
+struct swap_iocb;
+int swap_readpage(struct page *page, bool do_poll,
+		  struct swap_iocb **plug);
+void __swap_read_unplug(struct swap_iocb *plug);
+static inline void swap_read_unplug(struct swap_iocb *plug)
+{
+	if (unlikely(plug))
+		__swap_read_unplug(plug);
+}
 int swap_writepage(struct page *page, struct writeback_control *wbc);
 void end_swap_bio_write(struct bio *bio);
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
@@ -38,7 +46,8 @@ struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
 struct page *read_swap_cache_async(swp_entry_t, gfp_t,
 				   struct vm_area_struct *vma,
 				   unsigned long addr,
-				   bool do_poll);
+				   bool do_poll,
+				   struct swap_iocb **plug);
 struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
 				     struct vm_area_struct *vma,
 				     unsigned long addr,
@@ -53,7 +62,9 @@ static inline unsigned int page_swap_flags(struct page *page)
 	return page_swap_info(page)->flags;
 }
 #else /* CONFIG_SWAP */
-static inline int swap_readpage(struct page *page, bool do_poll)
+struct swap_iocb;
+static inline int swap_readpage(struct page *page, bool do_poll,
+				struct swap_iocb **plug)
 {
 	return 0;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 514b86b05488..c84779e2518b 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -520,14 +520,16 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
  * the swap entry is no longer in use.
  */
 struct page *read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
-		struct vm_area_struct *vma, unsigned long addr, bool do_poll)
+				   struct vm_area_struct *vma,
+				   unsigned long addr, bool do_poll,
+				   struct swap_iocb **plug)
 {
 	bool page_was_allocated;
 	struct page *retpage = __read_swap_cache_async(entry, gfp_mask,
 			vma, addr, &page_was_allocated);
 
 	if (page_was_allocated)
-		swap_readpage(retpage, do_poll);
+		swap_readpage(retpage, do_poll, plug);
 
 	return retpage;
 }
@@ -621,6 +623,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	unsigned long mask;
 	struct swap_info_struct *si = swp_swap_info(entry);
 	struct blk_plug plug;
+	struct swap_iocb *splug = NULL;
 	bool do_poll = true, page_allocated;
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
@@ -647,7 +650,7 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		if (!page)
 			continue;
 		if (page_allocated) {
-			swap_readpage(page, false);
+			swap_readpage(page, false, &splug);
 			if (offset != entry_offset) {
 				SetPageReadahead(page);
 				count_vm_event(SWAP_RA);
@@ -656,10 +659,12 @@ struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		put_page(page);
 	}
 	blk_finish_plug(&plug);
+	swap_read_unplug(splug);
 
 	lru_add_drain();	/* Push any new pages onto the LRU now */
 skip:
-	return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll);
+	/* The page was likely read above, so no need for plugging here */
+	return read_swap_cache_async(entry, gfp_mask, vma, addr, do_poll, NULL);
 }
 
 int init_swap_address_space(unsigned int type, unsigned long nr_pages)
@@ -790,6 +795,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 				       struct vm_fault *vmf)
 {
 	struct blk_plug plug;
+	struct swap_iocb *splug = NULL;
 	struct vm_area_struct *vma = vmf->vma;
 	struct page *page;
 	pte_t *pte, pentry;
@@ -820,7 +826,7 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		if (!page)
 			continue;
 		if (page_allocated) {
-			swap_readpage(page, false);
+			swap_readpage(page, false, &splug);
 			if (i != ra_info.offset) {
 				SetPageReadahead(page);
 				count_vm_event(SWAP_RA);
@@ -829,10 +835,12 @@ static struct page *swap_vma_readahead(swp_entry_t fentry, gfp_t gfp_mask,
 		put_page(page);
 	}
 	blk_finish_plug(&plug);
+	swap_read_unplug(splug);
 	lru_add_drain();
 skip:
+	/* The page was likely read above, so no need for plugging here */
 	return read_swap_cache_async(fentry, gfp_mask, vma, vmf->address,
-				     ra_info.win == 1);
+				     ra_info.win == 1, NULL);
 }
 
 /**




* [PATCH 09/21] MM: submit multipage write for SWP_FS_OPS swap-space
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (6 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 01/21] MM: create new mm/swap.h header file NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  8:40   ` Christoph Hellwig
  2022-02-07  4:46 ` [PATCH 13/21] SUNRPC/auth: async tasks mustn't block waiting for memory NeilBrown
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

swap_writepage() is given one page at a time, but may be called repeatedly
in succession.
For block-device swapspace, the blk_plug functionality allows multiple
pages to be combined together at lower layers.
That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is
only active when CONFIG_BLOCK=y.  Consequently all swap writes over NFS
are single page writes.

With this patch we pass a pointer-to-pointer via the wbc, in which
swap_writepage() can store state between calls - much like the pointer
passed explicitly to swap_readpage().  After calling swap_writepage()
some number of times, the state will be passed to swap_write_unplug(),
which can submit the combined request.
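The all-or-nothing completion policy of the batched sio_write_complete() below can be sketched in userspace C: a batch either fully succeeds, or every page in it is re-dirtied so the data is retried later, since a short transfer may only be a transient allocation failure in the filesystem. The page states here are plain flags invented for the example, not the kernel's page bits:

```c
#include <assert.h>

#define WPAGE_SIZE 4096L

struct wpage {
	int dirty;
	int writeback;
};

/* Model of the batched write completion: on anything but a complete
 * transfer, re-dirty every page (possibly transient failure, keep the
 * data for retry); in all cases clear writeback on every page. */
static void write_complete(struct wpage *pages, int npages, long ret)
{
	int p;

	if (ret != WPAGE_SIZE * npages) {
		for (p = 0; p < npages; p++)
			pages[p].dirty = 1;
	}
	for (p = 0; p < npages; p++)
		pages[p].writeback = 0;
}
```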

Signed-off-by: NeilBrown <neilb@suse.de>
---
 include/linux/writeback.h |    7 ++++
 mm/page_io.c              |   78 ++++++++++++++++++++++++++++++++-------------
 mm/swap.h                 |    4 ++
 mm/vmscan.c               |    9 ++++-
 4 files changed, 74 insertions(+), 24 deletions(-)

diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index fec248ab1fec..32b35f21cb97 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -80,6 +80,13 @@ struct writeback_control {
 
 	unsigned punt_to_cgroup:1;	/* cgrp punting, see __REQ_CGROUP_PUNT */
 
+	/* To enable batching of swap writes to non-block-device backends,
+	 * "swap_plug" may be set to point to a 'struct swap_iocb *'.  When
+	 * all swap writes have been submitted, if the swap_iocb pointer is
+	 * not NULL, swap_write_unplug() should be called on it.
+	 */
+	struct swap_iocb **swap_plug;
+
 #ifdef CONFIG_CGROUP_WRITEBACK
 	struct bdi_writeback *wb;	/* wb this writeback is issued under */
 	struct inode *inode;		/* inode being written out */
diff --git a/mm/page_io.c b/mm/page_io.c
index fc82e8750e9b..7684a8d81dcd 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -308,8 +308,9 @@ static void sio_write_complete(struct kiocb *iocb, long ret)
 {
 	struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
 	struct page *page = sio->bvec[0].bv_page;
+	int p;
 
-	if (ret != PAGE_SIZE) {
+	if (ret != PAGE_SIZE * sio->pages) {
 		/*
 		 * In the case of swap-over-nfs, this can be a
 		 * temporary failure if the system has limited
@@ -320,43 +321,63 @@ static void sio_write_complete(struct kiocb *iocb, long ret)
 		 * the normal direct-to-bio case as it could
 		 * be temporary.
 		 */
-		set_page_dirty(page);
-		ClearPageReclaim(page);
 		pr_err_ratelimited("Write error %ld on dio swapfile (%llu)\n",
 				   ret, page_file_offset(page));
+		for (p = 0; p < sio->pages; p++) {
+			page = sio->bvec[p].bv_page;
+			set_page_dirty(page);
+			ClearPageReclaim(page);
+		}
 	} else
-		count_vm_event(PSWPOUT);
-	end_page_writeback(page);
+		count_vm_events(PSWPOUT, sio->pages);
+
+	for (p = 0; p < sio->pages; p++)
+		end_page_writeback(sio->bvec[p].bv_page);
+
 	mempool_free(sio, sio_pool);
 }
 
 static int swap_writepage_fs(struct page *page, struct writeback_control *wbc)
 {
-	struct swap_iocb *sio;
+	struct swap_iocb *sio = NULL;
 	struct swap_info_struct *sis = page_swap_info(page);
 	struct file *swap_file = sis->swap_file;
-	struct address_space *mapping = swap_file->f_mapping;
-	struct iov_iter from;
-	int ret;
+	loff_t pos = page_file_offset(page);
 
 	set_page_writeback(page);
 	unlock_page(page);
-	sio = mempool_alloc(sio_pool, GFP_NOIO);
-	init_sync_kiocb(&sio->iocb, swap_file);
-	sio->iocb.ki_complete = sio_write_complete;
-	sio->iocb.ki_pos = page_file_offset(page);
-	sio->bvec[0].bv_page = page;
-	sio->bvec[0].bv_len = PAGE_SIZE;
-	sio->bvec[0].bv_offset = 0;
-	iov_iter_bvec(&from, WRITE, &sio->bvec[0], 1, PAGE_SIZE);
-	ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
-	if (ret != -EIOCBQUEUED)
-		sio_write_complete(&sio->iocb, ret);
-	return ret;
+	if (wbc->swap_plug)
+		sio = *wbc->swap_plug;
+	if (sio) {
+		if (sio->iocb.ki_filp != swap_file ||
+		    sio->iocb.ki_pos + sio->pages * PAGE_SIZE != pos) {
+			swap_write_unplug(sio);
+			sio = NULL;
+		}
+	}
+	if (!sio) {
+		sio = mempool_alloc(sio_pool, GFP_NOIO);
+		init_sync_kiocb(&sio->iocb, swap_file);
+		sio->iocb.ki_complete = sio_write_complete;
+		sio->iocb.ki_pos = pos;
+		sio->pages = 0;
+	}
+	sio->bvec[sio->pages].bv_page = page;
+	sio->bvec[sio->pages].bv_len = PAGE_SIZE;
+	sio->bvec[sio->pages].bv_offset = 0;
+	sio->pages += 1;
+	if (sio->pages == ARRAY_SIZE(sio->bvec) || !wbc->swap_plug) {
+		swap_write_unplug(sio);
+		sio = NULL;
+	}
+	if (wbc->swap_plug)
+		*wbc->swap_plug = sio;
+
+	return 0;
 }
 
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
-		bio_end_io_t end_write_func)
+		     bio_end_io_t end_write_func)
 {
 	struct bio *bio;
 	int ret;
@@ -388,6 +409,19 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	return 0;
 }
 
+void swap_write_unplug(struct swap_iocb *sio)
+{
+	struct iov_iter from;
+	struct address_space *mapping = sio->iocb.ki_filp->f_mapping;
+	int ret;
+
+	iov_iter_bvec(&from, WRITE, sio->bvec, sio->pages,
+		      PAGE_SIZE * sio->pages);
+	ret = mapping->a_ops->swap_rw(&sio->iocb, &from);
+	if (ret != -EIOCBQUEUED)
+		sio_write_complete(&sio->iocb, ret);
+}
+
 static void sio_read_complete(struct kiocb *iocb, long ret)
 {
 	struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
diff --git a/mm/swap.h b/mm/swap.h
index 7aab4c82e2d0..58f28e94ba56 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -13,6 +13,7 @@ static inline void swap_read_unplug(struct swap_iocb *plug)
 	if (unlikely(plug))
 		__swap_read_unplug(plug);
 }
+void swap_write_unplug(struct swap_iocb *sio);
 int swap_writepage(struct page *page, struct writeback_control *wbc);
 void end_swap_bio_write(struct bio *bio);
 int __swap_writepage(struct page *page, struct writeback_control *wbc,
@@ -68,6 +69,9 @@ static inline int swap_readpage(struct page *page, bool do_poll,
 {
 	return 0;
 }
+static inline void swap_write_unplug(struct swap_iocb *sio)
+{
+}
 
 static inline struct address_space *swap_address_space(swp_entry_t entry)
 {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ad5026d06aa8..572627412e7d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1164,7 +1164,8 @@ typedef enum {
  * pageout is called by shrink_page_list() for each dirty page.
  * Calls ->writepage().
  */
-static pageout_t pageout(struct page *page, struct address_space *mapping)
+static pageout_t pageout(struct page *page, struct address_space *mapping,
+			 struct swap_iocb **plug)
 {
 	/*
 	 * If the page is dirty, only perform writeback if that write
@@ -1211,6 +1212,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping)
 			.range_start = 0,
 			.range_end = LLONG_MAX,
 			.for_reclaim = 1,
+			.swap_plug = plug,
 		};
 
 		SetPageReclaim(page);
@@ -1537,6 +1539,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 	unsigned int nr_reclaimed = 0;
 	unsigned int pgactivate = 0;
 	bool do_demote_pass;
+	struct swap_iocb *plug = NULL;
 
 	memset(stat, 0, sizeof(*stat));
 	cond_resched();
@@ -1817,7 +1820,7 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 			 * starts and then write it out here.
 			 */
 			try_to_unmap_flush_dirty();
-			switch (pageout(page, mapping)) {
+			switch (pageout(page, mapping, &plug)) {
 			case PAGE_KEEP:
 				goto keep_locked;
 			case PAGE_ACTIVATE:
@@ -1971,6 +1974,8 @@ static unsigned int shrink_page_list(struct list_head *page_list,
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
 
+	if (plug)
+		swap_write_unplug(plug);
 	return nr_reclaimed;
 }
 



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/21] VFS: Add FMODE_CAN_ODIRECT file flag
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (3 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 16/21] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 06/21] MM: perform async writes to SWP_FS_OPS swap-space using ->swap_rw NeilBrown
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

Currently various places test if direct IO is possible on a file by
checking for the existence of the direct_IO address space operation.
This is a poor choice, as the direct_IO operation may not be used - it is
only used if the generic_file_*_iter functions are called for direct IO
and some filesystems - particularly NFS - don't do this.

Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the
various places to check this (avoiding pointer dereferences).
do_dentry_open() will set this flag if ->direct_IO is present, so
filesystems do not need to be changed.

NFS *is* changed, to set the flag explicitly and discard the direct_IO
entry in the address_space_operations for files.

Other filesystems which currently use noop_direct_IO could usefully be
changed to set this flag instead.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 drivers/block/loop.c |    4 ++--
 fs/fcntl.c           |    9 ++++-----
 fs/nfs/file.c        |    3 ++-
 fs/open.c            |    9 ++++-----
 fs/overlayfs/file.c  |   13 ++++---------
 include/linux/fs.h   |    3 +++
 6 files changed, 19 insertions(+), 22 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 01cbbfc4e9e2..a2609dd79370 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -184,8 +184,8 @@ static void __loop_update_dio(struct loop_device *lo, bool dio)
 	 */
 	if (dio) {
 		if (queue_logical_block_size(lo->lo_queue) >= sb_bsize &&
-				!(lo->lo_offset & dio_align) &&
-				mapping->a_ops->direct_IO)
+		    !(lo->lo_offset & dio_align) &&
+		    (file->f_mode & FMODE_CAN_ODIRECT))
 			use_dio = true;
 		else
 			use_dio = false;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index 9c6c6a3e2de5..11e665242a76 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -56,11 +56,10 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 		   arg |= O_NONBLOCK;
 
 	/* Pipe packetized mode is controlled by O_DIRECT flag */
-	if (!S_ISFIFO(inode->i_mode) && (arg & O_DIRECT)) {
-		if (!filp->f_mapping || !filp->f_mapping->a_ops ||
-			!filp->f_mapping->a_ops->direct_IO)
-				return -EINVAL;
-	}
+	if (!S_ISFIFO(inode->i_mode) &&
+	    (arg & O_DIRECT) &&
+	    !(filp->f_mode & FMODE_CAN_ODIRECT))
+		return -EINVAL;
 
 	if (filp->f_op->check_flags)
 		error = filp->f_op->check_flags(arg);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 3dbef2c31567..9e2def045111 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -74,6 +74,8 @@ nfs_file_open(struct inode *inode, struct file *filp)
 		return res;
 
 	res = nfs_open(inode, filp);
+	if (res == 0)
+		filp->f_mode |= FMODE_CAN_ODIRECT;
 	return res;
 }
 
@@ -535,7 +537,6 @@ const struct address_space_operations nfs_file_aops = {
 	.write_end = nfs_write_end,
 	.invalidatepage = nfs_invalidate_page,
 	.releasepage = nfs_release_page,
-	.direct_IO = nfs_direct_IO,
 #ifdef CONFIG_MIGRATION
 	.migratepage = nfs_migrate_page,
 #endif
diff --git a/fs/open.c b/fs/open.c
index 9ff2f621b760..76ddf9014499 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -834,17 +834,16 @@ static int do_dentry_open(struct file *f,
 	if ((f->f_mode & FMODE_WRITE) &&
 	     likely(f->f_op->write || f->f_op->write_iter))
 		f->f_mode |= FMODE_CAN_WRITE;
+	if (f->f_mapping->a_ops && f->f_mapping->a_ops->direct_IO)
+		f->f_mode |= FMODE_CAN_ODIRECT;
 
 	f->f_write_hint = WRITE_LIFE_NOT_SET;
 	f->f_flags &= ~(O_CREAT | O_EXCL | O_NOCTTY | O_TRUNC);
 
 	file_ra_state_init(&f->f_ra, f->f_mapping->host->i_mapping);
 
-	/* NB: we're sure to have correct a_ops only after f_op->open */
-	if (f->f_flags & O_DIRECT) {
-		if (!f->f_mapping->a_ops || !f->f_mapping->a_ops->direct_IO)
-			return -EINVAL;
-	}
+	if ((f->f_flags & O_DIRECT) && !(f->f_mode & FMODE_CAN_ODIRECT))
+		return -EINVAL;
 
 	/*
 	 * XXX: Huge page cache doesn't support writing yet. Drop all page
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c
index fa125feed0ff..9d69b4dbb8c4 100644
--- a/fs/overlayfs/file.c
+++ b/fs/overlayfs/file.c
@@ -82,11 +82,8 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
 	if (((flags ^ file->f_flags) & O_APPEND) && IS_APPEND(inode))
 		return -EPERM;
 
-	if (flags & O_DIRECT) {
-		if (!file->f_mapping->a_ops ||
-		    !file->f_mapping->a_ops->direct_IO)
-			return -EINVAL;
-	}
+	if ((flags & O_DIRECT) && !(file->f_mode & FMODE_CAN_ODIRECT))
+		return -EINVAL;
 
 	if (file->f_op->check_flags) {
 		err = file->f_op->check_flags(flags);
@@ -306,8 +303,7 @@ static ssize_t ovl_read_iter(struct kiocb *iocb, struct iov_iter *iter)
 
 	ret = -EINVAL;
 	if (iocb->ki_flags & IOCB_DIRECT &&
-	    (!real.file->f_mapping->a_ops ||
-	     !real.file->f_mapping->a_ops->direct_IO))
+	    !(real.file->f_mode & FMODE_CAN_ODIRECT))
 		goto out_fdput;
 
 	old_cred = ovl_override_creds(file_inode(file)->i_sb);
@@ -367,8 +363,7 @@ static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
 
 	ret = -EINVAL;
 	if (iocb->ki_flags & IOCB_DIRECT &&
-	    (!real.file->f_mapping->a_ops ||
-	     !real.file->f_mapping->a_ops->direct_IO))
+	    !(real.file->f_mode & FMODE_CAN_ODIRECT))
 		goto out_fdput;
 
 	if (!ovl_should_sync(OVL_FS(inode->i_sb)))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 57e3b387cb17..c34c53267415 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -161,6 +161,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset,
 /* File is stream-like */
 #define FMODE_STREAM		((__force fmode_t)0x200000)
 
+/* File supports DIRECT IO */
+#define	FMODE_CAN_ODIRECT	((__force fmode_t)0x400000)
+
 /* File was opened by fanotify and shouldn't generate fanotify events */
 #define FMODE_NONOTIFY		((__force fmode_t)0x4000000)
 




* [PATCH 11/21] NFS: remove IS_SWAPFILE hack
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (12 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 21/21] NFS: swap-out must always use STABLE writes NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 05/21] MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space NeilBrown
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

This code is pointless as IS_SWAPFILE is always defined.
So remove it.

Suggested-by: Mark Hemment <markhemm@googlemail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/file.c |    5 -----
 1 file changed, 5 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 9e2def045111..4d4750738aeb 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -44,11 +44,6 @@
 
 static const struct vm_operations_struct nfs_file_vm_ops;
 
-/* Hack for future NFS swap support */
-#ifndef IS_SWAPFILE
-# define IS_SWAPFILE(inode)	(0)
-#endif
-
 int nfs_check_flags(int flags)
 {
 	if ((flags & (O_APPEND | O_DIRECT)) == (O_APPEND | O_DIRECT))




* [PATCH 12/21] SUNRPC/call_alloc: async tasks mustn't block waiting for memory
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (17 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 17/21] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 15/21] SUNRPC: remove scheduling boost for "SWAPPER" tasks NeilBrown
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything.
So it must not block waiting for memory.

Mempools are a particular problem, as memory can only be released back
to the mempool by an async rpc task completing.  If all available
workqueue threads are waiting on the mempool, no thread is available to
return anything.

rpc_malloc() can block, and this might cause deadlocks.
So check RPC_IS_ASYNC(), rather than RPC_IS_SWAPPER(), to determine
whether blocking is acceptable.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/sched.c              |    4 +++-
 net/sunrpc/xprtrdma/transport.c |    4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index e2c835482791..d5b6e897f5a5 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -1023,8 +1023,10 @@ int rpc_malloc(struct rpc_task *task)
 	struct rpc_buffer *buf;
 	gfp_t gfp = GFP_NOFS;
 
+	if (RPC_IS_ASYNC(task))
+		gfp = GFP_NOWAIT | __GFP_NOWARN;
 	if (RPC_IS_SWAPPER(task))
-		gfp = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN;
+		gfp |= __GFP_MEMALLOC;
 
 	size += sizeof(struct rpc_buffer);
 	if (size <= RPC_BUFFER_MAXSIZE)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 42e375dbdadb..5714bf880e95 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -570,8 +570,10 @@ xprt_rdma_allocate(struct rpc_task *task)
 	gfp_t flags;
 
 	flags = RPCRDMA_DEF_GFP;
+	if (RPC_IS_ASYNC(task))
+		flags = GFP_NOWAIT | __GFP_NOWARN;
 	if (RPC_IS_SWAPPER(task))
-		flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN;
+		flags |= __GFP_MEMALLOC;
 
 	if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize,
 				  flags))




* [PATCH 13/21] SUNRPC/auth: async tasks mustn't block waiting for memory
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (7 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 09/21] MM: submit multipage write for SWP_FS_OPS swap-space NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 02/21] MM: drop swap_set_page_dirty NeilBrown
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything.  So it
must not block waiting for memory.

Mempools are a particular problem, as memory can only be released back
to the mempool by an async rpc task completing.  If all available
workqueue threads are waiting on the mempool, no thread is available to
return anything.

lookup_cred() can block on a mempool or kmalloc - and this can cause
deadlocks.  So add a new RPCAUTH_LOOKUP flag for async lookups and don't
block on memory.  If the -ENOMEM gets back to call_refreshresult(), wait
a short while and try again.  HZ>>4 is chosen as it is used elsewhere
for -ENOMEM retries.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 include/linux/sunrpc/auth.h    |    1 +
 net/sunrpc/auth.c              |    6 +++++-
 net/sunrpc/auth_gss/auth_gss.c |    6 +++++-
 net/sunrpc/auth_unix.c         |   10 ++++++++--
 net/sunrpc/clnt.c              |    3 +++
 5 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 98da816b5fc2..3e6ce288a7fc 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -99,6 +99,7 @@ struct rpc_auth_create_args {
 
 /* Flags for rpcauth_lookupcred() */
 #define RPCAUTH_LOOKUP_NEW		0x01	/* Accept an uninitialised cred */
+#define RPCAUTH_LOOKUP_ASYNC		0x02	/* Don't block waiting for memory */
 
 /*
  * Client authentication ops
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index a9f0d17fdb0d..6bfa19f9fa6a 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -615,6 +615,8 @@ rpcauth_bind_root_cred(struct rpc_task *task, int lookupflags)
 	};
 	struct rpc_cred *ret;
 
+	if (RPC_IS_ASYNC(task))
+		lookupflags |= RPCAUTH_LOOKUP_ASYNC;
 	ret = auth->au_ops->lookup_cred(auth, &acred, lookupflags);
 	put_cred(acred.cred);
 	return ret;
@@ -631,6 +633,8 @@ rpcauth_bind_machine_cred(struct rpc_task *task, int lookupflags)
 
 	if (!acred.principal)
 		return NULL;
+	if (RPC_IS_ASYNC(task))
+		lookupflags |= RPCAUTH_LOOKUP_ASYNC;
 	return auth->au_ops->lookup_cred(auth, &acred, lookupflags);
 }
 
@@ -654,7 +658,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags)
 	};
 
 	if (flags & RPC_TASK_ASYNC)
-		lookupflags |= RPCAUTH_LOOKUP_NEW;
+		lookupflags |= RPCAUTH_LOOKUP_NEW | RPCAUTH_LOOKUP_ASYNC;
 	if (task->tk_op_cred)
 		/* Task must use exactly this rpc_cred */
 		new = get_rpccred(task->tk_op_cred);
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 5f42aa5fc612..df72d6301f78 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1341,7 +1341,11 @@ gss_hash_cred(struct auth_cred *acred, unsigned int hashbits)
 static struct rpc_cred *
 gss_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
 {
-	return rpcauth_lookup_credcache(auth, acred, flags, GFP_NOFS);
+	gfp_t gfp = GFP_NOFS;
+
+	if (flags & RPCAUTH_LOOKUP_ASYNC)
+		gfp = GFP_NOWAIT | __GFP_NOWARN;
+	return rpcauth_lookup_credcache(auth, acred, flags, gfp);
 }
 
 static struct rpc_cred *
diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
index e7df1f782b2e..e5819265dd1b 100644
--- a/net/sunrpc/auth_unix.c
+++ b/net/sunrpc/auth_unix.c
@@ -43,8 +43,14 @@ unx_destroy(struct rpc_auth *auth)
 static struct rpc_cred *
 unx_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
 {
-	struct rpc_cred *ret = mempool_alloc(unix_pool, GFP_NOFS);
-
+	gfp_t gfp = GFP_NOFS;
+	struct rpc_cred *ret;
+
+	if (flags & RPCAUTH_LOOKUP_ASYNC)
+		gfp = GFP_NOWAIT | __GFP_NOWARN;
+	ret = mempool_alloc(unix_pool, gfp);
+	if (!ret)
+		return ERR_PTR(-ENOMEM);
 	rpcauth_init_cred(ret, acred, auth, &unix_credops);
 	ret->cr_flags = 1UL << RPCAUTH_CRED_UPTODATE;
 	return ret;
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index c83fe618767c..d1fb7c0c7685 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1745,6 +1745,9 @@ call_refreshresult(struct rpc_task *task)
 		task->tk_cred_retry--;
 		trace_rpc_retry_refresh_status(task);
 		return;
+	case -ENOMEM:
+		rpc_delay(task, HZ >> 4);
+		return;
 	}
 	trace_rpc_refresh_status(task);
 	rpc_call_rpcerror(task, status);




* [PATCH 14/21] SUNRPC/xprt: async tasks mustn't block waiting for memory
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (10 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 04/21] MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 21/21] NFS: swap-out must always use STABLE writes NeilBrown
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

When memory is short, new worker threads cannot be created and we depend
on the minimum one rpciod thread to be able to handle everything.  So it
must not block waiting for memory.

xprt_dynamic_alloc_slot can block indefinitely.  This can tie up all
workqueue threads and NFS can deadlock.  So when called from a
workqueue, set __GFP_NORETRY.

The rdma alloc_slot already does not block.  However, it sets the error
to -EAGAIN, suggesting this will trigger a sleep.  It does not: as
call_reserveresult() shows, only -ENOMEM causes a sleep, while -EAGAIN
causes an immediate retry.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/xprt.c               |    5 ++++-
 net/sunrpc/xprtrdma/transport.c |    2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a02de2bddb28..47d207e416ab 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1687,12 +1687,15 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task
 static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt)
 {
 	struct rpc_rqst *req = ERR_PTR(-EAGAIN);
+	gfp_t gfp_mask = GFP_NOFS;
 
 	if (xprt->num_reqs >= xprt->max_reqs)
 		goto out;
 	++xprt->num_reqs;
 	spin_unlock(&xprt->reserve_lock);
-	req = kzalloc(sizeof(struct rpc_rqst), GFP_NOFS);
+	if (current->flags & PF_WQ_WORKER)
+		gfp_mask |= __GFP_NORETRY | __GFP_NOWARN;
+	req = kzalloc(sizeof(struct rpc_rqst), gfp_mask);
 	spin_lock(&xprt->reserve_lock);
 	if (req != NULL)
 		goto out;
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 5714bf880e95..923e4b512ee9 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -517,7 +517,7 @@ xprt_rdma_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
 	return;
 
 out_sleep:
-	task->tk_status = -EAGAIN;
+	task->tk_status = -ENOMEM;
 	xprt_add_backlog(xprt, task);
 }
 




* [PATCH 15/21] SUNRPC: remove scheduling boost for "SWAPPER" tasks.
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (18 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 12/21] SUNRPC/call_alloc: async tasks mustn't block waiting for memory NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 19/21] NFS: rename nfs_direct_IO and use as ->swap_rw NeilBrown
  2022-02-10 15:22 ` [PATCH 00/21 V4] Repair SWAP-over_NFS Geert Uytterhoeven
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

Currently, tasks marked as "swapper" tasks get put to the front of
non-priority rpc_queues, and are sorted earlier than non-swapper tasks on
the transport's ->xmit_queue.

This is pointless as currently *all* tasks for a mount that has swap
enabled on *any* file are marked as "swapper" tasks.  So the net result
is that the non-priority rpc_queues are reverse-ordered (LIFO).

This scheduling boost is not necessary to avoid deadlocks, and hurts
fairness, so remove it.  If there were a need to expedite some requests,
the tk_priority mechanism is a more appropriate tool.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 net/sunrpc/sched.c |    7 -------
 net/sunrpc/xprt.c  |   11 -----------
 2 files changed, 18 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d5b6e897f5a5..256302bf6557 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -186,11 +186,6 @@ static void __rpc_add_wait_queue_priority(struct rpc_wait_queue *queue,
 
 /*
  * Add new request to wait queue.
- *
- * Swapper tasks always get inserted at the head of the queue.
- * This should avoid many nasty memory deadlocks and hopefully
- * improve overall performance.
- * Everyone else gets appended to the queue to ensure proper FIFO behavior.
  */
 static void __rpc_add_wait_queue(struct rpc_wait_queue *queue,
 		struct rpc_task *task,
@@ -199,8 +194,6 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue,
 	INIT_LIST_HEAD(&task->u.tk_wait.timer_list);
 	if (RPC_IS_PRIORITY(queue))
 		__rpc_add_wait_queue_priority(queue, task, queue_priority);
-	else if (RPC_IS_SWAPPER(task))
-		list_add(&task->u.tk_wait.list, &queue->tasks[0]);
 	else
 		list_add_tail(&task->u.tk_wait.list, &queue->tasks[0]);
 	task->tk_waitqueue = queue;
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 47d207e416ab..a0a2583fe941 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1354,17 +1354,6 @@ xprt_request_enqueue_transmit(struct rpc_task *task)
 				INIT_LIST_HEAD(&req->rq_xmit2);
 				goto out;
 			}
-		} else if (RPC_IS_SWAPPER(task)) {
-			list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
-				if (pos->rq_cong || pos->rq_bytes_sent)
-					continue;
-				if (RPC_IS_SWAPPER(pos->rq_task))
-					continue;
-				/* Note: req is added _before_ pos */
-				list_add_tail(&req->rq_xmit, &pos->rq_xmit);
-				INIT_LIST_HEAD(&req->rq_xmit2);
-				goto out;
-			}
 		} else if (!req->rq_seqno) {
 			list_for_each_entry(pos, &xprt->xmit_queue, rq_xmit) {
 				if (pos->rq_task->tk_owner != task->tk_owner)




* [PATCH 16/21] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (2 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 18/21] NFSv4: keep state manager thread active if swap is enabled NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 10/21] VFS: Add FMODE_CAN_ODIRECT file flag NeilBrown
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

NFS_RPC_SWAPFLAGS is only used for READ requests.
It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to
requests.  This is not needed for swap READ - though it is for writes
where it is set via a different mechanism.

RPC_TASK_ROOTCREDS causes the 'machine' credential to be used.
This is not needed as the root credential is saved when the swap file is
opened, and this is used for all IO.

So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of
RPC_TASK_ROOTCREDS, that isn't needed either.

Remove both.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/read.c                 |    4 ----
 include/linux/nfs_fs.h        |    5 -----
 include/linux/sunrpc/sched.h  |    1 -
 include/trace/events/sunrpc.h |    1 -
 net/sunrpc/auth.c             |    2 +-
 5 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index eb00229c1a50..cd797ce3a67c 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -194,10 +194,6 @@ static void nfs_initiate_read(struct nfs_pgio_header *hdr,
 			      const struct nfs_rpc_ops *rpc_ops,
 			      struct rpc_task_setup *task_setup_data, int how)
 {
-	struct inode *inode = hdr->inode;
-	int swap_flags = IS_SWAPFILE(inode) ? NFS_RPC_SWAPFLAGS : 0;
-
-	task_setup_data->flags |= swap_flags;
 	rpc_ops->read_setup(hdr, msg);
 	trace_nfs_initiate_read(hdr);
 }
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 02aa49323d1d..ff8b3820409c 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -45,11 +45,6 @@
  */
 #define NFS_MAX_TRANSPORTS 16
 
-/*
- * These are the default flags for swap requests
- */
-#define NFS_RPC_SWAPFLAGS		(RPC_TASK_SWAPPER|RPC_TASK_ROOTCREDS)
-
 /*
  * Size of the NFS directory verifier
  */
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index db964bb63912..56710f8056d3 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -124,7 +124,6 @@ struct rpc_task_setup {
 #define RPC_TASK_MOVEABLE	0x0004		/* nfs4.1+ rpc tasks */
 #define RPC_TASK_NULLCREDS	0x0010		/* Use AUTH_NULL credential */
 #define RPC_CALL_MAJORSEEN	0x0020		/* major timeout seen */
-#define RPC_TASK_ROOTCREDS	0x0040		/* force root creds */
 #define RPC_TASK_DYNAMIC	0x0080		/* task was kmalloc'ed */
 #define	RPC_TASK_NO_ROUND_ROBIN	0x0100		/* send requests on "main" xprt */
 #define RPC_TASK_SOFT		0x0200		/* Use soft timeouts */
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 29982d60b68a..ac33892da411 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -311,7 +311,6 @@ TRACE_EVENT(rpc_request,
 		{ RPC_TASK_MOVEABLE, "MOVEABLE" },			\
 		{ RPC_TASK_NULLCREDS, "NULLCREDS" },			\
 		{ RPC_CALL_MAJORSEEN, "MAJORSEEN" },			\
-		{ RPC_TASK_ROOTCREDS, "ROOTCREDS" },			\
 		{ RPC_TASK_DYNAMIC, "DYNAMIC" },			\
 		{ RPC_TASK_NO_ROUND_ROBIN, "NO_ROUND_ROBIN" },		\
 		{ RPC_TASK_SOFT, "SOFT" },				\
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 6bfa19f9fa6a..682fcd24bf43 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -670,7 +670,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags)
 	/* If machine cred couldn't be bound, try a root cred */
 	if (new)
 		;
-	else if (cred == &machine_cred || (flags & RPC_TASK_ROOTCREDS))
+	else if (cred == &machine_cred)
 		new = rpcauth_bind_root_cred(task, lookupflags);
 	else if (flags & RPC_TASK_NULLCREDS)
 		new = authnull_ops.lookup_cred(NULL, NULL, 0);




* [PATCH 17/21] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (16 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 20/21] NFS: swap IO handling is slightly different for O_DIRECT IO NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07 15:53   ` Chuck Lever III
  2022-02-07  4:46 ` [PATCH 12/21] SUNRPC/call_alloc: async tasks mustn't block waiting for memory NeilBrown
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

RPC tasks can be marked as RPC_TASK_SWAPPER.  This causes GFP_MEMALLOC
to be used for some allocations.  This is needed in some cases, but not
in all of the cases where it is currently provided, and it is missing
from some cases where it is needed.

Currently *all* tasks associated with a rpc_client on which swap is
enabled get the flag and hence some GFP_MEMALLOC support.

GFP_MEMALLOC is provided for ->buf_alloc(), but only swap-writes need it.
However, xdr_alloc_bvec() does not get GFP_MEMALLOC, though it often
does need it.

xdr_alloc_bvec is called while the XPRT_LOCK is held.  If this blocks,
then it blocks all other queued tasks.  So this allocation needs
GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used
for any swap writes.

Similarly, if the transport is not connected, that will block all
requests including swap writes, so memory allocations should get
GFP_MEMALLOC if swap writes are possible.

So with this patch:
 1/ we ONLY set RPC_TASK_SWAPPER for swap writes.
 2/ __rpc_execute() sets PF_MEMALLOC while handling any task
    with RPC_TASK_SWAPPER set, or when handling any task that
    holds the XPRT_LOCKED lock on an xprt used for swap.
    This removes the need for the RPC_IS_SWAPPER() test
    in ->buf_alloc handlers.
 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking
    any task to a swapper xprt.  __rpc_execute() will clear it.
 4/ PF_MEMALLOC is set for all the connect workers.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/write.c                  |    2 ++
 net/sunrpc/clnt.c               |    2 --
 net/sunrpc/sched.c              |   20 +++++++++++++++++---
 net/sunrpc/xprt.c               |    3 +++
 net/sunrpc/xprtrdma/transport.c |    6 ++++--
 net/sunrpc/xprtsock.c           |    8 ++++++++
 6 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 987a187bd39a..9f7176745fef 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1409,6 +1409,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr,
 {
 	int priority = flush_task_priority(how);
 
+	if (IS_SWAPFILE(hdr->inode))
+		task_setup_data->flags |= RPC_TASK_SWAPPER;
 	task_setup_data->priority = priority;
 	rpc_ops->write_setup(hdr, msg, &task_setup_data->rpc_client);
 	trace_nfs_initiate_write(hdr);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index d1fb7c0c7685..842366a2fc57 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1085,8 +1085,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 		task->tk_flags |= RPC_TASK_TIMEOUT;
 	if (clnt->cl_noretranstimeo)
 		task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT;
-	if (atomic_read(&clnt->cl_swapper))
-		task->tk_flags |= RPC_TASK_SWAPPER;
 	/* Add to the client's list of all tasks */
 	spin_lock(&clnt->cl_lock);
 	list_add_tail(&task->tk_task, &clnt->cl_tasks);
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 256302bf6557..9020cedb7c95 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -869,6 +869,15 @@ void rpc_release_calldata(const struct rpc_call_ops *ops, void *calldata)
 		ops->rpc_release(calldata);
 }
 
+static bool xprt_needs_memalloc(struct rpc_xprt *xprt, struct rpc_task *tk)
+{
+	if (!xprt)
+		return false;
+	if (!atomic_read(&xprt->swapper))
+		return false;
+	return test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == tk;
+}
+
 /*
  * This is the RPC `scheduler' (or rather, the finite state machine).
  */
@@ -877,6 +886,7 @@ static void __rpc_execute(struct rpc_task *task)
 	struct rpc_wait_queue *queue;
 	int task_is_async = RPC_IS_ASYNC(task);
 	int status = 0;
+	unsigned long pflags = current->flags;
 
 	WARN_ON_ONCE(RPC_IS_QUEUED(task));
 	if (RPC_IS_QUEUED(task))
@@ -899,6 +909,10 @@ static void __rpc_execute(struct rpc_task *task)
 		}
 		if (!do_action)
 			break;
+		if (RPC_IS_SWAPPER(task) ||
+		    xprt_needs_memalloc(task->tk_xprt, task))
+			current->flags |= PF_MEMALLOC;
+
 		trace_rpc_task_run_action(task, do_action);
 		do_action(task);
 
@@ -936,7 +950,7 @@ static void __rpc_execute(struct rpc_task *task)
 		rpc_clear_running(task);
 		spin_unlock(&queue->lock);
 		if (task_is_async)
-			return;
+			goto out;
 
 		/* sync task: sleep here */
 		trace_rpc_task_sync_sleep(task, task->tk_action);
@@ -960,6 +974,8 @@ static void __rpc_execute(struct rpc_task *task)
 
 	/* Release all resources associated with the task */
 	rpc_release_task(task);
+out:
+	current_restore_flags(pflags, PF_MEMALLOC);
 }
 
 /*
@@ -1018,8 +1034,6 @@ int rpc_malloc(struct rpc_task *task)
 
 	if (RPC_IS_ASYNC(task))
 		gfp = GFP_NOWAIT | __GFP_NOWARN;
-	if (RPC_IS_SWAPPER(task))
-		gfp |= __GFP_MEMALLOC;
 
 	size += sizeof(struct rpc_buffer);
 	if (size <= RPC_BUFFER_MAXSIZE)
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a0a2583fe941..0614e7463d4b 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1492,6 +1492,9 @@ bool xprt_prepare_transmit(struct rpc_task *task)
 		return false;
 
 	}
+	if (atomic_read(&xprt->swapper))
+		/* This will be cleared in __rpc_execute */
+		current->flags |= PF_MEMALLOC;
 	return true;
 }
 
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 923e4b512ee9..6b7e10e5a141 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -235,8 +235,11 @@ xprt_rdma_connect_worker(struct work_struct *work)
 	struct rpcrdma_xprt *r_xprt = container_of(work, struct rpcrdma_xprt,
 						   rx_connect_worker.work);
 	struct rpc_xprt *xprt = &r_xprt->rx_xprt;
+	unsigned int pflags = current->flags;
 	int rc;
 
+	if (atomic_read(&xprt->swapper))
+		current->flags |= PF_MEMALLOC;
 	rc = rpcrdma_xprt_connect(r_xprt);
 	xprt_clear_connecting(xprt);
 	if (!rc) {
@@ -250,6 +253,7 @@ xprt_rdma_connect_worker(struct work_struct *work)
 		rpcrdma_xprt_disconnect(r_xprt);
 	xprt_unlock_connect(xprt, r_xprt);
 	xprt_wake_pending_tasks(xprt, rc);
+	current_restore_flags(pflags, PF_MEMALLOC);
 }
 
 /**
@@ -572,8 +576,6 @@ xprt_rdma_allocate(struct rpc_task *task)
 	flags = RPCRDMA_DEF_GFP;
 	if (RPC_IS_ASYNC(task))
 		flags = GFP_NOWAIT | __GFP_NOWARN;
-	if (RPC_IS_SWAPPER(task))
-		flags |= __GFP_MEMALLOC;
 
 	if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize,
 				  flags))
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 69b6ee5a5fd1..c461a0ce9531 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2047,7 +2047,10 @@ static void xs_udp_setup_socket(struct work_struct *work)
 	struct rpc_xprt *xprt = &transport->xprt;
 	struct socket *sock;
 	int status = -EIO;
+	unsigned int pflags = current->flags;
 
+	if (atomic_read(&xprt->swapper))
+		current->flags |= PF_MEMALLOC;
 	sock = xs_create_sock(xprt, transport,
 			xs_addr(xprt)->sa_family, SOCK_DGRAM,
 			IPPROTO_UDP, false);
@@ -2067,6 +2070,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
 	xprt_clear_connecting(xprt);
 	xprt_unlock_connect(xprt, transport);
 	xprt_wake_pending_tasks(xprt, status);
+	current_restore_flags(pflags, PF_MEMALLOC);
 }
 
 /**
@@ -2226,7 +2230,10 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	struct socket *sock = transport->sock;
 	struct rpc_xprt *xprt = &transport->xprt;
 	int status;
+	unsigned int pflags = current->flags;
 
+	if (atomic_read(&xprt->swapper))
+		current->flags |= PF_MEMALLOC;
 	if (!sock) {
 		sock = xs_create_sock(xprt, transport,
 				xs_addr(xprt)->sa_family, SOCK_STREAM,
@@ -2291,6 +2298,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 	xprt_clear_connecting(xprt);
 out_unlock:
 	xprt_unlock_connect(xprt, transport);
+	current_restore_flags(pflags, PF_MEMALLOC);
 }
 
 /**




* [PATCH 18/21] NFSv4: keep state manager thread active if swap is enabled
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
  2022-02-07  4:46 ` [PATCH 03/21] MM: move responsibility for setting SWP_FS_OPS to ->swap_activate NeilBrown
  2022-02-07  4:46 ` [PATCH 07/21] DOC: update documentation for swap_activate and swap_rw NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 16/21] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS NeilBrown
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

If we are swapping over NFSv4, we may not be able to allocate memory to
start the state-manager thread when we need it.  So keep it running
whenever swap is enabled, and just signal it when there is work to do.

This requires updating and testing the cl_swapper count on the root
rpc_clnt after following all ->cl_parent links.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/file.c           |   15 ++++++++++++---
 fs/nfs/nfs4_fs.h        |    1 +
 fs/nfs/nfs4proc.c       |   20 ++++++++++++++++++++
 fs/nfs/nfs4state.c      |   39 +++++++++++++++++++++++++++++++++------
 include/linux/nfs_xdr.h |    2 ++
 net/sunrpc/clnt.c       |    2 ++
 6 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 4d4750738aeb..81fe996c6272 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -486,8 +486,9 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	unsigned long blocks;
 	long long isize;
 	int ret;
-	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
-	struct inode *inode = file->f_mapping->host;
+	struct inode *inode = file_inode(file);
+	struct rpc_clnt *clnt = NFS_CLIENT(inode);
+	struct nfs_client *cl = NFS_SERVER(inode)->nfs_client;
 
 	if (!file->f_mapping->a_ops->swap_rw)
 		/* Cannot support swap */
@@ -512,14 +513,22 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	}
 	*span = sis->pages;
 	sis->flags |= SWP_FS_OPS;
+
+	if (cl->rpc_ops->enable_swap)
+		cl->rpc_ops->enable_swap(inode);
+
 	return ret;
 }
 
 static void nfs_swap_deactivate(struct file *file)
 {
-	struct rpc_clnt *clnt = NFS_CLIENT(file->f_mapping->host);
+	struct inode *inode = file_inode(file);
+	struct rpc_clnt *clnt = NFS_CLIENT(inode);
+	struct nfs_client *cl = NFS_SERVER(inode)->nfs_client;
 
 	rpc_clnt_swap_deactivate(clnt);
+	if (cl->rpc_ops->disable_swap)
+		cl->rpc_ops->disable_swap(file_inode(file));
 }
 
 const struct address_space_operations nfs_file_aops = {
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 84f39b6f1b1e..79df6e83881b 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -42,6 +42,7 @@ enum nfs4_client_state {
 	NFS4CLNT_LEASE_MOVED,
 	NFS4CLNT_DELEGATION_EXPIRED,
 	NFS4CLNT_RUN_MANAGER,
+	NFS4CLNT_MANAGER_AVAILABLE,
 	NFS4CLNT_RECALL_RUNNING,
 	NFS4CLNT_RECALL_ANY_LAYOUT_READ,
 	NFS4CLNT_RECALL_ANY_LAYOUT_RW,
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index b18f31b2c9e7..d3549f48b012 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -10461,6 +10461,24 @@ static ssize_t nfs4_listxattr(struct dentry *dentry, char *list, size_t size)
 	return error + error2 + error3;
 }
 
+static void nfs4_enable_swap(struct inode *inode)
+{
+	/* The state manager thread must always be running.
+	 * It will notice the client is a swapper, and stay put.
+	 */
+	struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+
+	nfs4_schedule_state_manager(clp);
+}
+
+static void nfs4_disable_swap(struct inode *inode)
+{
+	/* The state manager thread will now exit once it is
+	 * woken.
+	 */
+	wake_up_var(&NFS_SERVER(inode)->nfs_client->cl_state);
+}
+
 static const struct inode_operations nfs4_dir_inode_operations = {
 	.create		= nfs_create,
 	.lookup		= nfs_lookup,
@@ -10538,6 +10556,8 @@ const struct nfs_rpc_ops nfs_v4_clientops = {
 	.create_server	= nfs4_create_server,
 	.clone_server	= nfs_clone_server,
 	.discover_trunking = nfs4_discover_trunking,
+	.enable_swap	= nfs4_enable_swap,
+	.disable_swap	= nfs4_disable_swap,
 };
 
 static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = {
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index f5a62c0d999b..5dc52eefaffb 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1205,10 +1205,17 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
 {
 	struct task_struct *task;
 	char buf[INET6_ADDRSTRLEN + sizeof("-manager") + 1];
+	struct rpc_clnt *cl = clp->cl_rpcclient;
+
+	while (cl != cl->cl_parent)
+		cl = cl->cl_parent;
 
 	set_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state);
-	if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0)
+	if (test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state) != 0) {
+		wake_up_var(&clp->cl_state);
 		return;
+	}
+	set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state);
 	__module_get(THIS_MODULE);
 	refcount_inc(&clp->cl_count);
 
@@ -1224,6 +1231,7 @@ void nfs4_schedule_state_manager(struct nfs_client *clp)
 		printk(KERN_ERR "%s: kthread_run: %ld\n",
 			__func__, PTR_ERR(task));
 		nfs4_clear_state_manager_bit(clp);
+		clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state);
 		nfs_put_client(clp);
 		module_put(THIS_MODULE);
 	}
@@ -2669,11 +2677,8 @@ static void nfs4_state_manager(struct nfs_client *clp)
 			clear_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state);
 		}
 
-		/* Did we race with an attempt to give us more work? */
-		if (!test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state))
-			return;
-		if (test_and_set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state) != 0)
-			return;
+		return;
+
 	} while (refcount_read(&clp->cl_count) > 1 && !signalled());
 	goto out_drain;
 
@@ -2693,9 +2698,31 @@ static void nfs4_state_manager(struct nfs_client *clp)
 static int nfs4_run_state_manager(void *ptr)
 {
 	struct nfs_client *clp = ptr;
+	struct rpc_clnt *cl = clp->cl_rpcclient;
+
+	while (cl != cl->cl_parent)
+		cl = cl->cl_parent;
 
 	allow_signal(SIGKILL);
+again:
+	set_bit(NFS4CLNT_MANAGER_RUNNING, &clp->cl_state);
 	nfs4_state_manager(clp);
+	if (atomic_read(&cl->cl_swapper)) {
+		wait_var_event_interruptible(&clp->cl_state,
+					     test_bit(NFS4CLNT_RUN_MANAGER,
+						      &clp->cl_state));
+		if (atomic_read(&cl->cl_swapper) &&
+		    test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state))
+			goto again;
+		/* Either no longer a swapper, or were signalled */
+	}
+	clear_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state);
+
+	if (refcount_read(&clp->cl_count) > 1 && !signalled() &&
+	    test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) &&
+	    !test_and_set_bit(NFS4CLNT_MANAGER_AVAILABLE, &clp->cl_state))
+		goto again;
+
 	nfs_put_client(clp);
 	module_put_and_kthread_exit(0);
 	return 0;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 728cb0c1f0b6..6861ac8af808 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1798,6 +1798,8 @@ struct nfs_rpc_ops {
 	struct nfs_server *(*clone_server)(struct nfs_server *, struct nfs_fh *,
 					   struct nfs_fattr *, rpc_authflavor_t);
 	int	(*discover_trunking)(struct nfs_server *, struct nfs_fh *);
+	void	(*enable_swap)(struct inode *inode);
+	void	(*disable_swap)(struct inode *inode);
 };
 
 /*
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 842366a2fc57..04ccf6a06ca7 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -3069,6 +3069,8 @@ rpc_clnt_swap_activate_callback(struct rpc_clnt *clnt,
 int
 rpc_clnt_swap_activate(struct rpc_clnt *clnt)
 {
+	while (clnt != clnt->cl_parent)
+		clnt = clnt->cl_parent;
 	if (atomic_inc_return(&clnt->cl_swapper) == 1)
 		return rpc_clnt_iterate_for_each_xprt(clnt,
 				rpc_clnt_swap_activate_callback, NULL);




* [PATCH 19/21] NFS: rename nfs_direct_IO and use as ->swap_rw
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (19 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 15/21] SUNRPC: remove scheduling boost for "SWAPPER" tasks NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-10 15:22 ` [PATCH 00/21 V4] Repair SWAP-over_NFS Geert Uytterhoeven
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

nfs_direct_IO() exists to support swap IO, but it hasn't worked for a
while.  We now need a ->swap_rw function which behaves slightly
differently, returning zero for success rather than a byte count.

So modify nfs_direct_IO accordingly, rename it, and use it as the
->swap_rw function.

Note: it still won't work - that will be fixed in later patches.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/direct.c        |   23 ++++++++++-------------
 fs/nfs/file.c          |    5 +----
 include/linux/nfs_fs.h |    2 +-
 3 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index eabfdab543c8..b929dd5b0c3a 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -153,28 +153,25 @@ nfs_direct_count_bytes(struct nfs_direct_req *dreq,
 }
 
 /**
- * nfs_direct_IO - NFS address space operation for direct I/O
+ * nfs_swap_rw - NFS address space operation for swap I/O
  * @iocb: target I/O control block
  * @iter: I/O buffer
  *
- * The presence of this routine in the address space ops vector means
- * the NFS client supports direct I/O. However, for most direct IO, we
- * shunt off direct read and write requests before the VFS gets them,
- * so this method is only ever called for swap.
+ * Perform IO to the swap-file.  This is much like direct IO.
  */
-ssize_t nfs_direct_IO(struct kiocb *iocb, struct iov_iter *iter)
+int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
 {
-	struct inode *inode = iocb->ki_filp->f_mapping->host;
-
-	/* we only support swap file calling nfs_direct_IO */
-	if (!IS_SWAPFILE(inode))
-		return 0;
+	ssize_t ret;
 
 	VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
 
 	if (iov_iter_rw(iter) == READ)
-		return nfs_file_direct_read(iocb, iter);
-	return nfs_file_direct_write(iocb, iter);
+		ret = nfs_file_direct_read(iocb, iter);
+	else
+		ret = nfs_file_direct_write(iocb, iter);
+	if (ret < 0)
+		return ret;
+	return 0;
 }
 
 static void nfs_direct_release_pages(struct page **pages, unsigned int npages)
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 81fe996c6272..7d42117b210d 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -490,10 +490,6 @@ static int nfs_swap_activate(struct swap_info_struct *sis, struct file *file,
 	struct rpc_clnt *clnt = NFS_CLIENT(inode);
 	struct nfs_client *cl = NFS_SERVER(inode)->nfs_client;
 
-	if (!file->f_mapping->a_ops->swap_rw)
-		/* Cannot support swap */
-		return -EINVAL;
-
 	spin_lock(&inode->i_lock);
 	blocks = inode->i_blocks;
 	isize = inode->i_size;
@@ -549,6 +545,7 @@ const struct address_space_operations nfs_file_aops = {
 	.error_remove_page = generic_error_remove_page,
 	.swap_activate = nfs_swap_activate,
 	.swap_deactivate = nfs_swap_deactivate,
+	.swap_rw = nfs_swap_rw,
 };
 
 /*
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index ff8b3820409c..58807406aff6 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -506,7 +506,7 @@ static inline const struct cred *nfs_file_cred(struct file *file)
 /*
  * linux/fs/nfs/direct.c
  */
-extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *);
+int nfs_swap_rw(struct kiocb *, struct iov_iter *);
 extern ssize_t nfs_file_direct_read(struct kiocb *iocb,
 			struct iov_iter *iter);
 extern ssize_t nfs_file_direct_write(struct kiocb *iocb,




* [PATCH 20/21] NFS: swap IO handling is slightly different for O_DIRECT IO
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (15 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 08/21] MM: submit multipage reads for " NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 17/21] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC NeilBrown
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding
   possible deadlocks with "fs_reclaim".  These deadlocks could, I believe,
   eventuate if a buffered read on the swapfile was attempted.

   We don't need coherence with the page cache for a swap file, and
   buffered writes are forbidden anyway.  There is no other need for
   i_rwsem during direct IO, so never take it for swap_rw().

2/ generic_write_checks() explicitly forbids writes to swap, and
   performs checks that are not needed for swap.  So bypass it
   for swap_rw().

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/direct.c        |   42 ++++++++++++++++++++++++++++--------------
 fs/nfs/file.c          |    4 ++--
 include/linux/nfs_fs.h |    8 ++++----
 3 files changed, 34 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index b929dd5b0c3a..c5c53219beeb 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -166,9 +166,9 @@ int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter)
 	VM_BUG_ON(iov_iter_count(iter) != PAGE_SIZE);
 
 	if (iov_iter_rw(iter) == READ)
-		ret = nfs_file_direct_read(iocb, iter);
+		ret = nfs_file_direct_read(iocb, iter, true);
 	else
-		ret = nfs_file_direct_write(iocb, iter);
+		ret = nfs_file_direct_write(iocb, iter, true);
 	if (ret < 0)
 		return ret;
 	return 0;
@@ -422,6 +422,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
  * nfs_file_direct_read - file direct read operation for NFS files
  * @iocb: target I/O control block
  * @iter: vector of user buffers into which to read data
+ * @swap: flag indicating this is swap IO, not O_DIRECT IO
  *
  * We use this function for direct reads instead of calling
  * generic_file_aio_read() in order to avoid gfar's check to see if
@@ -437,7 +438,8 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
  * client must read the updated atime from the server back into its
  * cache.
  */
-ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
+ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter,
+			     bool swap)
 {
 	struct file *file = iocb->ki_filp;
 	struct address_space *mapping = file->f_mapping;
@@ -479,12 +481,14 @@ ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter)
 	if (iter_is_iovec(iter))
 		dreq->flags = NFS_ODIRECT_SHOULD_DIRTY;
 
-	nfs_start_io_direct(inode);
+	if (!swap)
+		nfs_start_io_direct(inode);
 
 	NFS_I(inode)->read_io += count;
 	requested = nfs_direct_read_schedule_iovec(dreq, iter, iocb->ki_pos);
 
-	nfs_end_io_direct(inode);
+	if (!swap)
+		nfs_end_io_direct(inode);
 
 	if (requested > 0) {
 		result = nfs_direct_wait(dreq);
@@ -873,6 +877,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
  * nfs_file_direct_write - file direct write operation for NFS files
  * @iocb: target I/O control block
  * @iter: vector of user buffers from which to write data
+ * @swap: flag indicating this is swap IO, not O_DIRECT IO
  *
  * We use this function for direct writes instead of calling
  * generic_file_aio_write() in order to avoid taking the inode
@@ -889,7 +894,8 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
  * Note that O_APPEND is not supported for NFS direct writes, as there
  * is no atomic O_APPEND write facility in the NFS protocol.
  */
-ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
+ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
+			      bool swap)
 {
 	ssize_t result, requested;
 	size_t count;
@@ -903,7 +909,11 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 	dfprintk(FILE, "NFS: direct write(%pD2, %zd@%Ld)\n",
 		file, iov_iter_count(iter), (long long) iocb->ki_pos);
 
-	result = generic_write_checks(iocb, iter);
+	if (swap)
+		/* bypass generic checks */
+		result = iov_iter_count(iter);
+	else
+		result = generic_write_checks(iocb, iter);
 	if (result <= 0)
 		return result;
 	count = result;
@@ -934,16 +944,20 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter)
 		dreq->iocb = iocb;
 	pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode);
 
-	nfs_start_io_direct(inode);
+	if (swap) {
+		requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
+	} else {
+		nfs_start_io_direct(inode);
 
-	requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
+		requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
 
-	if (mapping->nrpages) {
-		invalidate_inode_pages2_range(mapping,
-					      pos >> PAGE_SHIFT, end);
-	}
+		if (mapping->nrpages) {
+			invalidate_inode_pages2_range(mapping,
+						      pos >> PAGE_SHIFT, end);
+		}
 
-	nfs_end_io_direct(inode);
+		nfs_end_io_direct(inode);
+	}
 
 	if (requested > 0) {
 		result = nfs_direct_wait(dreq);
diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 7d42117b210d..ceacae8e7a38 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -159,7 +159,7 @@ nfs_file_read(struct kiocb *iocb, struct iov_iter *to)
 	ssize_t result;
 
 	if (iocb->ki_flags & IOCB_DIRECT)
-		return nfs_file_direct_read(iocb, to);
+		return nfs_file_direct_read(iocb, to, false);
 
 	dprintk("NFS: read(%pD2, %zu@%lu)\n",
 		iocb->ki_filp,
@@ -634,7 +634,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 		return result;
 
 	if (iocb->ki_flags & IOCB_DIRECT)
-		return nfs_file_direct_write(iocb, from);
+		return nfs_file_direct_write(iocb, from, false);
 
 	dprintk("NFS: write(%pD2, %zu@%Ld)\n",
 		file, iov_iter_count(from), (long long) iocb->ki_pos);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 58807406aff6..22aa5c08e3ed 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -507,10 +507,10 @@ static inline const struct cred *nfs_file_cred(struct file *file)
  * linux/fs/nfs/direct.c
  */
 int nfs_swap_rw(struct kiocb *, struct iov_iter *);
-extern ssize_t nfs_file_direct_read(struct kiocb *iocb,
-			struct iov_iter *iter);
-extern ssize_t nfs_file_direct_write(struct kiocb *iocb,
-			struct iov_iter *iter);
+ssize_t nfs_file_direct_read(struct kiocb *iocb,
+			     struct iov_iter *iter, bool swap);
+ssize_t nfs_file_direct_write(struct kiocb *iocb,
+			      struct iov_iter *iter, bool swap);
 
 /*
  * linux/fs/nfs/dir.c




* [PATCH 21/21] NFS: swap-out must always use STABLE writes.
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (11 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 14/21] SUNRPC/xprt: async tasks mustn't block waiting for memory NeilBrown
@ 2022-02-07  4:46 ` NeilBrown
  2022-02-07  4:46 ` [PATCH 11/21] NFS: remove IS_SWAPFILE hack NeilBrown
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-07  4:46 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel

The commit handling code is not safe against memory-pressure deadlocks
when writing to swap.  In particular, nfs_commitdata_alloc() blocks
indefinitely waiting for memory, and this can consume all available
workqueue threads.

Swap-out most likely uses STABLE writes anyway, as COND_STABLE indicates
that a stable write should be used if the write fits in a single
request, and it normally does.  However, if we ever swap with a small
wsize, or gather unusually large numbers of pages for a single write,
this might change.

For safety, make it explicit in the code that direct writes used for swap
must always use FLUSH_STABLE.

Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/direct.c |   10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index c5c53219beeb..4eb2a8380a28 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -791,7 +791,7 @@ static const struct nfs_pgio_completion_ops nfs_direct_write_completion_ops = {
  */
 static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 					       struct iov_iter *iter,
-					       loff_t pos)
+					       loff_t pos, int ioflags)
 {
 	struct nfs_pageio_descriptor desc;
 	struct inode *inode = dreq->inode;
@@ -799,7 +799,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 	size_t requested_bytes = 0;
 	size_t wsize = max_t(size_t, NFS_SERVER(inode)->wsize, PAGE_SIZE);
 
-	nfs_pageio_init_write(&desc, inode, FLUSH_COND_STABLE, false,
+	nfs_pageio_init_write(&desc, inode, ioflags, false,
 			      &nfs_direct_write_completion_ops);
 	desc.pg_dreq = dreq;
 	get_dreq(dreq);
@@ -945,11 +945,13 @@ ssize_t nfs_file_direct_write(struct kiocb *iocb, struct iov_iter *iter,
 	pnfs_init_ds_commit_info_ops(&dreq->ds_cinfo, inode);
 
 	if (swap) {
-		requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
+		requested = nfs_direct_write_schedule_iovec(dreq, iter, pos,
+							    FLUSH_STABLE);
 	} else {
 		nfs_start_io_direct(inode);
 
-		requested = nfs_direct_write_schedule_iovec(dreq, iter, pos);
+		requested = nfs_direct_write_schedule_iovec(dreq, iter, pos,
+							    FLUSH_COND_STABLE);
 
 		if (mapping->nrpages) {
 			invalidate_inode_pages2_range(mapping,




* Re: [PATCH 09/21] MM: submit multipage write for SWP_FS_OPS swap-space
  2022-02-07  4:46 ` [PATCH 09/21] MM: submit multipage write for SWP_FS_OPS swap-space NeilBrown
@ 2022-02-07  8:40   ` Christoph Hellwig
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Hellwig @ 2022-02-07  8:40 UTC (permalink / raw)
  To: NeilBrown
  Cc: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells, linux-nfs,
	linux-mm, linux-kernel

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 01/21] MM: create new mm/swap.h header file.
  2022-02-07  4:46 ` [PATCH 01/21] MM: create new mm/swap.h header file NeilBrown
@ 2022-02-07 13:15   ` kernel test robot
  2022-02-07 14:26     ` kernel test robot
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2022-02-07 13:15 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 10163 bytes --]

Hi NeilBrown,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on hnaz-mm/master cifs/for-next linus/master v5.17-rc3 next-20220207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/Repair-SWAP-over_NFS/20220207-125206
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: sparc-randconfig-r036-20220207 (https://download.01.org/0day-ci/archive/20220207/202202072103.OsYQRzCU-lkp@intel.com/config)
compiler: sparc64-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/06d2bcb84187037252a0f764881ab51965e931ea
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/Repair-SWAP-over_NFS/20220207-125206
        git checkout 06d2bcb84187037252a0f764881ab51965e931ea
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross O=build_dir ARCH=sparc SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/huge_memory.c: In function '__split_huge_page':
>> mm/huge_memory.c:2423:30: error: implicit declaration of function 'swap_address_space' [-Werror=implicit-function-declaration]
    2423 |                 swap_cache = swap_address_space(entry);
         |                              ^~~~~~~~~~~~~~~~~~
   mm/huge_memory.c:2423:28: warning: assignment to 'struct address_space *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
    2423 |                 swap_cache = swap_address_space(entry);
         |                            ^
   cc1: some warnings being treated as errors
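
The failure above is the expected fallout of moving swap_address_space() out of the globally visible headers: mm/huge_memory.c still calls it but no longer sees a declaration, so the compiler assumes an implicit int-returning function. A plausible fix, assuming patch 01's new private mm/swap.h is where the declaration now lives:

```diff
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@
+#include "swap.h"	/* assumed new home of swap_address_space() */
```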


vim +/swap_address_space +2423 mm/huge_memory.c

e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2404  
baa355fd331424 Kirill A. Shutemov      2016-07-26  2405  static void __split_huge_page(struct page *page, struct list_head *list,
b6769834aac1d4 Alex Shi                2020-12-15  2406  		pgoff_t end)
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2407  {
e809c3fedeeb80 Matthew Wilcox (Oracle  2021-06-28  2408) 	struct folio *folio = page_folio(page);
e809c3fedeeb80 Matthew Wilcox (Oracle  2021-06-28  2409) 	struct page *head = &folio->page;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2410  	struct lruvec *lruvec;
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2411) 	struct address_space *swap_cache = NULL;
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2412) 	unsigned long offset = 0;
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2413  	unsigned int nr = thp_nr_pages(head);
8df651c7059e79 Kirill A. Shutemov      2016-03-15  2414  	int i;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2415  
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2416  	/* complete memcg works before add pages to LRU */
be6c8982e4ab9a Zhou Guanghui           2021-03-12  2417  	split_page_memcg(head, nr);
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2418  
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2419) 	if (PageAnon(head) && PageSwapCache(head)) {
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2420) 		swp_entry_t entry = { .val = page_private(head) };
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2421) 
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2422) 		offset = swp_offset(entry);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23 @2423) 		swap_cache = swap_address_space(entry);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2424) 		xa_lock(&swap_cache->i_pages);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2425) 	}
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2426) 
f0953a1bbaca71 Ingo Molnar             2021-05-06  2427  	/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
e809c3fedeeb80 Matthew Wilcox (Oracle  2021-06-28  2428) 	lruvec = folio_lruvec_lock(folio);
b6769834aac1d4 Alex Shi                2020-12-15  2429  
eac96c3efdb593 Yang Shi                2021-10-28  2430  	ClearPageHasHWPoisoned(head);
eac96c3efdb593 Yang Shi                2021-10-28  2431  
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2432  	for (i = nr - 1; i >= 1; i--) {
8df651c7059e79 Kirill A. Shutemov      2016-03-15  2433  		__split_huge_page_tail(head, i, lruvec, list);
d144bf6205342a Hugh Dickins            2021-09-02  2434  		/* Some pages can be beyond EOF: drop them from page cache */
baa355fd331424 Kirill A. Shutemov      2016-07-26  2435  		if (head[i].index >= end) {
2d077d4b59924a Hugh Dickins            2018-06-01  2436  			ClearPageDirty(head + i);
baa355fd331424 Kirill A. Shutemov      2016-07-26  2437  			__delete_from_page_cache(head + i, NULL);
d144bf6205342a Hugh Dickins            2021-09-02  2438  			if (shmem_mapping(head->mapping))
800d8c63b2e989 Kirill A. Shutemov      2016-07-26  2439  				shmem_uncharge(head->mapping->host, 1);
baa355fd331424 Kirill A. Shutemov      2016-07-26  2440  			put_page(head + i);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2441) 		} else if (!PageAnon(page)) {
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2442) 			__xa_store(&head->mapping->i_pages, head[i].index,
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2443) 					head + i, 0);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2444) 		} else if (swap_cache) {
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2445) 			__xa_store(&swap_cache->i_pages, offset + i,
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2446) 					head + i, 0);
baa355fd331424 Kirill A. Shutemov      2016-07-26  2447  		}
baa355fd331424 Kirill A. Shutemov      2016-07-26  2448  	}
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2449  
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2450  	ClearPageCompound(head);
6168d0da2b479c Alex Shi                2020-12-15  2451  	unlock_page_lruvec(lruvec);
b6769834aac1d4 Alex Shi                2020-12-15  2452  	/* Caller disabled irqs, so they are still disabled here */
f7da677bc6e720 Vlastimil Babka         2019-08-24  2453  
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2454  	split_page_owner(head, nr);
f7da677bc6e720 Vlastimil Babka         2019-08-24  2455  
baa355fd331424 Kirill A. Shutemov      2016-07-26  2456  	/* See comment in __split_huge_page_tail() */
baa355fd331424 Kirill A. Shutemov      2016-07-26  2457  	if (PageAnon(head)) {
aa5dc07f70c50a Matthew Wilcox          2017-12-04  2458  		/* Additional pin to swap cache */
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2459) 		if (PageSwapCache(head)) {
38d8b4e6bdc872 Huang Ying              2017-07-06  2460  			page_ref_add(head, 2);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2461) 			xa_unlock(&swap_cache->i_pages);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2462) 		} else {
baa355fd331424 Kirill A. Shutemov      2016-07-26  2463  			page_ref_inc(head);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2464) 		}
baa355fd331424 Kirill A. Shutemov      2016-07-26  2465  	} else {
aa5dc07f70c50a Matthew Wilcox          2017-12-04  2466  		/* Additional pin to page cache */
baa355fd331424 Kirill A. Shutemov      2016-07-26  2467  		page_ref_add(head, 2);
b93b016313b3ba Matthew Wilcox          2018-04-10  2468  		xa_unlock(&head->mapping->i_pages);
baa355fd331424 Kirill A. Shutemov      2016-07-26  2469  	}
b6769834aac1d4 Alex Shi                2020-12-15  2470  	local_irq_enable();
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2471  
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2472  	remap_page(head, nr);
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2473  
c4f9c701f9b442 Huang Ying              2020-10-15  2474  	if (PageSwapCache(head)) {
c4f9c701f9b442 Huang Ying              2020-10-15  2475  		swp_entry_t entry = { .val = page_private(head) };
c4f9c701f9b442 Huang Ying              2020-10-15  2476  
c4f9c701f9b442 Huang Ying              2020-10-15  2477  		split_swap_cluster(entry);
c4f9c701f9b442 Huang Ying              2020-10-15  2478  	}
c4f9c701f9b442 Huang Ying              2020-10-15  2479  
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2480  	for (i = 0; i < nr; i++) {
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2481  		struct page *subpage = head + i;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2482  		if (subpage == page)
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2483  			continue;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2484  		unlock_page(subpage);
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2485  
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2486  		/*
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2487  		 * Subpages may be freed if there wasn't any mapping
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2488  		 * like if add_to_swap() is running on a lru page that
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2489  		 * had its mapping zapped. And freeing these pages
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2490  		 * requires taking the lru_lock so we do the put_page
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2491  		 * of the tail pages after the split is complete.
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2492  		 */
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2493  		put_page(subpage);
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2494  	}
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2495  }
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2496  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/21] MM: create new mm/swap.h header file.
  2022-02-07  4:46 ` [PATCH 01/21] MM: create new mm/swap.h header file NeilBrown
@ 2022-02-07 14:26     ` kernel test robot
  2022-02-07 14:26     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2022-02-07 14:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: llvm, kbuild-all

Hi NeilBrown,

Thank you for the patch! There is still something to improve:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on hnaz-mm/master cifs/for-next linus/master v5.17-rc3 next-20220207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/Repair-SWAP-over_NFS/20220207-125206
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: x86_64-randconfig-a002-20220207 (https://download.01.org/0day-ci/archive/20220207/202202072219.lW7FXue8-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 0d8850ae2cae85d49bea6ae0799fa41c7202c05c)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/06d2bcb84187037252a0f764881ab51965e931ea
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/Repair-SWAP-over_NFS/20220207-125206
        git checkout 06d2bcb84187037252a0f764881ab51965e931ea
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/huge_memory.c:2423:16: error: implicit declaration of function 'swap_address_space' [-Werror,-Wimplicit-function-declaration]
                   swap_cache = swap_address_space(entry);
                                ^
   mm/huge_memory.c:2423:14: warning: incompatible integer to pointer conversion assigning to 'struct address_space *' from 'int' [-Wint-conversion]
                   swap_cache = swap_address_space(entry);
                              ^ ~~~~~~~~~~~~~~~~~~~~~~~~~
   1 warning and 1 error generated.


vim +/swap_address_space +2423 mm/huge_memory.c

e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2404  
baa355fd331424 Kirill A. Shutemov      2016-07-26  2405  static void __split_huge_page(struct page *page, struct list_head *list,
b6769834aac1d4 Alex Shi                2020-12-15  2406  		pgoff_t end)
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2407  {
e809c3fedeeb80 Matthew Wilcox (Oracle  2021-06-28  2408) 	struct folio *folio = page_folio(page);
e809c3fedeeb80 Matthew Wilcox (Oracle  2021-06-28  2409) 	struct page *head = &folio->page;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2410  	struct lruvec *lruvec;
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2411) 	struct address_space *swap_cache = NULL;
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2412) 	unsigned long offset = 0;
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2413  	unsigned int nr = thp_nr_pages(head);
8df651c7059e79 Kirill A. Shutemov      2016-03-15  2414  	int i;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2415  
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2416  	/* complete memcg works before add pages to LRU */
be6c8982e4ab9a Zhou Guanghui           2021-03-12  2417  	split_page_memcg(head, nr);
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2418  
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2419) 	if (PageAnon(head) && PageSwapCache(head)) {
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2420) 		swp_entry_t entry = { .val = page_private(head) };
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2421) 
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2422) 		offset = swp_offset(entry);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23 @2423) 		swap_cache = swap_address_space(entry);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2424) 		xa_lock(&swap_cache->i_pages);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2425) 	}
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2426) 
f0953a1bbaca71 Ingo Molnar             2021-05-06  2427  	/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
e809c3fedeeb80 Matthew Wilcox (Oracle  2021-06-28  2428) 	lruvec = folio_lruvec_lock(folio);
b6769834aac1d4 Alex Shi                2020-12-15  2429  
eac96c3efdb593 Yang Shi                2021-10-28  2430  	ClearPageHasHWPoisoned(head);
eac96c3efdb593 Yang Shi                2021-10-28  2431  
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2432  	for (i = nr - 1; i >= 1; i--) {
8df651c7059e79 Kirill A. Shutemov      2016-03-15  2433  		__split_huge_page_tail(head, i, lruvec, list);
d144bf6205342a Hugh Dickins            2021-09-02  2434  		/* Some pages can be beyond EOF: drop them from page cache */
baa355fd331424 Kirill A. Shutemov      2016-07-26  2435  		if (head[i].index >= end) {
2d077d4b59924a Hugh Dickins            2018-06-01  2436  			ClearPageDirty(head + i);
baa355fd331424 Kirill A. Shutemov      2016-07-26  2437  			__delete_from_page_cache(head + i, NULL);
d144bf6205342a Hugh Dickins            2021-09-02  2438  			if (shmem_mapping(head->mapping))
800d8c63b2e989 Kirill A. Shutemov      2016-07-26  2439  				shmem_uncharge(head->mapping->host, 1);
baa355fd331424 Kirill A. Shutemov      2016-07-26  2440  			put_page(head + i);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2441) 		} else if (!PageAnon(page)) {
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2442) 			__xa_store(&head->mapping->i_pages, head[i].index,
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2443) 					head + i, 0);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2444) 		} else if (swap_cache) {
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2445) 			__xa_store(&swap_cache->i_pages, offset + i,
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2446) 					head + i, 0);
baa355fd331424 Kirill A. Shutemov      2016-07-26  2447  		}
baa355fd331424 Kirill A. Shutemov      2016-07-26  2448  	}
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2449  
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2450  	ClearPageCompound(head);
6168d0da2b479c Alex Shi                2020-12-15  2451  	unlock_page_lruvec(lruvec);
b6769834aac1d4 Alex Shi                2020-12-15  2452  	/* Caller disabled irqs, so they are still disabled here */
f7da677bc6e720 Vlastimil Babka         2019-08-24  2453  
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2454  	split_page_owner(head, nr);
f7da677bc6e720 Vlastimil Babka         2019-08-24  2455  
baa355fd331424 Kirill A. Shutemov      2016-07-26  2456  	/* See comment in __split_huge_page_tail() */
baa355fd331424 Kirill A. Shutemov      2016-07-26  2457  	if (PageAnon(head)) {
aa5dc07f70c50a Matthew Wilcox          2017-12-04  2458  		/* Additional pin to swap cache */
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2459) 		if (PageSwapCache(head)) {
38d8b4e6bdc872 Huang Ying              2017-07-06  2460  			page_ref_add(head, 2);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2461) 			xa_unlock(&swap_cache->i_pages);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2462) 		} else {
baa355fd331424 Kirill A. Shutemov      2016-07-26  2463  			page_ref_inc(head);
4101196b19d7f9 Matthew Wilcox (Oracle  2019-09-23  2464) 		}
baa355fd331424 Kirill A. Shutemov      2016-07-26  2465  	} else {
aa5dc07f70c50a Matthew Wilcox          2017-12-04  2466  		/* Additional pin to page cache */
baa355fd331424 Kirill A. Shutemov      2016-07-26  2467  		page_ref_add(head, 2);
b93b016313b3ba Matthew Wilcox          2018-04-10  2468  		xa_unlock(&head->mapping->i_pages);
baa355fd331424 Kirill A. Shutemov      2016-07-26  2469  	}
b6769834aac1d4 Alex Shi                2020-12-15  2470  	local_irq_enable();
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2471  
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2472  	remap_page(head, nr);
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2473  
c4f9c701f9b442 Huang Ying              2020-10-15  2474  	if (PageSwapCache(head)) {
c4f9c701f9b442 Huang Ying              2020-10-15  2475  		swp_entry_t entry = { .val = page_private(head) };
c4f9c701f9b442 Huang Ying              2020-10-15  2476  
c4f9c701f9b442 Huang Ying              2020-10-15  2477  		split_swap_cluster(entry);
c4f9c701f9b442 Huang Ying              2020-10-15  2478  	}
c4f9c701f9b442 Huang Ying              2020-10-15  2479  
8cce54756806e5 Kirill A. Shutemov      2020-10-15  2480  	for (i = 0; i < nr; i++) {
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2481  		struct page *subpage = head + i;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2482  		if (subpage == page)
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2483  			continue;
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2484  		unlock_page(subpage);
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2485  
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2486  		/*
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2487  		 * Subpages may be freed if there wasn't any mapping
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2488  		 * like if add_to_swap() is running on a lru page that
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2489  		 * had its mapping zapped. And freeing these pages
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2490  		 * requires taking the lru_lock so we do the put_page
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2491  		 * of the tail pages after the split is complete.
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2492  		 */
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2493  		put_page(subpage);
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2494  	}
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2495  }
e9b61f19858a5d Kirill A. Shutemov      2016-01-15  2496  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 33+ messages in thread


* Re: [PATCH 01/21] MM: create new mm/swap.h header file.
  2022-02-07  4:46 ` [PATCH 01/21] MM: create new mm/swap.h header file NeilBrown
@ 2022-02-07 15:18     ` kernel test robot
  2022-02-07 14:26     ` kernel test robot
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 33+ messages in thread
From: kernel test robot @ 2022-02-07 15:18 UTC (permalink / raw)
  To: NeilBrown; +Cc: llvm, kbuild-all

Hi NeilBrown,

Thank you for the patch! There is still something to improve:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on hnaz-mm/master cifs/for-next linus/master v5.17-rc3 next-20220207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/NeilBrown/Repair-SWAP-over_NFS/20220207-125206
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: hexagon-randconfig-r005-20220207 (https://download.01.org/0day-ci/archive/20220207/202202072351.RqMHsM0e-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 0d8850ae2cae85d49bea6ae0799fa41c7202c05c)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/06d2bcb84187037252a0f764881ab51965e931ea
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review NeilBrown/Repair-SWAP-over_NFS/20220207-125206
        git checkout 06d2bcb84187037252a0f764881ab51965e931ea
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=hexagon SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/zswap.c:906:13: error: implicit declaration of function '__read_swap_cache_async' [-Werror,-Wimplicit-function-declaration]
           *retpage = __read_swap_cache_async(entry, GFP_KERNEL,
                      ^
   mm/zswap.c:906:11: warning: incompatible integer to pointer conversion assigning to 'struct page *' from 'int' [-Wint-conversion]
           *retpage = __read_swap_cache_async(entry, GFP_KERNEL,
                    ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> mm/zswap.c:1014:2: error: implicit declaration of function '__swap_writepage' [-Werror,-Wimplicit-function-declaration]
           __swap_writepage(page, &wbc, end_swap_bio_write);
           ^
>> mm/zswap.c:1014:31: error: use of undeclared identifier 'end_swap_bio_write'
           __swap_writepage(page, &wbc, end_swap_bio_write);
                                        ^
   1 warning and 3 errors generated.


vim +/__read_swap_cache_async +906 mm/zswap.c

2b2811178e85553 Seth Jennings      2013-07-10   884  
2b2811178e85553 Seth Jennings      2013-07-10   885  /*
2b2811178e85553 Seth Jennings      2013-07-10   886   * zswap_get_swap_cache_page
2b2811178e85553 Seth Jennings      2013-07-10   887   *
2b2811178e85553 Seth Jennings      2013-07-10   888   * This is an adaption of read_swap_cache_async()
2b2811178e85553 Seth Jennings      2013-07-10   889   *
2b2811178e85553 Seth Jennings      2013-07-10   890   * This function tries to find a page with the given swap entry
2b2811178e85553 Seth Jennings      2013-07-10   891   * in the swapper_space address space (the swap cache).  If the page
2b2811178e85553 Seth Jennings      2013-07-10   892   * is found, it is returned in retpage.  Otherwise, a page is allocated,
2b2811178e85553 Seth Jennings      2013-07-10   893   * added to the swap cache, and returned in retpage.
2b2811178e85553 Seth Jennings      2013-07-10   894   *
2b2811178e85553 Seth Jennings      2013-07-10   895   * If success, the swap cache page is returned in retpage
67d13fe846c57a5 Weijie Yang        2013-11-12   896   * Returns ZSWAP_SWAPCACHE_EXIST if page was already in the swap cache
67d13fe846c57a5 Weijie Yang        2013-11-12   897   * Returns ZSWAP_SWAPCACHE_NEW if the new page needs to be populated,
67d13fe846c57a5 Weijie Yang        2013-11-12   898   *     the new page is added to swapcache and locked
67d13fe846c57a5 Weijie Yang        2013-11-12   899   * Returns ZSWAP_SWAPCACHE_FAIL on error
2b2811178e85553 Seth Jennings      2013-07-10   900   */
2b2811178e85553 Seth Jennings      2013-07-10   901  static int zswap_get_swap_cache_page(swp_entry_t entry,
2b2811178e85553 Seth Jennings      2013-07-10   902  				struct page **retpage)
2b2811178e85553 Seth Jennings      2013-07-10   903  {
5b999aadbae6569 Dmitry Safonov     2015-09-08   904  	bool page_was_allocated;
2b2811178e85553 Seth Jennings      2013-07-10   905  
5b999aadbae6569 Dmitry Safonov     2015-09-08  @906  	*retpage = __read_swap_cache_async(entry, GFP_KERNEL,
5b999aadbae6569 Dmitry Safonov     2015-09-08   907  			NULL, 0, &page_was_allocated);
5b999aadbae6569 Dmitry Safonov     2015-09-08   908  	if (page_was_allocated)
2b2811178e85553 Seth Jennings      2013-07-10   909  		return ZSWAP_SWAPCACHE_NEW;
5b999aadbae6569 Dmitry Safonov     2015-09-08   910  	if (!*retpage)
67d13fe846c57a5 Weijie Yang        2013-11-12   911  		return ZSWAP_SWAPCACHE_FAIL;
2b2811178e85553 Seth Jennings      2013-07-10   912  	return ZSWAP_SWAPCACHE_EXIST;
2b2811178e85553 Seth Jennings      2013-07-10   913  }
2b2811178e85553 Seth Jennings      2013-07-10   914  
2b2811178e85553 Seth Jennings      2013-07-10   915  /*
2b2811178e85553 Seth Jennings      2013-07-10   916   * Attempts to free an entry by adding a page to the swap cache,
2b2811178e85553 Seth Jennings      2013-07-10   917   * decompressing the entry data into the page, and issuing a
2b2811178e85553 Seth Jennings      2013-07-10   918   * bio write to write the page back to the swap device.
2b2811178e85553 Seth Jennings      2013-07-10   919   *
2b2811178e85553 Seth Jennings      2013-07-10   920   * This can be thought of as a "resumed writeback" of the page
2b2811178e85553 Seth Jennings      2013-07-10   921   * to the swap device.  We are basically resuming the same swap
2b2811178e85553 Seth Jennings      2013-07-10   922   * writeback path that was intercepted with the frontswap_store()
2b2811178e85553 Seth Jennings      2013-07-10   923   * in the first place.  After the page has been decompressed into
2b2811178e85553 Seth Jennings      2013-07-10   924   * the swap cache, the compressed version stored by zswap can be
2b2811178e85553 Seth Jennings      2013-07-10   925   * freed.
2b2811178e85553 Seth Jennings      2013-07-10   926   */
12d79d64bfd3913 Dan Streetman      2014-08-06   927  static int zswap_writeback_entry(struct zpool *pool, unsigned long handle)
2b2811178e85553 Seth Jennings      2013-07-10   928  {
2b2811178e85553 Seth Jennings      2013-07-10   929  	struct zswap_header *zhdr;
2b2811178e85553 Seth Jennings      2013-07-10   930  	swp_entry_t swpentry;
2b2811178e85553 Seth Jennings      2013-07-10   931  	struct zswap_tree *tree;
2b2811178e85553 Seth Jennings      2013-07-10   932  	pgoff_t offset;
2b2811178e85553 Seth Jennings      2013-07-10   933  	struct zswap_entry *entry;
2b2811178e85553 Seth Jennings      2013-07-10   934  	struct page *page;
1ec3b5fe6eec782 Barry Song         2020-12-14   935  	struct scatterlist input, output;
1ec3b5fe6eec782 Barry Song         2020-12-14   936  	struct crypto_acomp_ctx *acomp_ctx;
1ec3b5fe6eec782 Barry Song         2020-12-14   937  
fc6697a89f56d97 Tian Tao           2021-02-25   938  	u8 *src, *tmp = NULL;
2b2811178e85553 Seth Jennings      2013-07-10   939  	unsigned int dlen;
0ab0abcf511545d Weijie Yang        2013-11-12   940  	int ret;
2b2811178e85553 Seth Jennings      2013-07-10   941  	struct writeback_control wbc = {
2b2811178e85553 Seth Jennings      2013-07-10   942  		.sync_mode = WB_SYNC_NONE,
2b2811178e85553 Seth Jennings      2013-07-10   943  	};
2b2811178e85553 Seth Jennings      2013-07-10   944  
fc6697a89f56d97 Tian Tao           2021-02-25   945  	if (!zpool_can_sleep_mapped(pool)) {
fc6697a89f56d97 Tian Tao           2021-02-25   946  		tmp = kmalloc(PAGE_SIZE, GFP_ATOMIC);
fc6697a89f56d97 Tian Tao           2021-02-25   947  		if (!tmp)
fc6697a89f56d97 Tian Tao           2021-02-25   948  			return -ENOMEM;
fc6697a89f56d97 Tian Tao           2021-02-25   949  	}
fc6697a89f56d97 Tian Tao           2021-02-25   950  
2b2811178e85553 Seth Jennings      2013-07-10   951  	/* extract swpentry from data */
12d79d64bfd3913 Dan Streetman      2014-08-06   952  	zhdr = zpool_map_handle(pool, handle, ZPOOL_MM_RO);
2b2811178e85553 Seth Jennings      2013-07-10   953  	swpentry = zhdr->swpentry; /* here */
2b2811178e85553 Seth Jennings      2013-07-10   954  	tree = zswap_trees[swp_type(swpentry)];
2b2811178e85553 Seth Jennings      2013-07-10   955  	offset = swp_offset(swpentry);
2b2811178e85553 Seth Jennings      2013-07-10   956  
2b2811178e85553 Seth Jennings      2013-07-10   957  	/* find and ref zswap entry */
2b2811178e85553 Seth Jennings      2013-07-10   958  	spin_lock(&tree->lock);
0ab0abcf511545d Weijie Yang        2013-11-12   959  	entry = zswap_entry_find_get(&tree->rbroot, offset);
2b2811178e85553 Seth Jennings      2013-07-10   960  	if (!entry) {
2b2811178e85553 Seth Jennings      2013-07-10   961  		/* entry was invalidated */
2b2811178e85553 Seth Jennings      2013-07-10   962  		spin_unlock(&tree->lock);
068619e32ff6229 Vitaly Wool        2019-09-23   963  		zpool_unmap_handle(pool, handle);
fc6697a89f56d97 Tian Tao           2021-02-25   964  		kfree(tmp);
2b2811178e85553 Seth Jennings      2013-07-10   965  		return 0;
2b2811178e85553 Seth Jennings      2013-07-10   966  	}
2b2811178e85553 Seth Jennings      2013-07-10   967  	spin_unlock(&tree->lock);
2b2811178e85553 Seth Jennings      2013-07-10   968  	BUG_ON(offset != entry->offset);
2b2811178e85553 Seth Jennings      2013-07-10   969  
46b76f2e09dc35f Miaohe Lin         2021-06-30   970  	src = (u8 *)zhdr + sizeof(struct zswap_header);
46b76f2e09dc35f Miaohe Lin         2021-06-30   971  	if (!zpool_can_sleep_mapped(pool)) {
46b76f2e09dc35f Miaohe Lin         2021-06-30   972  		memcpy(tmp, src, entry->length);
46b76f2e09dc35f Miaohe Lin         2021-06-30   973  		src = tmp;
46b76f2e09dc35f Miaohe Lin         2021-06-30   974  		zpool_unmap_handle(pool, handle);
46b76f2e09dc35f Miaohe Lin         2021-06-30   975  	}
46b76f2e09dc35f Miaohe Lin         2021-06-30   976  
2b2811178e85553 Seth Jennings      2013-07-10   977  	/* try to allocate swap cache page */
2b2811178e85553 Seth Jennings      2013-07-10   978  	switch (zswap_get_swap_cache_page(swpentry, &page)) {
67d13fe846c57a5 Weijie Yang        2013-11-12   979  	case ZSWAP_SWAPCACHE_FAIL: /* no memory or invalidate happened */
2b2811178e85553 Seth Jennings      2013-07-10   980  		ret = -ENOMEM;
2b2811178e85553 Seth Jennings      2013-07-10   981  		goto fail;
2b2811178e85553 Seth Jennings      2013-07-10   982  
67d13fe846c57a5 Weijie Yang        2013-11-12   983  	case ZSWAP_SWAPCACHE_EXIST:
2b2811178e85553 Seth Jennings      2013-07-10   984  		/* page is already in the swap cache, ignore for now */
09cbfeaf1a5a67b Kirill A. Shutemov 2016-04-01   985  		put_page(page);
2b2811178e85553 Seth Jennings      2013-07-10   986  		ret = -EEXIST;
2b2811178e85553 Seth Jennings      2013-07-10   987  		goto fail;
2b2811178e85553 Seth Jennings      2013-07-10   988  
2b2811178e85553 Seth Jennings      2013-07-10   989  	case ZSWAP_SWAPCACHE_NEW: /* page is locked */
2b2811178e85553 Seth Jennings      2013-07-10   990  		/* decompress */
1ec3b5fe6eec782 Barry Song         2020-12-14   991  		acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
2b2811178e85553 Seth Jennings      2013-07-10   992  		dlen = PAGE_SIZE;
fc6697a89f56d97 Tian Tao           2021-02-25   993  
1ec3b5fe6eec782 Barry Song         2020-12-14   994  		mutex_lock(acomp_ctx->mutex);
1ec3b5fe6eec782 Barry Song         2020-12-14   995  		sg_init_one(&input, src, entry->length);
1ec3b5fe6eec782 Barry Song         2020-12-14   996  		sg_init_table(&output, 1);
1ec3b5fe6eec782 Barry Song         2020-12-14   997  		sg_set_page(&output, page, PAGE_SIZE, 0);
1ec3b5fe6eec782 Barry Song         2020-12-14   998  		acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
1ec3b5fe6eec782 Barry Song         2020-12-14   999  		ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
1ec3b5fe6eec782 Barry Song         2020-12-14  1000  		dlen = acomp_ctx->req->dlen;
1ec3b5fe6eec782 Barry Song         2020-12-14  1001  		mutex_unlock(acomp_ctx->mutex);
1ec3b5fe6eec782 Barry Song         2020-12-14  1002  
2b2811178e85553 Seth Jennings      2013-07-10  1003  		BUG_ON(ret);
2b2811178e85553 Seth Jennings      2013-07-10  1004  		BUG_ON(dlen != PAGE_SIZE);
2b2811178e85553 Seth Jennings      2013-07-10  1005  
2b2811178e85553 Seth Jennings      2013-07-10  1006  		/* page is up to date */
2b2811178e85553 Seth Jennings      2013-07-10  1007  		SetPageUptodate(page);
2b2811178e85553 Seth Jennings      2013-07-10  1008  	}
2b2811178e85553 Seth Jennings      2013-07-10  1009  
b349acc76b7f654 Weijie Yang        2013-11-12  1010  	/* move it to the tail of the inactive list after end_writeback */
b349acc76b7f654 Weijie Yang        2013-11-12  1011  	SetPageReclaim(page);
b349acc76b7f654 Weijie Yang        2013-11-12  1012  
2b2811178e85553 Seth Jennings      2013-07-10  1013  	/* start writeback */
2b2811178e85553 Seth Jennings      2013-07-10 @1014  	__swap_writepage(page, &wbc, end_swap_bio_write);
09cbfeaf1a5a67b Kirill A. Shutemov 2016-04-01  1015  	put_page(page);
2b2811178e85553 Seth Jennings      2013-07-10  1016  	zswap_written_back_pages++;
2b2811178e85553 Seth Jennings      2013-07-10  1017  
2b2811178e85553 Seth Jennings      2013-07-10  1018  	spin_lock(&tree->lock);
2b2811178e85553 Seth Jennings      2013-07-10  1019  	/* drop local reference */
0ab0abcf511545d Weijie Yang        2013-11-12  1020  	zswap_entry_put(tree, entry);
2b2811178e85553 Seth Jennings      2013-07-10  1021  
2b2811178e85553 Seth Jennings      2013-07-10  1022  	/*
0ab0abcf511545d Weijie Yang        2013-11-12  1023  	* There are two possible situations for entry here:
0ab0abcf511545d Weijie Yang        2013-11-12  1024  	* (1) refcount is 1(normal case),  entry is valid and on the tree
0ab0abcf511545d Weijie Yang        2013-11-12  1025  	* (2) refcount is 0, entry is freed and not on the tree
0ab0abcf511545d Weijie Yang        2013-11-12  1026  	*     because invalidate happened during writeback
0ab0abcf511545d Weijie Yang        2013-11-12  1027  	*  search the tree and free the entry if find entry
2b2811178e85553 Seth Jennings      2013-07-10  1028  	*/
0ab0abcf511545d Weijie Yang        2013-11-12  1029  	if (entry == zswap_rb_search(&tree->rbroot, offset))
0ab0abcf511545d Weijie Yang        2013-11-12  1030  		zswap_entry_put(tree, entry);
2b2811178e85553 Seth Jennings      2013-07-10  1031  	spin_unlock(&tree->lock);
2b2811178e85553 Seth Jennings      2013-07-10  1032  
0ab0abcf511545d Weijie Yang        2013-11-12  1033  	goto end;
0ab0abcf511545d Weijie Yang        2013-11-12  1034  
0ab0abcf511545d Weijie Yang        2013-11-12  1035  	/*
0ab0abcf511545d Weijie Yang        2013-11-12  1036  	* if we get here due to ZSWAP_SWAPCACHE_EXIST
c0c641d77b9ab0d Randy Dunlap       2021-02-25  1037  	* a load may be happening concurrently.
c0c641d77b9ab0d Randy Dunlap       2021-02-25  1038  	* it is safe and okay to not free the entry.
0ab0abcf511545d Weijie Yang        2013-11-12  1039  	* if we free the entry in the following put
c0c641d77b9ab0d Randy Dunlap       2021-02-25  1040  	* it is also okay to return !0
0ab0abcf511545d Weijie Yang        2013-11-12  1041  	*/
2b2811178e85553 Seth Jennings      2013-07-10  1042  fail:
2b2811178e85553 Seth Jennings      2013-07-10  1043  	spin_lock(&tree->lock);
0ab0abcf511545d Weijie Yang        2013-11-12  1044  	zswap_entry_put(tree, entry);
2b2811178e85553 Seth Jennings      2013-07-10  1045  	spin_unlock(&tree->lock);
0ab0abcf511545d Weijie Yang        2013-11-12  1046  
0ab0abcf511545d Weijie Yang        2013-11-12  1047  end:
fc6697a89f56d97 Tian Tao           2021-02-25  1048  	if (zpool_can_sleep_mapped(pool))
068619e32ff6229 Vitaly Wool        2019-09-23  1049  		zpool_unmap_handle(pool, handle);
fc6697a89f56d97 Tian Tao           2021-02-25  1050  	else
fc6697a89f56d97 Tian Tao           2021-02-25  1051  		kfree(tmp);
fc6697a89f56d97 Tian Tao           2021-02-25  1052  
2b2811178e85553 Seth Jennings      2013-07-10  1053  	return ret;
2b2811178e85553 Seth Jennings      2013-07-10  1054  }
2b2811178e85553 Seth Jennings      2013-07-10  1055  
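These errors are consistent with patch 01/21 having moved the declarations of `__read_swap_cache_async()`, `__swap_writepage()` and `end_swap_bio_write()` out of the public headers and into the new `mm/swap.h`, without updating `mm/zswap.c` to include it. A minimal sketch of the likely fix follows — the exact include path and the header's contents are assumptions inferred from the patch title, not taken from the posted series:

```c
/*
 * mm/zswap.c -- hypothetical one-line fix: pull in the new internal
 * header introduced by "MM: create new mm/swap.h header file", which
 * would now carry the declarations the errors above complain about.
 */
#include "swap.h"	/* assumed to declare __read_swap_cache_async(),
			 * __swap_writepage(), end_swap_bio_write() */
```

If that is the cause, any other mm/ file that calls these swap-internal helpers (zswap is built only with CONFIG_ZSWAP, which this randconfig evidently enables) would need the same include.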

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 17/21] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC
  2022-02-07  4:46 ` [PATCH 17/21] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC NeilBrown
@ 2022-02-07 15:53   ` Chuck Lever III
  0 siblings, 0 replies; 33+ messages in thread
From: Chuck Lever III @ 2022-02-07 15:53 UTC (permalink / raw)
  To: Neil Brown
  Cc: Trond Myklebust, Anna Schumaker, Andrew Morton, Mark Hemment,
	Christoph Hellwig, David Howells, Linux NFS Mailing List,
	linux-mm, linux-kernel



> On Feb 6, 2022, at 11:46 PM, NeilBrown <neilb@suse.de> wrote:
> 
> rpc tasks can be marked as RPC_TASK_SWAPPER.  This causes GFP_MEMALLOC
> to be used for some allocations.  This is needed in some cases, but not
> in all of the cases where it is currently provided, and it is missing
> from some cases where it is needed.
> 
> Currently *all* tasks associated with a rpc_client on which swap is
> enabled get the flag and hence some GFP_MEMALLOC support.
> 
> GFP_MEMALLOC is provided for ->buf_alloc() but only swap-writes need it.
> However xdr_alloc_bvec does not get GFP_MEMALLOC - though it often does
> need it.
> 
> xdr_alloc_bvec is called while the XPRT_LOCK is held.  If this blocks,
> then it blocks all other queued tasks.  So this allocation needs
> GFP_MEMALLOC for *all* requests, not just writes, when the xprt is used
> for any swap writes.
> 
> Similarly, if the transport is not connected, that will block all
> requests including swap writes, so memory allocations should get
> GFP_MEMALLOC if swap writes are possible.
> 
> So with this patch:
> 1/ we ONLY set RPC_TASK_SWAPPER for swap writes.
> 2/ __rpc_execute() sets PF_MEMALLOC while handling any task
>    with RPC_TASK_SWAPPER set, or when handling any task that
>    holds the XPRT_LOCKED lock on an xprt used for swap.
>    This removes the need for the RPC_IS_SWAPPER() test
>    in ->buf_alloc handlers.
> 3/ xprt_prepare_transmit() sets PF_MEMALLOC after locking
>    any task to a swapper xprt.  __rpc_execute() will clear it.
> 4/ PF_MEMALLOC is set for all the connect workers.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>

Thanks for including xprtrdma in the patch. Those changes
look consistent with the xprtsock hunks.

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>


> ---
> fs/nfs/write.c                  |    2 ++
> net/sunrpc/clnt.c               |    2 --
> net/sunrpc/sched.c              |   20 +++++++++++++++++---
> net/sunrpc/xprt.c               |    3 +++
> net/sunrpc/xprtrdma/transport.c |    6 ++++--
> net/sunrpc/xprtsock.c           |    8 ++++++++
> 6 files changed, 34 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 987a187bd39a..9f7176745fef 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1409,6 +1409,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr,
> {
> 	int priority = flush_task_priority(how);
> 
> +	if (IS_SWAPFILE(hdr->inode))
> +		task_setup_data->flags |= RPC_TASK_SWAPPER;
> 	task_setup_data->priority = priority;
> 	rpc_ops->write_setup(hdr, msg, &task_setup_data->rpc_client);
> 	trace_nfs_initiate_write(hdr);
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index d1fb7c0c7685..842366a2fc57 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1085,8 +1085,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
> 		task->tk_flags |= RPC_TASK_TIMEOUT;
> 	if (clnt->cl_noretranstimeo)
> 		task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT;
> -	if (atomic_read(&clnt->cl_swapper))
> -		task->tk_flags |= RPC_TASK_SWAPPER;
> 	/* Add to the client's list of all tasks */
> 	spin_lock(&clnt->cl_lock);
> 	list_add_tail(&task->tk_task, &clnt->cl_tasks);
> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
> index 256302bf6557..9020cedb7c95 100644
> --- a/net/sunrpc/sched.c
> +++ b/net/sunrpc/sched.c
> @@ -869,6 +869,15 @@ void rpc_release_calldata(const struct rpc_call_ops *ops, void *calldata)
> 		ops->rpc_release(calldata);
> }
> 
> +static bool xprt_needs_memalloc(struct rpc_xprt *xprt, struct rpc_task *tk)
> +{
> +	if (!xprt)
> +		return false;
> +	if (!atomic_read(&xprt->swapper))
> +		return false;
> +	return test_bit(XPRT_LOCKED, &xprt->state) && xprt->snd_task == tk;
> +}
> +
> /*
>  * This is the RPC `scheduler' (or rather, the finite state machine).
>  */
> @@ -877,6 +886,7 @@ static void __rpc_execute(struct rpc_task *task)
> 	struct rpc_wait_queue *queue;
> 	int task_is_async = RPC_IS_ASYNC(task);
> 	int status = 0;
> +	unsigned long pflags = current->flags;
> 
> 	WARN_ON_ONCE(RPC_IS_QUEUED(task));
> 	if (RPC_IS_QUEUED(task))
> @@ -899,6 +909,10 @@ static void __rpc_execute(struct rpc_task *task)
> 		}
> 		if (!do_action)
> 			break;
> +		if (RPC_IS_SWAPPER(task) ||
> +		    xprt_needs_memalloc(task->tk_xprt, task))
> +			current->flags |= PF_MEMALLOC;
> +
> 		trace_rpc_task_run_action(task, do_action);
> 		do_action(task);
> 
> @@ -936,7 +950,7 @@ static void __rpc_execute(struct rpc_task *task)
> 		rpc_clear_running(task);
> 		spin_unlock(&queue->lock);
> 		if (task_is_async)
> -			return;
> +			goto out;
> 
> 		/* sync task: sleep here */
> 		trace_rpc_task_sync_sleep(task, task->tk_action);
> @@ -960,6 +974,8 @@ static void __rpc_execute(struct rpc_task *task)
> 
> 	/* Release all resources associated with the task */
> 	rpc_release_task(task);
> +out:
> +	current_restore_flags(pflags, PF_MEMALLOC);
> }
> 
> /*
> @@ -1018,8 +1034,6 @@ int rpc_malloc(struct rpc_task *task)
> 
> 	if (RPC_IS_ASYNC(task))
> 		gfp = GFP_NOWAIT | __GFP_NOWARN;
> -	if (RPC_IS_SWAPPER(task))
> -		gfp |= __GFP_MEMALLOC;
> 
> 	size += sizeof(struct rpc_buffer);
> 	if (size <= RPC_BUFFER_MAXSIZE)
> diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
> index a0a2583fe941..0614e7463d4b 100644
> --- a/net/sunrpc/xprt.c
> +++ b/net/sunrpc/xprt.c
> @@ -1492,6 +1492,9 @@ bool xprt_prepare_transmit(struct rpc_task *task)
> 		return false;
> 
> 	}
> +	if (atomic_read(&xprt->swapper))
> +		/* This will be clear in __rpc_execute */
> +		current->flags |= PF_MEMALLOC;
> 	return true;
> }
> 
> diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
> index 923e4b512ee9..6b7e10e5a141 100644
> --- a/net/sunrpc/xprtrdma/transport.c
> +++ b/net/sunrpc/xprtrdma/transport.c
> @@ -235,8 +235,11 @@ xprt_rdma_connect_worker(struct work_struct *work)
> 	struct rpcrdma_xprt *r_xprt = container_of(work, struct rpcrdma_xprt,
> 						   rx_connect_worker.work);
> 	struct rpc_xprt *xprt = &r_xprt->rx_xprt;
> +	unsigned int pflags = current->flags;
> 	int rc;
> 
> +	if (atomic_read(&xprt->swapper))
> +		current->flags |= PF_MEMALLOC;
> 	rc = rpcrdma_xprt_connect(r_xprt);
> 	xprt_clear_connecting(xprt);
> 	if (!rc) {
> @@ -250,6 +253,7 @@ xprt_rdma_connect_worker(struct work_struct *work)
> 		rpcrdma_xprt_disconnect(r_xprt);
> 	xprt_unlock_connect(xprt, r_xprt);
> 	xprt_wake_pending_tasks(xprt, rc);
> +	current_restore_flags(pflags, PF_MEMALLOC);
> }
> 
> /**
> @@ -572,8 +576,6 @@ xprt_rdma_allocate(struct rpc_task *task)
> 	flags = RPCRDMA_DEF_GFP;
> 	if (RPC_IS_ASYNC(task))
> 		flags = GFP_NOWAIT | __GFP_NOWARN;
> -	if (RPC_IS_SWAPPER(task))
> -		flags |= __GFP_MEMALLOC;
> 
> 	if (!rpcrdma_check_regbuf(r_xprt, req->rl_sendbuf, rqst->rq_callsize,
> 				  flags))
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 69b6ee5a5fd1..c461a0ce9531 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2047,7 +2047,10 @@ static void xs_udp_setup_socket(struct work_struct *work)
> 	struct rpc_xprt *xprt = &transport->xprt;
> 	struct socket *sock;
> 	int status = -EIO;
> +	unsigned int pflags = current->flags;
> 
> +	if (atomic_read(&xprt->swapper))
> +		current->flags |= PF_MEMALLOC;
> 	sock = xs_create_sock(xprt, transport,
> 			xs_addr(xprt)->sa_family, SOCK_DGRAM,
> 			IPPROTO_UDP, false);
> @@ -2067,6 +2070,7 @@ static void xs_udp_setup_socket(struct work_struct *work)
> 	xprt_clear_connecting(xprt);
> 	xprt_unlock_connect(xprt, transport);
> 	xprt_wake_pending_tasks(xprt, status);
> +	current_restore_flags(pflags, PF_MEMALLOC);
> }
> 
> /**
> @@ -2226,7 +2230,10 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> 	struct socket *sock = transport->sock;
> 	struct rpc_xprt *xprt = &transport->xprt;
> 	int status;
> +	unsigned int pflags = current->flags;
> 
> +	if (atomic_read(&xprt->swapper))
> +		current->flags |= PF_MEMALLOC;
> 	if (!sock) {
> 		sock = xs_create_sock(xprt, transport,
> 				xs_addr(xprt)->sa_family, SOCK_STREAM,
> @@ -2291,6 +2298,7 @@ static void xs_tcp_setup_socket(struct work_struct *work)
> 	xprt_clear_connecting(xprt);
> out_unlock:
> 	xprt_unlock_connect(xprt, transport);
> +	current_restore_flags(pflags, PF_MEMALLOC);
> }
> 
> /**
> 
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/21 -  revised] MM: create new mm/swap.h header file.
  2022-02-07  4:46 ` [PATCH 01/21] MM: create new mm/swap.h header file NeilBrown
                     ` (2 preceding siblings ...)
  2022-02-07 15:18     ` kernel test robot
@ 2022-02-10  3:24   ` NeilBrown
  2022-02-10 15:19   ` [PATCH 01/21] " Geert Uytterhoeven
  4 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-10  3:24 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells
  Cc: linux-nfs, linux-mm, linux-kernel


Many functions declared in include/linux/swap.h are only used within mm/

Create a new "mm/swap.h" and move some of these declarations there.
Remove the redundant 'extern' from the function declarations.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---

Added missing include in mm/huge_memory.c

 include/linux/swap.h | 121 ----------------------------------------
 mm/huge_memory.c     |   1 +
 mm/madvise.c         |   1 +
 mm/memcontrol.c      |   1 +
 mm/memory.c          |   1 +
 mm/mincore.c         |   1 +
 mm/page_alloc.c      |   1 +
 mm/page_io.c         |   1 +
 mm/shmem.c           |   1 +
 mm/swap.h            | 129 +++++++++++++++++++++++++++++++++++++++++++
 mm/swap_state.c      |   1 +
 mm/swapfile.c        |   1 +
 mm/util.c            |   1 +
 mm/vmscan.c          |   1 +
 mm/zswap.c           |   2 +
 15 files changed, 143 insertions(+), 121 deletions(-)
 create mode 100644 mm/swap.h

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 1d38d9475c4d..3f54a8941c9d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -419,62 +419,19 @@ extern void kswapd_stop(int nid);
 
 #ifdef CONFIG_SWAP
 
-#include <linux/blk_types.h> /* for bio_end_io_t */
-
-/* linux/mm/page_io.c */
-extern int swap_readpage(struct page *page, bool do_poll);
-extern int swap_writepage(struct page *page, struct writeback_control *wbc);
-extern void end_swap_bio_write(struct bio *bio);
-extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
-	bio_end_io_t end_write_func);
 extern int swap_set_page_dirty(struct page *page);
-
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block);
 int generic_swapfile_activate(struct swap_info_struct *, struct file *,
 		sector_t *);
 
-/* linux/mm/swap_state.c */
-/* One swap address space for each 64M swap space */
-#define SWAP_ADDRESS_SPACE_SHIFT	14
-#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
-extern struct address_space *swapper_spaces[];
-#define swap_address_space(entry)			    \
-	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
-		>> SWAP_ADDRESS_SPACE_SHIFT])
 static inline unsigned long total_swapcache_pages(void)
 {
 	return global_node_page_state(NR_SWAPCACHE);
 }
 
-extern void show_swap_cache_info(void);
-extern int add_to_swap(struct page *page);
-extern void *get_shadow_from_swap_cache(swp_entry_t entry);
-extern int add_to_swap_cache(struct page *page, swp_entry_t entry,
-			gfp_t gfp, void **shadowp);
-extern void __delete_from_swap_cache(struct page *page,
-			swp_entry_t entry, void *shadow);
-extern void delete_from_swap_cache(struct page *);
-extern void clear_shadow_from_swap_cache(int type, unsigned long begin,
-				unsigned long end);
-extern void free_swap_cache(struct page *);
 extern void free_page_and_swap_cache(struct page *);
 extern void free_pages_and_swap_cache(struct page **, int);
-extern struct page *lookup_swap_cache(swp_entry_t entry,
-				      struct vm_area_struct *vma,
-				      unsigned long addr);
-struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
-extern struct page *read_swap_cache_async(swp_entry_t, gfp_t,
-			struct vm_area_struct *vma, unsigned long addr,
-			bool do_poll);
-extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
-			struct vm_area_struct *vma, unsigned long addr,
-			bool *new_page_allocated);
-extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
-				struct vm_fault *vmf);
-extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
-				struct vm_fault *vmf);
-
 /* linux/mm/swapfile.c */
 extern atomic_long_t nr_swap_pages;
 extern long total_swap_pages;
@@ -528,12 +485,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
 }
 
 #else /* CONFIG_SWAP */
-
-static inline int swap_readpage(struct page *page, bool do_poll)
-{
-	return 0;
-}
-
 static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
 {
 	return NULL;
@@ -548,11 +499,6 @@ static inline void put_swap_device(struct swap_info_struct *si)
 {
 }
 
-static inline struct address_space *swap_address_space(swp_entry_t entry)
-{
-	return NULL;
-}
-
 #define get_nr_swap_pages()			0L
 #define total_swap_pages			0L
 #define total_swapcache_pages()			0UL
@@ -567,14 +513,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry)
 #define free_pages_and_swap_cache(pages, nr) \
 	release_pages((pages), (nr));
 
-static inline void free_swap_cache(struct page *page)
-{
-}
-
-static inline void show_swap_cache_info(void)
-{
-}
-
 /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */
 #define free_swap_and_cache(e) is_pfn_swap_entry(e)
 
@@ -600,65 +538,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp)
 {
 }
 
-static inline struct page *swap_cluster_readahead(swp_entry_t entry,
-				gfp_t gfp_mask, struct vm_fault *vmf)
-{
-	return NULL;
-}
-
-static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
-			struct vm_fault *vmf)
-{
-	return NULL;
-}
-
-static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
-{
-	return 0;
-}
-
-static inline struct page *lookup_swap_cache(swp_entry_t swp,
-					     struct vm_area_struct *vma,
-					     unsigned long addr)
-{
-	return NULL;
-}
-
-static inline
-struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
-{
-	return find_get_page(mapping, index);
-}
-
-static inline int add_to_swap(struct page *page)
-{
-	return 0;
-}
-
-static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
-{
-	return NULL;
-}
-
-static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
-					gfp_t gfp_mask, void **shadowp)
-{
-	return -1;
-}
-
-static inline void __delete_from_swap_cache(struct page *page,
-					swp_entry_t entry, void *shadow)
-{
-}
-
-static inline void delete_from_swap_cache(struct page *page)
-{
-}
-
-static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
-				unsigned long end)
-{
-}
 
 static inline int page_swapcount(struct page *page)
 {
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 406a3c28c026..dae090f09038 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -38,6 +38,7 @@
 #include <asm/tlb.h>
 #include <asm/pgalloc.h>
 #include "internal.h"
+#include "swap.h"
 
 /*
  * By default, transparent hugepage support is disabled in order to avoid
diff --git a/mm/madvise.c b/mm/madvise.c
index 5604064df464..1ee4b7583379 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -35,6 +35,7 @@
 #include <asm/tlb.h>
 
 #include "internal.h"
+#include "swap.h"
 
 struct madvise_walk_private {
 	struct mmu_gather *tlb;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 09d342c7cbd0..9b7c8181a207 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -66,6 +66,7 @@
 #include <net/sock.h>
 #include <net/ip.h>
 #include "slab.h"
+#include "swap.h"
 
 #include <linux/uaccess.h>
 
diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..d25372340107 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -86,6 +86,7 @@
 
 #include "pgalloc-track.h"
 #include "internal.h"
+#include "swap.h"
 
 #if defined(LAST_CPUPID_NOT_IN_PAGE_FLAGS) && !defined(CONFIG_COMPILE_TEST)
 #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid.
diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..f4f627325e12 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -20,6 +20,7 @@
 #include <linux/pgtable.h>
 
 #include <linux/uaccess.h>
+#include "swap.h"
 
 static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31..221aa3c10b78 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -81,6 +81,7 @@
 #include "internal.h"
 #include "shuffle.h"
 #include "page_reporting.h"
+#include "swap.h"
 
 /* Free Page Internal flags: for internal, non-pcp variants of free_pages(). */
 typedef int __bitwise fpi_t;
diff --git a/mm/page_io.c b/mm/page_io.c
index 0bf8e40f4e57..f8c26092e869 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -26,6 +26,7 @@
 #include <linux/uio.h>
 #include <linux/sched/task.h>
 #include <linux/delayacct.h>
+#include "swap.h"
 
 void end_swap_bio_write(struct bio *bio)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index a09b29ec2b45..c8b8819fe2e6 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -38,6 +38,7 @@
 #include <linux/hugetlb.h>
 #include <linux/fs_parser.h>
 #include <linux/swapfile.h>
+#include "swap.h"
 
 static struct vfsmount *shm_mnt;
 
diff --git a/mm/swap.h b/mm/swap.h
new file mode 100644
index 000000000000..13e72a5023aa
--- /dev/null
+++ b/mm/swap.h
@@ -0,0 +1,129 @@
+
+#ifdef CONFIG_SWAP
+#include <linux/blk_types.h> /* for bio_end_io_t */
+
+/* linux/mm/page_io.c */
+int swap_readpage(struct page *page, bool do_poll);
+int swap_writepage(struct page *page, struct writeback_control *wbc);
+void end_swap_bio_write(struct bio *bio);
+int __swap_writepage(struct page *page, struct writeback_control *wbc,
+		     bio_end_io_t end_write_func);
+
+/* linux/mm/swap_state.c */
+/* One swap address space for each 64M swap space */
+#define SWAP_ADDRESS_SPACE_SHIFT	14
+#define SWAP_ADDRESS_SPACE_PAGES	(1 << SWAP_ADDRESS_SPACE_SHIFT)
+extern struct address_space *swapper_spaces[];
+#define swap_address_space(entry)			    \
+	(&swapper_spaces[swp_type(entry)][swp_offset(entry) \
+		>> SWAP_ADDRESS_SPACE_SHIFT])
+
+void show_swap_cache_info(void);
+int add_to_swap(struct page *page);
+void *get_shadow_from_swap_cache(swp_entry_t entry);
+int add_to_swap_cache(struct page *page, swp_entry_t entry,
+		      gfp_t gfp, void **shadowp);
+void __delete_from_swap_cache(struct page *page,
+			      swp_entry_t entry, void *shadow);
+void delete_from_swap_cache(struct page *);
+void clear_shadow_from_swap_cache(int type, unsigned long begin,
+				  unsigned long end);
+void free_swap_cache(struct page *);
+struct page *lookup_swap_cache(swp_entry_t entry,
+			       struct vm_area_struct *vma,
+			       unsigned long addr);
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index);
+
+struct page *read_swap_cache_async(swp_entry_t, gfp_t,
+				   struct vm_area_struct *vma,
+				   unsigned long addr,
+				   bool do_poll);
+struct page *__read_swap_cache_async(swp_entry_t, gfp_t,
+				     struct vm_area_struct *vma,
+				     unsigned long addr,
+				     bool *new_page_allocated);
+struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag,
+				    struct vm_fault *vmf);
+struct page *swapin_readahead(swp_entry_t entry, gfp_t flag,
+			      struct vm_fault *vmf);
+
+#else /* CONFIG_SWAP */
+static inline int swap_readpage(struct page *page, bool do_poll)
+{
+	return 0;
+}
+
+static inline struct address_space *swap_address_space(swp_entry_t entry)
+{
+	return NULL;
+}
+
+static inline void free_swap_cache(struct page *page)
+{
+}
+
+static inline void show_swap_cache_info(void)
+{
+}
+
+static inline struct page *swap_cluster_readahead(swp_entry_t entry,
+				gfp_t gfp_mask, struct vm_fault *vmf)
+{
+	return NULL;
+}
+
+static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask,
+			struct vm_fault *vmf)
+{
+	return NULL;
+}
+
+static inline int swap_writepage(struct page *p, struct writeback_control *wbc)
+{
+	return 0;
+}
+
+static inline struct page *lookup_swap_cache(swp_entry_t swp,
+					     struct vm_area_struct *vma,
+					     unsigned long addr)
+{
+	return NULL;
+}
+
+static inline
+struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index)
+{
+	return find_get_page(mapping, index);
+}
+
+static inline int add_to_swap(struct page *page)
+{
+	return 0;
+}
+
+static inline void *get_shadow_from_swap_cache(swp_entry_t entry)
+{
+	return NULL;
+}
+
+static inline int add_to_swap_cache(struct page *page, swp_entry_t entry,
+					gfp_t gfp_mask, void **shadowp)
+{
+	return -1;
+}
+
+static inline void __delete_from_swap_cache(struct page *page,
+					swp_entry_t entry, void *shadow)
+{
+}
+
+static inline void delete_from_swap_cache(struct page *page)
+{
+}
+
+static inline void clear_shadow_from_swap_cache(int type, unsigned long begin,
+				unsigned long end)
+{
+}
+
+#endif /* CONFIG_SWAP */
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 8d4104242100..bb38453425c7 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -23,6 +23,7 @@
 #include <linux/huge_mm.h>
 #include <linux/shmem_fs.h>
 #include "internal.h"
+#include "swap.h"
 
 /*
  * swapper_space is a fiction, retained to simplify the path through
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bf0df7aa7158..71c7a31dd291 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -44,6 +44,7 @@
 #include <asm/tlbflush.h>
 #include <linux/swapops.h>
 #include <linux/swap_cgroup.h>
+#include "swap.h"
 
 static bool swap_count_continued(struct swap_info_struct *, pgoff_t,
 				 unsigned char);
diff --git a/mm/util.c b/mm/util.c
index 7e43369064c8..619697e3d935 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -27,6 +27,7 @@
 #include <linux/uaccess.h>
 
 #include "internal.h"
+#include "swap.h"
 
 /**
  * kfree_const - conditionally free memory
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 090bfb605ecf..5c734ffc6057 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,7 @@
 #include <linux/balloon_compaction.h>
 
 #include "internal.h"
+#include "swap.h"
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/vmscan.h>
diff --git a/mm/zswap.c b/mm/zswap.c
index cdf6950fcb2e..9192dc5f678f 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -36,6 +36,8 @@
 #include <linux/pagemap.h>
 #include <linux/workqueue.h>
 
+#include "swap.h"
+
 /*********************************
 * statistics
 **********************************/
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/21] MM: create new mm/swap.h header file.
  2022-02-07  4:46 ` [PATCH 01/21] MM: create new mm/swap.h header file NeilBrown
                     ` (3 preceding siblings ...)
  2022-02-10  3:24   ` [PATCH 01/21 - revised] " NeilBrown
@ 2022-02-10 15:19   ` Geert Uytterhoeven
  2022-02-14 23:50     ` NeilBrown
  4 siblings, 1 reply; 33+ messages in thread
From: Geert Uytterhoeven @ 2022-02-10 15:19 UTC (permalink / raw)
  To: NeilBrown
  Cc: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells, open list:NFS,
	SUNRPC, AND...,
	Linux MM, Linux Kernel Mailing List

Hi Neil,

On Wed, Feb 9, 2022 at 10:52 AM NeilBrown <neilb@suse.de> wrote:
> Many functions declared in include/linux/swap.h are only used within mm/
>
> Create a new "mm/swap.h" and move some of these declarations there.
> Remove the redundant 'extern' from the function declarations.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: NeilBrown <neilb@suse.de>

Thanks for your patch!

> --- /dev/null
> +++ b/mm/swap.h
> @@ -0,0 +1,129 @@
> +

scripts/checkpatch.pl:
WARNING: Missing or malformed SPDX-License-Identifier tag in line 1

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/21 V4] Repair SWAP-over_NFS
  2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
                   ` (20 preceding siblings ...)
  2022-02-07  4:46 ` [PATCH 19/21] NFS: rename nfs_direct_IO and use as ->swap_rw NeilBrown
@ 2022-02-10 15:22 ` Geert Uytterhoeven
  21 siblings, 0 replies; 33+ messages in thread
From: Geert Uytterhoeven @ 2022-02-10 15:22 UTC (permalink / raw)
  To: NeilBrown
  Cc: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells, open list:NFS,
	SUNRPC, AND...,
	Linux MM, Linux Kernel Mailing List

Hi Neil,

On Wed, Feb 9, 2022 at 11:29 AM NeilBrown <neilb@suse.de> wrote:
> This 4th version of the series addresses review comments, particularly
> tidying up "NFS: swap IO handling is slightly different for O_DIRECT IO"
> and collecting Reviewed-by tags etc.
>
> I've also moved 3 NFS patches which depend on the MM patches to the end
> in case that helps maintainers land the patches in a consistent order.
> Those three patches might go through the NFS tree after the next merge
> window.

Thanks for the update!
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
(on Renesas RSK+RZA1 with 32 MiB of SDRAM)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 01/21] MM: create new mm/swap.h header file.
  2022-02-10 15:19   ` [PATCH 01/21] " Geert Uytterhoeven
@ 2022-02-14 23:50     ` NeilBrown
  0 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2022-02-14 23:50 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Trond Myklebust, Anna Schumaker, Chuck Lever, Andrew Morton,
	Mark Hemment, Christoph Hellwig, David Howells, open list:NFS,
	SUNRPC, AND...,
	Linux MM, Linux Kernel Mailing List

On Fri, 11 Feb 2022, Geert Uytterhoeven wrote:
> Hi Neil,
> 
> On Wed, Feb 9, 2022 at 10:52 AM NeilBrown <neilb@suse.de> wrote:
> > Many functions declared in include/linux/swap.h are only used within mm/
> >
> > Create a new "mm/swap.h" and move some of these declarations there.
> > Remove the redundant 'extern' from the function declarations.
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: NeilBrown <neilb@suse.de>
> 
> Thanks for your patch!
> 
> > --- /dev/null
> > +++ b/mm/swap.h
> > @@ -0,0 +1,129 @@
> > +
> 
> scripts/checkpatch.pl:
> WARNING: Missing or malformed SPDX-License-Identifier tag in line 1
> 
> Gr{oetje,eeting}s,
> 
>                         Geert

Argg...  I think you pointed that out previously and I forgot to act on
it.
I've now copied the SPDX line from linux/swap.h, and also added that
standard "#ifndef _MM_SWAP_H" etc protection.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-02-14 23:50 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-07  4:46 [PATCH 00/21 V4] Repair SWAP-over_NFS NeilBrown
2022-02-07  4:46 ` [PATCH 03/21] MM: move responsibility for setting SWP_FS_OPS to ->swap_activate NeilBrown
2022-02-07  4:46 ` [PATCH 07/21] DOC: update documentation for swap_activate and swap_rw NeilBrown
2022-02-07  4:46 ` [PATCH 18/21] NFSv4: keep state manager thread active if swap is enabled NeilBrown
2022-02-07  4:46 ` [PATCH 16/21] NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS NeilBrown
2022-02-07  4:46 ` [PATCH 10/21] VFS: Add FMODE_CAN_ODIRECT file flag NeilBrown
2022-02-07  4:46 ` [PATCH 06/21] MM: perform async writes to SWP_FS_OPS swap-space using ->swap_rw NeilBrown
2022-02-07  4:46 ` [PATCH 01/21] MM: create new mm/swap.h header file NeilBrown
2022-02-07 13:15   ` kernel test robot
2022-02-07 14:26   ` kernel test robot
2022-02-07 14:26     ` kernel test robot
2022-02-07 15:18   ` kernel test robot
2022-02-07 15:18     ` kernel test robot
2022-02-10  3:24   ` [PATCH 01/21 - revised] " NeilBrown
2022-02-10 15:19   ` [PATCH 01/21] " Geert Uytterhoeven
2022-02-14 23:50     ` NeilBrown
2022-02-07  4:46 ` [PATCH 09/21] MM: submit multipage write for SWP_FS_OPS swap-space NeilBrown
2022-02-07  8:40   ` Christoph Hellwig
2022-02-07  4:46 ` [PATCH 13/21] SUNRPC/auth: async tasks mustn't block waiting for memory NeilBrown
2022-02-07  4:46 ` [PATCH 02/21] MM: drop swap_set_page_dirty NeilBrown
2022-02-07  4:46 ` [PATCH 04/21] MM: reclaim mustn't enter FS for SWP_FS_OPS swap-space NeilBrown
2022-02-07  4:46 ` [PATCH 14/21] SUNRPC/xprt: async tasks mustn't block waiting for memory NeilBrown
2022-02-07  4:46 ` [PATCH 21/21] NFS: swap-out must always use STABLE writes NeilBrown
2022-02-07  4:46 ` [PATCH 11/21] NFS: remove IS_SWAPFILE hack NeilBrown
2022-02-07  4:46 ` [PATCH 05/21] MM: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space NeilBrown
2022-02-07  4:46 ` [PATCH 08/21] MM: submit multipage reads for " NeilBrown
2022-02-07  4:46 ` [PATCH 20/21] NFS: swap IO handling is slightly different for O_DIRECT IO NeilBrown
2022-02-07  4:46 ` [PATCH 17/21] SUNRPC: improve 'swap' handling: scheduling and PF_MEMALLOC NeilBrown
2022-02-07 15:53   ` Chuck Lever III
2022-02-07  4:46 ` [PATCH 12/21] SUNRPC/call_alloc: async tasks mustn't block waiting for memory NeilBrown
2022-02-07  4:46 ` [PATCH 15/21] SUNRPC: remove scheduling boost for "SWAPPER" tasks NeilBrown
2022-02-07  4:46 ` [PATCH 19/21] NFS: rename nfs_direct_IO and use as ->swap_rw NeilBrown
2022-02-10 15:22 ` [PATCH 00/21 V4] Repair SWAP-over_NFS Geert Uytterhoeven
