linux-mm.kvack.org archive mirror
* [PATCH 0/4] mm: swapoff: fixes for 5.1-rc
@ 2019-04-08 19:53 Hugh Dickins
  2019-04-08 19:56 ` [PATCH 1/4] mm: swapoff: shmem_find_swap_entries() filter out other types Hugh Dickins
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Hugh Dickins @ 2019-04-08 19:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Konstantin Khlebnikov, Alex Xu (Hello71),
	Vineeth Pillai, Kelley Nielsen, Rik van Riel, Huang Ying,
	Hugh Dickins, linux-kernel, linux-mm

Here are four fixes to the new "rid quadratic" swapoff in 5.1-rc:

1/4 mm: swapoff: shmem_find_swap_entries() filter out other types
2/4 mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
3/4 mm: swapoff: take notice of completion sooner
4/4 mm: swapoff: shmem_unuse() stop eviction without igrab()

 include/linux/shmem_fs.h |    1 
 mm/shmem.c               |   57 +++++++++++++++++--------------------
 mm/swapfile.c            |   32 +++++++++++---------
 3 files changed, 45 insertions(+), 45 deletions(-)

Hugh


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/4] mm: swapoff: shmem_find_swap_entries() filter out other types
  2019-04-08 19:53 [PATCH 0/4] mm: swapoff: fixes for 5.1-rc Hugh Dickins
@ 2019-04-08 19:56 ` Hugh Dickins
  2019-04-08 19:58 ` [PATCH 2/4] mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES Hugh Dickins
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Hugh Dickins @ 2019-04-08 19:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Konstantin Khlebnikov, Alex Xu (Hello71),
	Vineeth Pillai, Kelley Nielsen, Rik van Riel, Huang Ying,
	Hugh Dickins, linux-kernel, linux-mm

Swapfile "type" was passed all the way down to shmem_unuse_inode(), but
then forgotten from shmem_find_swap_entries(): with the result that
removing one swapfile would try to free up all the swap from shmem - no
problem when only one swapfile anyway, but counter-productive when more,
causing swapoff to be unnecessarily OOM-killed when it should succeed.

Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/shmem.c |   18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

--- 5.1-rc4/mm/shmem.c	2019-03-17 16:18:15.701823872 -0700
+++ linux/mm/shmem.c	2019-04-07 19:12:23.603858531 -0700
@@ -1099,10 +1099,11 @@ extern struct swap_info_struct *swap_inf
 static int shmem_find_swap_entries(struct address_space *mapping,
 				   pgoff_t start, unsigned int nr_entries,
 				   struct page **entries, pgoff_t *indices,
-				   bool frontswap)
+				   unsigned int type, bool frontswap)
 {
 	XA_STATE(xas, &mapping->i_pages, start);
 	struct page *page;
+	swp_entry_t entry;
 	unsigned int ret = 0;
 
 	if (!nr_entries)
@@ -1116,13 +1117,12 @@ static int shmem_find_swap_entries(struc
 		if (!xa_is_value(page))
 			continue;
 
-		if (frontswap) {
-			swp_entry_t entry = radix_to_swp_entry(page);
-
-			if (!frontswap_test(swap_info[swp_type(entry)],
-					    swp_offset(entry)))
-				continue;
-		}
+		entry = radix_to_swp_entry(page);
+		if (swp_type(entry) != type)
+			continue;
+		if (frontswap &&
+		    !frontswap_test(swap_info[type], swp_offset(entry)))
+			continue;
 
 		indices[ret] = xas.xa_index;
 		entries[ret] = page;
@@ -1194,7 +1194,7 @@ static int shmem_unuse_inode(struct inod
 
 		pvec.nr = shmem_find_swap_entries(mapping, start, nr_entries,
 						  pvec.pages, indices,
-						  frontswap);
+						  type, frontswap);
 		if (pvec.nr == 0) {
 			ret = 0;
 			break;



* [PATCH 2/4] mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES
  2019-04-08 19:53 [PATCH 0/4] mm: swapoff: fixes for 5.1-rc Hugh Dickins
  2019-04-08 19:56 ` [PATCH 1/4] mm: swapoff: shmem_find_swap_entries() filter out other types Hugh Dickins
@ 2019-04-08 19:58 ` Hugh Dickins
  2019-04-08 19:59 ` [PATCH 3/4] mm: swapoff: take notice of completion sooner Hugh Dickins
  2019-04-08 20:01 ` [PATCH 4/4] mm: swapoff: shmem_unuse() stop eviction without igrab() Hugh Dickins
  3 siblings, 0 replies; 7+ messages in thread
From: Hugh Dickins @ 2019-04-08 19:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Konstantin Khlebnikov, Alex Xu (Hello71),
	Vineeth Pillai, Kelley Nielsen, Rik van Riel, Huang Ying,
	Hugh Dickins, linux-kernel, linux-mm

SWAP_UNUSE_MAX_TRIES 3 appeared to work well in earlier testing, but
further testing has proved it to be a source of unnecessary swapoff
EBUSY failures (which can then be followed by unmount EBUSY failures).

When mmget_not_zero() or shmem's igrab() fails, there is an mm exiting
or inode being evicted, freeing up swap independent of try_to_unuse().
Those typically completed much sooner than the old quadratic swapoff,
but now it's more common that swapoff may need to wait for them.

It's possible to move those cases from init_mm.mmlist and shmem_swaplist
to separate "exiting" swaplists, and try_to_unuse() then wait for those
lists to be emptied; but we've not bothered with that in the past, and
don't want to risk missing some other forgotten case. So just revert
to cycling around until the swap is gone, without any retries limit.

Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/swapfile.c |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

--- 5.1-rc4/mm/swapfile.c	2019-03-17 16:18:15.713823942 -0700
+++ linux/mm/swapfile.c	2019-04-07 19:15:01.269054187 -0700
@@ -2023,7 +2023,6 @@ static unsigned int find_next_to_unuse(s
  * If the boolean frontswap is true, only unuse pages_to_unuse pages;
  * pages_to_unuse==0 means all pages; ignored if frontswap is false
  */
-#define SWAP_UNUSE_MAX_TRIES 3
 int try_to_unuse(unsigned int type, bool frontswap,
 		 unsigned long pages_to_unuse)
 {
@@ -2035,7 +2034,6 @@ int try_to_unuse(unsigned int type, bool
 	struct page *page;
 	swp_entry_t entry;
 	unsigned int i;
-	int retries = 0;
 
 	if (!si->inuse_pages)
 		return 0;
@@ -2117,14 +2115,16 @@ retry:
 	 * If yes, we would need to do retry the unuse logic again.
 	 * Under global memory pressure, swap entries can be reinserted back
 	 * into process space after the mmlist loop above passes over them.
-	 * Its not worth continuosuly retrying to unuse the swap in this case.
-	 * So we try SWAP_UNUSE_MAX_TRIES times.
+	 *
+	 * Limit the number of retries? No: when shmem_unuse()'s igrab() fails,
+	 * a shmem inode using swap is being evicted; and when mmget_not_zero()
+	 * above fails, that mm is likely to be freeing swap from exit_mmap().
+	 * Both proceed at their own independent pace: we could move them to
+	 * separate lists, and wait for those lists to be emptied; but it's
+	 * easier and more robust (though cpu-intensive) just to keep retrying.
 	 */
-	if (++retries >= SWAP_UNUSE_MAX_TRIES)
-		retval = -EBUSY;
-	else if (si->inuse_pages)
+	if (si->inuse_pages)
 		goto retry;
-
 out:
 	return (retval == FRONTSWAP_PAGES_UNUSED) ? 0 : retval;
 }



* [PATCH 3/4] mm: swapoff: take notice of completion sooner
  2019-04-08 19:53 [PATCH 0/4] mm: swapoff: fixes for 5.1-rc Hugh Dickins
  2019-04-08 19:56 ` [PATCH 1/4] mm: swapoff: shmem_find_swap_entries() filter out other types Hugh Dickins
  2019-04-08 19:58 ` [PATCH 2/4] mm: swapoff: remove too limiting SWAP_UNUSE_MAX_TRIES Hugh Dickins
@ 2019-04-08 19:59 ` Hugh Dickins
  2019-04-08 20:01 ` [PATCH 4/4] mm: swapoff: shmem_unuse() stop eviction without igrab() Hugh Dickins
  3 siblings, 0 replies; 7+ messages in thread
From: Hugh Dickins @ 2019-04-08 19:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Konstantin Khlebnikov, Alex Xu (Hello71),
	Vineeth Pillai, Kelley Nielsen, Rik van Riel, Huang Ying,
	Hugh Dickins, linux-kernel, linux-mm

The old try_to_unuse() implementation was driven by find_next_to_unuse(),
which terminated as soon as all the swap had been freed.  Add inuse_pages
checks now (alongside signal_pending()) to stop scanning mms and swap_map
once finished.  The same ought to be done in shmem_unuse() too, but never
was before, and needs a different interface: so leave it as is for now.

Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/swapfile.c |   19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

--- 5.1-rc4/mm/swapfile.c	2019-04-07 19:15:01.269054187 -0700
+++ linux/mm/swapfile.c	2019-04-07 19:17:13.291957539 -0700
@@ -2051,11 +2051,9 @@ retry:
 
 	spin_lock(&mmlist_lock);
 	p = &init_mm.mmlist;
-	while ((p = p->next) != &init_mm.mmlist) {
-		if (signal_pending(current)) {
-			retval = -EINTR;
-			break;
-		}
+	while (si->inuse_pages &&
+	       !signal_pending(current) &&
+	       (p = p->next) != &init_mm.mmlist) {
 
 		mm = list_entry(p, struct mm_struct, mmlist);
 		if (!mmget_not_zero(mm))
@@ -2082,7 +2080,9 @@ retry:
 	mmput(prev_mm);
 
 	i = 0;
-	while ((i = find_next_to_unuse(si, i, frontswap)) != 0) {
+	while (si->inuse_pages &&
+	       !signal_pending(current) &&
+	       (i = find_next_to_unuse(si, i, frontswap)) != 0) {
 
 		entry = swp_entry(type, i);
 		page = find_get_page(swap_address_space(entry), i);
@@ -2123,8 +2123,11 @@ retry:
 	 * separate lists, and wait for those lists to be emptied; but it's
 	 * easier and more robust (though cpu-intensive) just to keep retrying.
 	 */
-	if (si->inuse_pages)
-		goto retry;
+	if (si->inuse_pages) {
+		if (!signal_pending(current))
+			goto retry;
+		retval = -EINTR;
+	}
 out:
 	return (retval == FRONTSWAP_PAGES_UNUSED) ? 0 : retval;
 }



* [PATCH 4/4] mm: swapoff: shmem_unuse() stop eviction without igrab()
  2019-04-08 19:53 [PATCH 0/4] mm: swapoff: fixes for 5.1-rc Hugh Dickins
                   ` (2 preceding siblings ...)
  2019-04-08 19:59 ` [PATCH 3/4] mm: swapoff: take notice of completion sooner Hugh Dickins
@ 2019-04-08 20:01 ` Hugh Dickins
  2019-04-09  7:50   ` Konstantin Khlebnikov
  3 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2019-04-08 20:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Konstantin Khlebnikov, Alex Xu (Hello71),
	Vineeth Pillai, Kelley Nielsen, Rik van Riel, Huang Ying,
	Hugh Dickins, linux-kernel, linux-mm

The igrab() in shmem_unuse() looks good, but we forgot that it gives no
protection against concurrent unmounting: a point made by Konstantin
Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893ae78
("tmpfs: fix race between umount and swapoff"). The current 5.1-rc
swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs.
Self-destruct in 5 seconds.  Have a nice day..." followed by GPF.

Once again, give up on using igrab(); but don't go back to making such
heavy-handed use of shmem_swaplist_mutex as last time: that would spoil
the new design, and I expect could deadlock inside shmem_swapin_page().

Instead, shmem_unuse() just raises a "stop_eviction" count in the shmem-
specific inode, and shmem_evict_inode() waits for that to go down to 0.
Call it "stop_eviction" rather than "swapoff_busy" because it can be
put to use for others later (huge tmpfs patches expect to use it).

That simplifies shmem_unuse(), protecting it from both unlink and unmount;
and in practice lets it locate all the swap in its first try.  But do not
rely on that: there's still a theoretical case, when shmem_writepage()
might have been preempted after its get_swap_page(), before making the
swap entry visible to swapoff.

Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity")
Signed-off-by: Hugh Dickins <hughd@google.com>
---

 include/linux/shmem_fs.h |    1 +
 mm/shmem.c               |   39 ++++++++++++++++++---------------------
 mm/swapfile.c            |   11 +++++------
 3 files changed, 24 insertions(+), 27 deletions(-)

--- 5.1-rc4/include/linux/shmem_fs.h	2019-03-17 16:18:15.181820820 -0700
+++ linux/include/linux/shmem_fs.h	2019-04-07 19:18:43.248639711 -0700
@@ -21,6 +21,7 @@ struct shmem_inode_info {
 	struct list_head	swaplist;	/* chain of maybes on swap */
 	struct shared_policy	policy;		/* NUMA memory alloc policy */
 	struct simple_xattrs	xattrs;		/* list of xattrs */
+	atomic_t		stop_eviction;	/* hold when working on inode */
 	struct inode		vfs_inode;
 };
 
--- 5.1-rc4/mm/shmem.c	2019-04-07 19:12:23.603858531 -0700
+++ linux/mm/shmem.c	2019-04-07 19:18:43.248639711 -0700
@@ -1081,9 +1081,15 @@ static void shmem_evict_inode(struct ino
 			}
 			spin_unlock(&sbinfo->shrinklist_lock);
 		}
-		if (!list_empty(&info->swaplist)) {
+		while (!list_empty(&info->swaplist)) {
+			/* Wait while shmem_unuse() is scanning this inode... */
+			wait_var_event(&info->stop_eviction,
+				       !atomic_read(&info->stop_eviction));
 			mutex_lock(&shmem_swaplist_mutex);
 			list_del_init(&info->swaplist);
+			/* ...but beware of the race if we peeked too early */
+			if (!atomic_read(&info->stop_eviction))
+				list_del_init(&info->swaplist);
 			mutex_unlock(&shmem_swaplist_mutex);
 		}
 	}
@@ -1227,36 +1233,27 @@ int shmem_unuse(unsigned int type, bool
 		unsigned long *fs_pages_to_unuse)
 {
 	struct shmem_inode_info *info, *next;
-	struct inode *inode;
-	struct inode *prev_inode = NULL;
 	int error = 0;
 
 	if (list_empty(&shmem_swaplist))
 		return 0;
 
 	mutex_lock(&shmem_swaplist_mutex);
-
-	/*
-	 * The extra refcount on the inode is necessary to safely dereference
-	 * p->next after re-acquiring the lock. New shmem inodes with swap
-	 * get added to the end of the list and we will scan them all.
-	 */
 	list_for_each_entry_safe(info, next, &shmem_swaplist, swaplist) {
 		if (!info->swapped) {
 			list_del_init(&info->swaplist);
 			continue;
 		}
-
-		inode = igrab(&info->vfs_inode);
-		if (!inode)
-			continue;
-
+		/*
+		 * Drop the swaplist mutex while searching the inode for swap;
+		 * but before doing so, make sure shmem_evict_inode() will not
+		 * remove placeholder inode from swaplist, nor let it be freed
+		 * (igrab() would protect from unlink, but not from unmount).
+		 */
+		atomic_inc(&info->stop_eviction);
 		mutex_unlock(&shmem_swaplist_mutex);
-		if (prev_inode)
-			iput(prev_inode);
-		prev_inode = inode;
 
-		error = shmem_unuse_inode(inode, type, frontswap,
+		error = shmem_unuse_inode(&info->vfs_inode, type, frontswap,
 					  fs_pages_to_unuse);
 		cond_resched();
 
@@ -1264,14 +1261,13 @@ int shmem_unuse(unsigned int type, bool
 		next = list_next_entry(info, swaplist);
 		if (!info->swapped)
 			list_del_init(&info->swaplist);
+		if (atomic_dec_and_test(&info->stop_eviction))
+			wake_up_var(&info->stop_eviction);
 		if (error)
 			break;
 	}
 	mutex_unlock(&shmem_swaplist_mutex);
 
-	if (prev_inode)
-		iput(prev_inode);
-
 	return error;
 }
 
@@ -2238,6 +2234,7 @@ static struct inode *shmem_get_inode(str
 		info = SHMEM_I(inode);
 		memset(info, 0, (char *)inode - (char *)info);
 		spin_lock_init(&info->lock);
+		atomic_set(&info->stop_eviction, 0);
 		info->seals = F_SEAL_SEAL;
 		info->flags = flags & VM_NORESERVE;
 		INIT_LIST_HEAD(&info->shrinklist);
--- 5.1-rc4/mm/swapfile.c	2019-04-07 19:17:13.291957539 -0700
+++ linux/mm/swapfile.c	2019-04-07 19:18:43.248639711 -0700
@@ -2116,12 +2116,11 @@ retry:
 	 * Under global memory pressure, swap entries can be reinserted back
 	 * into process space after the mmlist loop above passes over them.
 	 *
-	 * Limit the number of retries? No: when shmem_unuse()'s igrab() fails,
-	 * a shmem inode using swap is being evicted; and when mmget_not_zero()
-	 * above fails, that mm is likely to be freeing swap from exit_mmap().
-	 * Both proceed at their own independent pace: we could move them to
-	 * separate lists, and wait for those lists to be emptied; but it's
-	 * easier and more robust (though cpu-intensive) just to keep retrying.
+	 * Limit the number of retries? No: when mmget_not_zero() above fails,
+	 * that mm is likely to be freeing swap from exit_mmap(), which proceeds
+	 * at its own independent pace; and even shmem_writepage() could have
+	 * been preempted after get_swap_page(), temporarily hiding that swap.
+	 * It's easy and robust (though cpu-intensive) just to keep retrying.
 	 */
 	if (si->inuse_pages) {
 		if (!signal_pending(current))



* Re: [PATCH 4/4] mm: swapoff: shmem_unuse() stop eviction without igrab()
  2019-04-08 20:01 ` [PATCH 4/4] mm: swapoff: shmem_unuse() stop eviction without igrab() Hugh Dickins
@ 2019-04-09  7:50   ` Konstantin Khlebnikov
  2019-04-09 18:43     ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Konstantin Khlebnikov @ 2019-04-09  7:50 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton
  Cc: Alex Xu (Hello71),
	Vineeth Pillai, Kelley Nielsen, Rik van Riel, Huang Ying,
	linux-kernel, linux-mm

On 08.04.2019 23:01, Hugh Dickins wrote:
> The igrab() in shmem_unuse() looks good, but we forgot that it gives no
> protection against concurrent unmounting: a point made by Konstantin
> Khlebnikov eight years ago, and then fixed in 2.6.39 by 778dd893ae78
> ("tmpfs: fix race between umount and swapoff"). The current 5.1-rc
> swapoff is liable to hit "VFS: Busy inodes after unmount of tmpfs.
> Self-destruct in 5 seconds.  Have a nice day..." followed by GPF.
> 
> Once again, give up on using igrab(); but don't go back to making such
> heavy-handed use of shmem_swaplist_mutex as last time: that would spoil
> the new design, and I expect could deadlock inside shmem_swapin_page().
> 
> Instead, shmem_unuse() just raises a "stop_eviction" count in the shmem-
> specific inode, and shmem_evict_inode() waits for that to go down to 0.
> Call it "stop_eviction" rather than "swapoff_busy" because it can be
> put to use for others later (huge tmpfs patches expect to use it).
> 
> That simplifies shmem_unuse(), protecting it from both unlink and unmount;
> and in practice lets it locate all the swap in its first try.  But do not
> rely on that: there's still a theoretical case, when shmem_writepage()
> might have been preempted after its get_swap_page(), before making the
> swap entry visible to swapoff.
> 
> Fixes: b56a2d8af914 ("mm: rid swapoff of quadratic complexity")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> 
>   include/linux/shmem_fs.h |    1 +
>   mm/shmem.c               |   39 ++++++++++++++++++---------------------
>   mm/swapfile.c            |   11 +++++------
>   3 files changed, 24 insertions(+), 27 deletions(-)
> 
> --- 5.1-rc4/include/linux/shmem_fs.h	2019-03-17 16:18:15.181820820 -0700
> +++ linux/include/linux/shmem_fs.h	2019-04-07 19:18:43.248639711 -0700
> @@ -21,6 +21,7 @@ struct shmem_inode_info {
>   	struct list_head	swaplist;	/* chain of maybes on swap */
>   	struct shared_policy	policy;		/* NUMA memory alloc policy */
>   	struct simple_xattrs	xattrs;		/* list of xattrs */
> +	atomic_t		stop_eviction;	/* hold when working on inode */
>   	struct inode		vfs_inode;
>   };
>   
> --- 5.1-rc4/mm/shmem.c	2019-04-07 19:12:23.603858531 -0700
> +++ linux/mm/shmem.c	2019-04-07 19:18:43.248639711 -0700
> @@ -1081,9 +1081,15 @@ static void shmem_evict_inode(struct ino
>   			}
>   			spin_unlock(&sbinfo->shrinklist_lock);
>   		}
> -		if (!list_empty(&info->swaplist)) {
> +		while (!list_empty(&info->swaplist)) {
> +			/* Wait while shmem_unuse() is scanning this inode... */
> +			wait_var_event(&info->stop_eviction,
> +				       !atomic_read(&info->stop_eviction));
>   			mutex_lock(&shmem_swaplist_mutex);

>   			list_del_init(&info->swaplist);

Obviously, line above should be deleted.

> +			/* ...but beware of the race if we peeked too early */
> +			if (!atomic_read(&info->stop_eviction))
> +				list_del_init(&info->swaplist);
>   			mutex_unlock(&shmem_swaplist_mutex);
>   		}
>   	}
> @@ -1227,36 +1233,27 @@ int shmem_unuse(unsigned int type, bool
>   		unsigned long *fs_pages_to_unuse)
>   {
>   	struct shmem_inode_info *info, *next;
> -	struct inode *inode;
> -	struct inode *prev_inode = NULL;
>   	int error = 0;
>   
>   	if (list_empty(&shmem_swaplist))
>   		return 0;
>   
>   	mutex_lock(&shmem_swaplist_mutex);
> -
> -	/*
> -	 * The extra refcount on the inode is necessary to safely dereference
> -	 * p->next after re-acquiring the lock. New shmem inodes with swap
> -	 * get added to the end of the list and we will scan them all.
> -	 */
>   	list_for_each_entry_safe(info, next, &shmem_swaplist, swaplist) {
>   		if (!info->swapped) {
>   			list_del_init(&info->swaplist);
>   			continue;
>   		}
> -
> -		inode = igrab(&info->vfs_inode);
> -		if (!inode)
> -			continue;
> -
> +		/*
> +		 * Drop the swaplist mutex while searching the inode for swap;
> +		 * but before doing so, make sure shmem_evict_inode() will not
> +		 * remove placeholder inode from swaplist, nor let it be freed
> +		 * (igrab() would protect from unlink, but not from unmount).
> +		 */
> +		atomic_inc(&info->stop_eviction);
>   		mutex_unlock(&shmem_swaplist_mutex);
> -		if (prev_inode)
> -			iput(prev_inode);
> -		prev_inode = inode;
>   
> -		error = shmem_unuse_inode(inode, type, frontswap,
> +		error = shmem_unuse_inode(&info->vfs_inode, type, frontswap,
>   					  fs_pages_to_unuse);
>   		cond_resched();
>   
> @@ -1264,14 +1261,13 @@ int shmem_unuse(unsigned int type, bool
>   		next = list_next_entry(info, swaplist);
>   		if (!info->swapped)
>   			list_del_init(&info->swaplist);
> +		if (atomic_dec_and_test(&info->stop_eviction))
> +			wake_up_var(&info->stop_eviction);
>   		if (error)
>   			break;
>   	}
>   	mutex_unlock(&shmem_swaplist_mutex);
>   
> -	if (prev_inode)
> -		iput(prev_inode);
> -
>   	return error;
>   }
>   
> @@ -2238,6 +2234,7 @@ static struct inode *shmem_get_inode(str
>   		info = SHMEM_I(inode);
>   		memset(info, 0, (char *)inode - (char *)info);
>   		spin_lock_init(&info->lock);
> +		atomic_set(&info->stop_eviction, 0);
>   		info->seals = F_SEAL_SEAL;
>   		info->flags = flags & VM_NORESERVE;
>   		INIT_LIST_HEAD(&info->shrinklist);
> --- 5.1-rc4/mm/swapfile.c	2019-04-07 19:17:13.291957539 -0700
> +++ linux/mm/swapfile.c	2019-04-07 19:18:43.248639711 -0700
> @@ -2116,12 +2116,11 @@ retry:
>   	 * Under global memory pressure, swap entries can be reinserted back
>   	 * into process space after the mmlist loop above passes over them.
>   	 *
> -	 * Limit the number of retries? No: when shmem_unuse()'s igrab() fails,
> -	 * a shmem inode using swap is being evicted; and when mmget_not_zero()
> -	 * above fails, that mm is likely to be freeing swap from exit_mmap().
> -	 * Both proceed at their own independent pace: we could move them to
> -	 * separate lists, and wait for those lists to be emptied; but it's
> -	 * easier and more robust (though cpu-intensive) just to keep retrying.
> +	 * Limit the number of retries? No: when mmget_not_zero() above fails,
> +	 * that mm is likely to be freeing swap from exit_mmap(), which proceeds
> +	 * at its own independent pace; and even shmem_writepage() could have
> +	 * been preempted after get_swap_page(), temporarily hiding that swap.
> +	 * It's easy and robust (though cpu-intensive) just to keep retrying.
>   	 */
>   	if (si->inuse_pages) {
>   		if (!signal_pending(current))
> 



* Re: [PATCH 4/4] mm: swapoff: shmem_unuse() stop eviction without igrab()
  2019-04-09  7:50   ` Konstantin Khlebnikov
@ 2019-04-09 18:43     ` Hugh Dickins
  0 siblings, 0 replies; 7+ messages in thread
From: Hugh Dickins @ 2019-04-09 18:43 UTC (permalink / raw)
  To: Andrew Morton, Konstantin Khlebnikov
  Cc: Hugh Dickins, Alex Xu (Hello71),
	Vineeth Pillai, Kelley Nielsen, Rik van Riel, Huang Ying,
	linux-kernel, linux-mm

On Tue, 9 Apr 2019, Konstantin Khlebnikov wrote:
> On 08.04.2019 23:01, Hugh Dickins wrote:
> > -		if (!list_empty(&info->swaplist)) {
> > +		while (!list_empty(&info->swaplist)) {
> > +			/* Wait while shmem_unuse() is scanning this inode...
> > */
> > +			wait_var_event(&info->stop_eviction,
> > +				       !atomic_read(&info->stop_eviction));
> >   			mutex_lock(&shmem_swaplist_mutex);
> >   			list_del_init(&info->swaplist);
> 
> Obviously, line above should be deleted.

Definitely. Worryingly stupid. I guess I left it behind while translating
from an earlier tree.  Many thanks for catching that in time, Konstantin.
I've rechecked the rest of this patch, and the others, and didn't find
anything else as stupid.

Andrew, please add this fixup for folding in - thanks:

[PATCH] mm: swapoff: shmem_unuse() stop eviction without igrab() fix

Fix my stupidity, thankfully caught by Konstantin.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
Fix to fold into mm-swapoff-shmem_unuse-stop-eviction-without-igrab.patch

 mm/shmem.c |    1 -
 1 file changed, 1 deletion(-)

--- patch4/mm/shmem.c	2019-04-07 19:18:43.248639711 -0700
+++ patch5/mm/shmem.c	2019-04-09 11:24:32.745337734 -0700
@@ -1086,7 +1086,6 @@ static void shmem_evict_inode(struct ino
 			wait_var_event(&info->stop_eviction,
 				       !atomic_read(&info->stop_eviction));
 			mutex_lock(&shmem_swaplist_mutex);
-			list_del_init(&info->swaplist);
 			/* ...but beware of the race if we peeked too early */
 			if (!atomic_read(&info->stop_eviction))
 				list_del_init(&info->swaplist);

