linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section
@ 2012-01-06 21:10 Hugh Dickins
  2012-01-06 21:12 ` [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap Hugh Dickins
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Hugh Dickins @ 2012-01-06 21:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm

scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
evictable again once the shared memory is unlocked.  It does this with
pagevec_lookup()s across the whole object (which might occupy most of
memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
PAGEVEC_SIZE pages would be good.

However, KOSAKI-san points out that this is called under shmem.c's
info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
There is no strong reason for that: we need to take these pages off
the unevictable list soonish, but those locks are not required for it.

So move the call to scan_mapping_unevictable_pages() from shmem.c's
unlock handling up to shm.c's unlock handling.  Remove the recently
added barrier, not needed now we have spin_unlock() before the scan.

Use get_file(), with subsequent fput(), to make sure we have a
reference to mapping throughout scan_mapping_unevictable_pages():
that's something that was previously guaranteed by the shm_lock().
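
In outline, the reworked SHM_UNLOCK path (a simplified sketch of the
ipc/shm.c diff below, with the hugepages and SHM_LOCKED checks and error
handling omitted) becomes:

        shmem_lock(shm_file, 0, shp->mlock_user); /* clears mapping's unevictable flag */
        shp->shm_perm.mode &= ~SHM_LOCKED;
        shp->mlock_user = NULL;
        get_file(shm_file);      /* pin file (and mapping) across the scan */
        shm_unlock(shp);         /* drop the ipc spinlock before scanning */
        scan_mapping_unevictable_pages(shm_file->f_mapping);
        fput(shm_file);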

Remove shmctl's lru_add_drain_all(): we don't fault in pages at
SHM_LOCK time, and we lazily discover them to be Unevictable later,
so it serves no purpose for SHM_LOCK; and serves no purpose for
SHM_UNLOCK, since pages still on pagevec are not marked Unevictable.

The original code avoided redundant rescans by checking VM_LOCKED
flag at its level: now avoid them by checking shp's SHM_LOCKED.

The original code called scan_mapping_unevictable_pages() on a
locked area at shm_destroy() time: perhaps we once had accounting
cross-checks which required that, but not now, so skip the overhead
and just let inode eviction deal with them.

Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
more as comment than to save space; comment them used for SHM_UNLOCK.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org [back to 2.6.32 but will need respins]
---
This version of the patch spun to apply on top of mmotm.

 ipc/shm.c   |   37 ++++++++++++++++++++++---------------
 mm/shmem.c  |    7 -------
 mm/vmscan.c |   12 +++++++++++-
 3 files changed, 33 insertions(+), 23 deletions(-)

--- mmotm.orig/ipc/shm.c	2012-01-06 10:04:54.000000000 -0800
+++ mmotm/ipc/shm.c	2012-01-06 10:06:13.937943603 -0800
@@ -870,9 +870,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int,
 	case SHM_LOCK:
 	case SHM_UNLOCK:
 	{
-		struct file *uninitialized_var(shm_file);
-
-		lru_add_drain_all();  /* drain pagevecs to lru lists */
+		struct file *shm_file;
 
 		shp = shm_lock_check(ns, shmid);
 		if (IS_ERR(shp)) {
@@ -895,22 +893,31 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int,
 		err = security_shm_shmctl(shp, cmd);
 		if (err)
 			goto out_unlock;
-		
-		if(cmd==SHM_LOCK) {
+
+		shm_file = shp->shm_file;
+		if (is_file_hugepages(shm_file))
+			goto out_unlock;
+
+		if (cmd == SHM_LOCK) {
 			struct user_struct *user = current_user();
-			if (!is_file_hugepages(shp->shm_file)) {
-				err = shmem_lock(shp->shm_file, 1, user);
-				if (!err && !(shp->shm_perm.mode & SHM_LOCKED)){
-					shp->shm_perm.mode |= SHM_LOCKED;
-					shp->mlock_user = user;
-				}
+			err = shmem_lock(shm_file, 1, user);
+			if (!err && !(shp->shm_perm.mode & SHM_LOCKED)) {
+				shp->shm_perm.mode |= SHM_LOCKED;
+				shp->mlock_user = user;
 			}
-		} else if (!is_file_hugepages(shp->shm_file)) {
-			shmem_lock(shp->shm_file, 0, shp->mlock_user);
-			shp->shm_perm.mode &= ~SHM_LOCKED;
-			shp->mlock_user = NULL;
+			goto out_unlock;
 		}
+
+		/* SHM_UNLOCK */
+		if (!(shp->shm_perm.mode & SHM_LOCKED))
+			goto out_unlock;
+		shmem_lock(shm_file, 0, shp->mlock_user);
+		shp->shm_perm.mode &= ~SHM_LOCKED;
+		shp->mlock_user = NULL;
+		get_file(shm_file);
 		shm_unlock(shp);
+		scan_mapping_unevictable_pages(shm_file->f_mapping);
+		fput(shm_file);
 		goto out;
 	}
 	case IPC_RMID:
--- mmotm.orig/mm/shmem.c	2012-01-06 10:05:00.000000000 -0800
+++ mmotm/mm/shmem.c	2012-01-06 10:08:05.505947516 -0800
@@ -1068,13 +1068,6 @@ int shmem_lock(struct file *file, int lo
 		user_shm_unlock(inode->i_size, user);
 		info->flags &= ~VM_LOCKED;
 		mapping_clear_unevictable(file->f_mapping);
-		/*
-		 * Ensure that a racing putback_lru_page() can see
-		 * the pages of this mapping are evictable when we
-		 * skip them due to !PageLRU during the scan.
-		 */
-		smp_mb__after_clear_bit();
-		scan_mapping_unevictable_pages(file->f_mapping);
 	}
 	retval = 0;
 
--- mmotm.orig/mm/vmscan.c	2012-01-06 10:04:54.000000000 -0800
+++ mmotm/mm/vmscan.c	2012-01-06 10:06:13.941943604 -0800
@@ -3499,6 +3499,7 @@ int page_evictable(struct page *page, st
 	return 1;
 }
 
+#ifdef CONFIG_SHMEM
 /**
  * check_move_unevictable_page - check page for evictability and move to appropriate zone lru list
  * @page: page to check evictability and move to appropriate lru list
@@ -3509,6 +3510,8 @@ int page_evictable(struct page *page, st
  *
  * Restrictions: zone->lru_lock must be held, page must be on LRU and must
  * have PageUnevictable set.
+ *
+ * This function is only used for SysV IPC SHM_UNLOCK.
  */
 static void check_move_unevictable_page(struct page *page, struct zone *zone)
 {
@@ -3545,6 +3548,8 @@ retry:
  *
  * Scan all pages in mapping.  Check unevictable pages for
  * evictability and move them to the appropriate zone lru list.
+ *
+ * This function is only used for SysV IPC SHM_UNLOCK.
  */
 void scan_mapping_unevictable_pages(struct address_space *mapping)
 {
@@ -3590,9 +3595,14 @@ void scan_mapping_unevictable_pages(stru
 		pagevec_release(&pvec);
 
 		count_vm_events(UNEVICTABLE_PGSCANNED, pg_scanned);
+		cond_resched();
 	}
-
 }
+#else
+void scan_mapping_unevictable_pages(struct address_space *mapping)
+{
+}
+#endif /* CONFIG_SHMEM */
 
 static void warn_scan_unevictable_pages(void)
 {


* [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap
  2012-01-06 21:10 [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section Hugh Dickins
@ 2012-01-06 21:12 ` Hugh Dickins
  2012-01-09 20:42   ` KOSAKI Motohiro
  2012-01-15  0:20   ` Hugh Dickins
  2012-01-07  8:28 ` [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section KOSAKI Motohiro
  2012-01-15  0:18 ` Hugh Dickins
  2 siblings, 2 replies; 11+ messages in thread
From: Hugh Dickins @ 2012-01-06 21:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm

Commit cc39c6a9bbde "mm: account skipped entries to avoid looping in
find_get_pages" correctly fixed an infinite loop; but left a problem
that find_get_pages() on shmem would return 0 (appearing to callers
to mean end of tree) when it meets a run of nr_pages swap entries.

The only uses of find_get_pages() on shmem are via pagevec_lookup(),
called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
scan_mapping_unevictable_pages().  The first is already commented,
and not worth worrying about; but the second can leave pages on the
Unevictable list after an unusual sequence of swapping and locking.

Fix that by using shmem_find_get_pages_and_swap() (then ignoring
the swap) instead of pagevec_lookup().
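
As a simplified sketch of the fixed lookup loop (taken from the mm/shmem.c
diff below, where it ends up as shmem_unlock_mapping(), described next):

        struct pagevec pvec;
        pgoff_t indices[PAGEVEC_SIZE];
        pgoff_t index = 0;

        pagevec_init(&pvec, 0);
        while (!mapping_unevictable(mapping)) {
                /* find_get_pages() would return 0 here on a run of swap entries */
                pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
                                        PAGEVEC_SIZE, pvec.pages, indices);
                if (!pvec.nr)
                        break;
                index = indices[pvec.nr - 1] + 1;
                shmem_deswap_pagevec(&pvec);    /* drop the swap entries, keep the pages */
                check_move_unevictable_pages(pvec.pages, pvec.nr);
                pagevec_release(&pvec);
                cond_resched();
        }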

But I don't want to contaminate vmscan.c with shmem internals, nor
shmem.c with LRU locking.  So move scan_mapping_unevictable_pages()
into shmem.c, renaming it shmem_unlock_mapping(); and rename
check_move_unevictable_page() to check_move_unevictable_pages(),
looping down an array of pages, oftentimes under the same lock.
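
The "oftentimes under the same lock" point can be seen in the mm/vmscan.c
diff below; as a simplified sketch, the new check_move_unevictable_pages()
only drops and retakes zone->lru_lock when the zone changes between
consecutive pages:

        struct zone *zone = NULL;
        int i;

        for (i = 0; i < nr_pages; i++) {
                struct zone *pagezone = page_zone(pages[i]);

                if (pagezone != zone) {         /* relock only on a zone change */
                        if (zone)
                                spin_unlock_irq(&zone->lru_lock);
                        zone = pagezone;
                        spin_lock_irq(&zone->lru_lock);
                }
                /* ... test PageLRU/PageUnevictable and page_evictable, then move ... */
        }
        if (zone)
                spin_unlock_irq(&zone->lru_lock);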

Leave out the "rotate unevictable list" block: that's a leftover
from when this was used for /proc/sys/vm/scan_unevictable_pages,
whose flawed handling involved looking at pages at tail of LRU.

Was there significance to the sequence first ClearPageUnevictable,
then test page_evictable, then SetPageUnevictable here?  I think
not, we're under LRU lock, and have no barriers between those.

Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org [back to 3.1 but will need respins]
---
This version of the patch spun to apply on top of mmotm.

 include/linux/shmem_fs.h |    1 
 include/linux/swap.h     |    2 
 ipc/shm.c                |    2 
 mm/shmem.c               |   46 +++++++++++--
 mm/vmscan.c              |  128 +++++++++++--------------------------
 5 files changed, 83 insertions(+), 96 deletions(-)

--- mmotm.orig/include/linux/shmem_fs.h	2012-01-06 09:55:50.901928800 -0800
+++ mmotm/include/linux/shmem_fs.h	2012-01-06 10:10:15.645949349 -0800
@@ -48,6 +48,7 @@ extern struct file *shmem_file_setup(con
 					loff_t size, unsigned long flags);
 extern int shmem_zero_setup(struct vm_area_struct *);
 extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
+extern void shmem_unlock_mapping(struct address_space *mapping);
 extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 					pgoff_t index, gfp_t gfp_mask);
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
--- mmotm.orig/include/linux/swap.h	2012-01-06 09:55:50.901928800 -0800
+++ mmotm/include/linux/swap.h	2012-01-06 10:10:15.645949349 -0800
@@ -279,7 +279,7 @@ static inline int zone_reclaim(struct zo
 #endif
 
 extern int page_evictable(struct page *page, struct vm_area_struct *vma);
-extern void scan_mapping_unevictable_pages(struct address_space *);
+extern void check_move_unevictable_pages(struct page **, int nr_pages);
 
 extern unsigned long scan_unevictable_pages;
 extern int scan_unevictable_handler(struct ctl_table *, int,
--- mmotm.orig/ipc/shm.c	2012-01-06 10:06:13.937943603 -0800
+++ mmotm/ipc/shm.c	2012-01-06 10:10:15.649949348 -0800
@@ -916,7 +916,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int,
 		shp->mlock_user = NULL;
 		get_file(shm_file);
 		shm_unlock(shp);
-		scan_mapping_unevictable_pages(shm_file->f_mapping);
+		shmem_unlock_mapping(shm_file->f_mapping);
 		fput(shm_file);
 		goto out;
 	}
--- mmotm.orig/mm/shmem.c	2012-01-06 10:08:05.505947516 -0800
+++ mmotm/mm/shmem.c	2012-01-06 10:10:15.649949348 -0800
@@ -379,7 +379,7 @@ static int shmem_free_swap(struct addres
 /*
  * Pagevec may contain swap entries, so shuffle up pages before releasing.
  */
-static void shmem_pagevec_release(struct pagevec *pvec)
+static void shmem_deswap_pagevec(struct pagevec *pvec)
 {
 	int i, j;
 
@@ -389,7 +389,36 @@ static void shmem_pagevec_release(struct
 			pvec->pages[j++] = page;
 	}
 	pvec->nr = j;
-	pagevec_release(pvec);
+}
+
+/*
+ * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
+ */
+void shmem_unlock_mapping(struct address_space *mapping)
+{
+	struct pagevec pvec;
+	pgoff_t indices[PAGEVEC_SIZE];
+	pgoff_t index = 0;
+
+	pagevec_init(&pvec, 0);
+	/*
+	 * Minor point, but we might as well stop if someone else SHM_LOCKs it.
+	 */
+	while (!mapping_unevictable(mapping)) {
+		/*
+		 * Avoid pagevec_lookup(): find_get_pages() returns 0 as if it
+		 * has finished, if it hits a row of PAGEVEC_SIZE swap entries.
+		 */
+		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
+					PAGEVEC_SIZE, pvec.pages, indices);
+		if (!pvec.nr)
+			break;
+		index = indices[pvec.nr - 1] + 1;
+		shmem_deswap_pagevec(&pvec);
+		check_move_unevictable_pages(pvec.pages, pvec.nr);
+		pagevec_release(&pvec);
+		cond_resched();
+	}
 }
 
 /*
@@ -440,7 +469,8 @@ void shmem_truncate_range(struct inode *
 			}
 			unlock_page(page);
 		}
-		shmem_pagevec_release(&pvec);
+		shmem_deswap_pagevec(&pvec);
+		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
 		index++;
@@ -470,7 +500,8 @@ void shmem_truncate_range(struct inode *
 			continue;
 		}
 		if (index == start && indices[0] > end) {
-			shmem_pagevec_release(&pvec);
+			shmem_deswap_pagevec(&pvec);
+			pagevec_release(&pvec);
 			break;
 		}
 		mem_cgroup_uncharge_start();
@@ -494,7 +525,8 @@ void shmem_truncate_range(struct inode *
 			}
 			unlock_page(page);
 		}
-		shmem_pagevec_release(&pvec);
+		shmem_deswap_pagevec(&pvec);
+		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		index++;
 	}
@@ -2439,6 +2471,10 @@ int shmem_lock(struct file *file, int lo
 	return 0;
 }
 
+void shmem_unlock_mapping(struct address_space *mapping)
+{
+}
+
 void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 {
 	truncate_inode_pages_range(inode->i_mapping, lstart, lend);
--- mmotm.orig/mm/vmscan.c	2012-01-06 10:06:13.941943604 -0800
+++ mmotm/mm/vmscan.c	2012-01-06 10:10:15.649949348 -0800
@@ -26,7 +26,6 @@
 #include <linux/buffer_head.h>	/* for try_to_release_page(),
 					buffer_heads_over_limit */
 #include <linux/mm_inline.h>
-#include <linux/pagevec.h>
 #include <linux/backing-dev.h>
 #include <linux/rmap.h>
 #include <linux/topology.h>
@@ -661,7 +660,7 @@ redo:
 		 * When racing with an mlock or AS_UNEVICTABLE clearing
 		 * (page is unlocked) make sure that if the other thread
 		 * does not observe our setting of PG_lru and fails
-		 * isolation/check_move_unevictable_page,
+		 * isolation/check_move_unevictable_pages,
 		 * we see PG_mlocked/AS_UNEVICTABLE cleared below and move
 		 * the page back to the evictable list.
 		 *
@@ -3501,107 +3500,58 @@ int page_evictable(struct page *page, st
 
 #ifdef CONFIG_SHMEM
 /**
- * check_move_unevictable_page - check page for evictability and move to appropriate zone lru list
- * @page: page to check evictability and move to appropriate lru list
- * @zone: zone page is in
+ * check_move_unevictable_pages - check pages for evictability and move to appropriate zone lru list
+ * @pages:	array of pages to check
+ * @nr_pages:	number of pages to check
  *
- * Checks a page for evictability and moves the page to the appropriate
- * zone lru list.
- *
- * Restrictions: zone->lru_lock must be held, page must be on LRU and must
- * have PageUnevictable set.
+ * Checks pages for evictability and moves them to the appropriate lru list.
  *
  * This function is only used for SysV IPC SHM_UNLOCK.
  */
-static void check_move_unevictable_page(struct page *page, struct zone *zone)
+void check_move_unevictable_pages(struct page **pages, int nr_pages)
 {
 	struct lruvec *lruvec;
+	struct zone *zone = NULL;
+	int pgscanned = 0;
+	int pgrescued = 0;
+	int i;
 
-	VM_BUG_ON(PageActive(page));
-retry:
-	ClearPageUnevictable(page);
-	if (page_evictable(page, NULL)) {
-		enum lru_list l = page_lru_base_type(page);
-
-		__dec_zone_state(zone, NR_UNEVICTABLE);
-		lruvec = mem_cgroup_lru_move_lists(zone, page,
-						   LRU_UNEVICTABLE, l);
-		list_move(&page->lru, &lruvec->lists[l]);
-		__inc_zone_state(zone, NR_INACTIVE_ANON + l);
-		__count_vm_event(UNEVICTABLE_PGRESCUED);
-	} else {
-		/*
-		 * rotate unevictable list
-		 */
-		SetPageUnevictable(page);
-		lruvec = mem_cgroup_lru_move_lists(zone, page, LRU_UNEVICTABLE,
-						   LRU_UNEVICTABLE);
-		list_move(&page->lru, &lruvec->lists[LRU_UNEVICTABLE]);
-		if (page_evictable(page, NULL))
-			goto retry;
-	}
-}
-
-/**
- * scan_mapping_unevictable_pages - scan an address space for evictable pages
- * @mapping: struct address_space to scan for evictable pages
- *
- * Scan all pages in mapping.  Check unevictable pages for
- * evictability and move them to the appropriate zone lru list.
- *
- * This function is only used for SysV IPC SHM_UNLOCK.
- */
-void scan_mapping_unevictable_pages(struct address_space *mapping)
-{
-	pgoff_t next = 0;
-	pgoff_t end   = (i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1) >>
-			 PAGE_CACHE_SHIFT;
-	struct zone *zone;
-	struct pagevec pvec;
+	for (i = 0; i < nr_pages; i++) {
+		struct page *page = pages[i];
+		struct zone *pagezone;
+
+		pgscanned++;
+		pagezone = page_zone(page);
+		if (pagezone != zone) {
+			if (zone)
+				spin_unlock_irq(&zone->lru_lock);
+			zone = pagezone;
+			spin_lock_irq(&zone->lru_lock);
+		}
 
-	if (mapping->nrpages == 0)
-		return;
+		if (!PageLRU(page) || !PageUnevictable(page))
+			continue;
 
-	pagevec_init(&pvec, 0);
-	while (next < end &&
-		pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
-		int i;
-		int pg_scanned = 0;
-
-		zone = NULL;
-
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
-			pgoff_t page_index = page->index;
-			struct zone *pagezone = page_zone(page);
-
-			pg_scanned++;
-			if (page_index > next)
-				next = page_index;
-			next++;
-
-			if (pagezone != zone) {
-				if (zone)
-					spin_unlock_irq(&zone->lru_lock);
-				zone = pagezone;
-				spin_lock_irq(&zone->lru_lock);
-			}
+		if (page_evictable(page, NULL)) {
+			enum lru_list lru = page_lru_base_type(page);
 
-			if (PageLRU(page) && PageUnevictable(page))
-				check_move_unevictable_page(page, zone);
+			VM_BUG_ON(PageActive(page));
+			ClearPageUnevictable(page);
+			__dec_zone_state(zone, NR_UNEVICTABLE);
+			lruvec = mem_cgroup_lru_move_lists(zone, page,
+						LRU_UNEVICTABLE, lru);
+			list_move(&page->lru, &lruvec->lists[lru]);
+			__inc_zone_state(zone, NR_INACTIVE_ANON + lru);
+			pgrescued++;
 		}
-		if (zone)
-			spin_unlock_irq(&zone->lru_lock);
-		pagevec_release(&pvec);
+	}
 
-		count_vm_events(UNEVICTABLE_PGSCANNED, pg_scanned);
-		cond_resched();
+	if (zone) {
+		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
+		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
+		spin_unlock_irq(&zone->lru_lock);
 	}
 }
-#else
-void scan_mapping_unevictable_pages(struct address_space *mapping)
-{
-}
 #endif /* CONFIG_SHMEM */
 
 static void warn_scan_unevictable_pages(void)


* Re: [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section
  2012-01-06 21:10 [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section Hugh Dickins
  2012-01-06 21:12 ` [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap Hugh Dickins
@ 2012-01-07  8:28 ` KOSAKI Motohiro
  2012-01-15  0:18 ` Hugh Dickins
  2 siblings, 0 replies; 11+ messages in thread
From: KOSAKI Motohiro @ 2012-01-07  8:28 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm

(1/6/12 4:10 PM), Hugh Dickins wrote:
> scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
> evictable again once the shared memory is unlocked.  It does this with
> pagevec_lookup()s across the whole object (which might occupy most of
> memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
> PAGEVEC_SIZE pages would be good.
>
> However, KOSAKI-san points out that this is called under shmem.c's
> info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
> There is no strong reason for that: we need to take these pages off
> the unevictable list soonish, but those locks are not required for it.
>
> So move the call to scan_mapping_unevictable_pages() from shmem.c's
> unlock handling up to shm.c's unlock handling.  Remove the recently
> added barrier, not needed now we have spin_unlock() before the scan.
>
> Use get_file(), with subsequent fput(), to make sure we have a
> reference to mapping throughout scan_mapping_unevictable_pages():
> that's something that was previously guaranteed by the shm_lock().
>
> Remove shmctl's lru_add_drain_all(): we don't fault in pages at
> SHM_LOCK time, and we lazily discover them to be Unevictable later,
> so it serves no purpose for SHM_LOCK; and serves no purpose for
> SHM_UNLOCK, since pages still on pagevec are not marked Unevictable.
>
> The original code avoided redundant rescans by checking VM_LOCKED
> flag at its level: now avoid them by checking shp's SHM_LOCKED.
>
> The original code called scan_mapping_unevictable_pages() on a
> locked area at shm_destroy() time: perhaps we once had accounting
> cross-checks which required that, but not now, so skip the overhead
> and just let inode eviction deal with them.
>
> Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
> under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
> more as comment than to save space; comment them used for SHM_UNLOCK.
>
> Signed-off-by: Hugh Dickins<hughd@google.com>
> Cc: stable@vger.kernel.org [back to 2.6.32 but will need respins]

Looks completely sensible.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>



* Re: [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap
  2012-01-06 21:12 ` [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap Hugh Dickins
@ 2012-01-09 20:42   ` KOSAKI Motohiro
  2012-01-09 22:25     ` Hugh Dickins
  2012-01-15  0:20   ` Hugh Dickins
  1 sibling, 1 reply; 11+ messages in thread
From: KOSAKI Motohiro @ 2012-01-09 20:42 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: kosaki.motohiro, Andrew Morton, Minchan Kim, Rik van Riel,
	Shaohua Li, Eric Dumazet, Johannes Weiner, Michel Lespinasse,
	linux-kernel, linux-mm



2012/1/6 Hugh Dickins <hughd@google.com>:
> Commit cc39c6a9bbde "mm: account skipped entries to avoid looping in
> find_get_pages" correctly fixed an infinite loop; but left a problem
> that find_get_pages() on shmem would return 0 (appearing to callers
> to mean end of tree) when it meets a run of nr_pages swap entries.
>
> The only uses of find_get_pages() on shmem are via pagevec_lookup(),
> called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
> scan_mapping_unevictable_pages().  The first is already commented,
> and not worth worrying about; but the second can leave pages on the
> Unevictable list after an unusual sequence of swapping and locking.
>
> Fix that by using shmem_find_get_pages_and_swap() (then ignoring
> the swap) instead of pagevec_lookup().
>
> But I don't want to contaminate vmscan.c with shmem internals, nor
> shmem.c with LRU locking.  So move scan_mapping_unevictable_pages()
> into shmem.c, renaming it shmem_unlock_mapping(); and rename
> check_move_unevictable_page() to check_move_unevictable_pages(),
> looping down an array of pages, oftentimes under the same lock.
>
> Leave out the "rotate unevictable list" block: that's a leftover
> from when this was used for /proc/sys/vm/scan_unevictable_pages,
> whose flawed handling involved looking at pages at tail of LRU.
>
> Was there significance to the sequence first ClearPageUnevictable,
> then test page_evictable, then SetPageUnevictable here?  I think
> not, we're under LRU lock, and have no barriers between those.

If I understand correctly, this is not exactly correct, because the
PG_mlocked operation is not protected by the LRU lock. So I think we
have three choices.

1) keep check_move_unevictable_pages()'s retry logic and put pages back
    into the correct lru.
2) have check_move_unevictable_pages() unconditionally move the pages into
    the evictable lru, and let vmscan put them back into the correct lru later.
3) protect the PG_mlocked operation with the lru lock.


The other parts look fine to me.



* Re: [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap
  2012-01-09 20:42   ` KOSAKI Motohiro
@ 2012-01-09 22:25     ` Hugh Dickins
  2012-01-09 23:09       ` KOSAKI Motohiro
  0 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2012-01-09 22:25 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm

On Mon, 9 Jan 2012, KOSAKI Motohiro wrote:
> 2012/1/6 Hugh Dickins <hughd@google.com>:

[ check_move_unevictable_page(s) ]

> > 
> > Leave out the "rotate unevictable list" block: that's a leftover
> > from when this was used for /proc/sys/vm/scan_unevictable_pages,
> > whose flawed handling involved looking at pages at tail of LRU.
> > 
> > Was there significance to the sequence first ClearPageUnevictable,
> > then test page_evictable, then SetPageUnevictable here?  I think
> > not, we're under LRU lock, and have no barriers between those.
> 
> If I understand correctly, this is not exactly correct, because the

Thank you for giving it serious thought:
such races are hard work to think about.

> PG_mlocked operation is not protected by the LRU lock. So I think we

Right.  But I don't see that I've made a significant change there.

I may be being lazy, and rushing back to answer you, without giving
constructive thought to what the precise race is that you see, and
how we might fix it.  If the case you have in mind is easy for you
to describe in detail, please do so; but don't hesitate to tell me
to do my own work for myself!

Since the original code didn't have any barriers in it (in the
!page_evictable path, in the global case: I think that's true of the
memcg case also, but that is more complicated), how did it differ from

retry:
	if (page_evictable)
		blah blah blah;
	else if (page_evictable)
		goto retry;

which could be made even "safer" ;) if it were replaced by

retry:
	if (page_evictable)
		blah blah blah;
	else if (page_evictable)
		goto retry;
	else if (page_evictable)
		goto retry;

putback_lru_page() goes through similar doubts as to whether it's made
the right decision ("Oh, did I leave the oven on?"), but it does contain
an explicit smp_mb() and comment.

I am being lazy, I haven't even stopped to convince myself that that
smp_mb() is correctly placed (I'm not saying it isn't, I just haven't
done the thinking).

> have three choices.
> 
> 1) keep check_move_unevictable_pages()'s retry logic and put pages back
>    into the correct lru.

I think we'd need more than just the old retry.

> 2) have check_move_unevictable_pages() unconditionally move the pages into
>    the evictable lru, and let vmscan put them back into the correct lru later.

That's a good thought: these are all pages which we allowed to be found
Unevictable lazily in the first place, so why should we be so anxious to
assign them right now?  Ah, but if they are still unevictable, then it's
because they're Mlocked, and we do make much more effort to account the
Mlocked pages right.  We'd probably best keep on trying.

> 3) protect the PG_mlocked operation with the lru lock.

Surely not if we can avoid it.  Certainly not to bolster my cleanup.

> 
> The other parts look fine to me.

Thanks,
Hugh


* Re: [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap
  2012-01-09 22:25     ` Hugh Dickins
@ 2012-01-09 23:09       ` KOSAKI Motohiro
  2012-01-09 23:56         ` Hugh Dickins
  0 siblings, 1 reply; 11+ messages in thread
From: KOSAKI Motohiro @ 2012-01-09 23:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm

(1/9/12 5:25 PM), Hugh Dickins wrote:
> On Mon, 9 Jan 2012, KOSAKI Motohiro wrote:
>> 2012/1/6 Hugh Dickins<hughd@google.com>:
>
> [ check_move_unevictable_page(s) ]
>
>>>
>>> Leave out the "rotate unevictable list" block: that's a leftover
>>> from when this was used for /proc/sys/vm/scan_unevictable_pages,
>>> whose flawed handling involved looking at pages at tail of LRU.
>>>
>>> Was there significance to the sequence first ClearPageUnevictable,
>>> then test page_evictable, then SetPageUnevictable here?  I think
>>> not, we're under LRU lock, and have no barriers between those.
>>
>> If I understand correctly, this is not exactly correct. Because of,
>
> Thank you for giving it serious thought:
> such races are hard work to think about.
>
>> PG_mlocked operation is not protected by LRU lock. So, I think we
>
> Right.  But I don't see that I've made a significant change there.
>
> I may be being lazy, and rushing back to answer you, without giving
> constructive thought to what the precise race is that you see, and
> how we might fix it.  If the case you have in mind is easy for you
> to describe in detail, please do so; but don't hesitate to tell me
> to do my own work for myself!

Bah! I was a moron. I now think your code is right.

spin_lock(lru_lock)
if (page_evictable(page))
	blah blah blah
spin_unlock(lru_lock)

is always safe. The counterpart should have the following code, and waits
on spin_lock(lru_lock) in isolate_lru_page().

                 if (!isolate_lru_page(page))
                         putback_lru_page(page);

Then, even if check_move_unevictable_pages() observed the wrong page status,
putback_lru_page() should put the page back on the right lru.

I'm very sorry for annoying you.

	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>



Probably page_evictable() needs some additional comments, but
I have no idea what comment would clearly explain this complex rule...
  


* Re: [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap
  2012-01-09 23:09       ` KOSAKI Motohiro
@ 2012-01-09 23:56         ` Hugh Dickins
  0 siblings, 0 replies; 11+ messages in thread
From: Hugh Dickins @ 2012-01-09 23:56 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm

On Mon, 9 Jan 2012, KOSAKI Motohiro wrote:
> (1/9/12 5:25 PM), Hugh Dickins wrote:
> > On Mon, 9 Jan 2012, KOSAKI Motohiro wrote:
> > 
> > > PG_mlocked operation is not protected by LRU lock. So, I think we
> > 
> > Right.  But I don't see that I've made a significant change there.
> > 
> > I may be being lazy, and rushing back to answer you, without giving
> > constructive thought to what the precise race is that you see, and
> > how we might fix it.  If the case you have in mind is easy for you
> > to describe in detail, please do so; but don't hesitate to tell me
> > to do my own work for myself!
> 
> Bah! I was a moron. I now think your code is right.
> 
> spin_lock(lru_lock)
> if (page_evictable(page))
> 	blah blah blah
> spin_unlock(lru_lock)
> 
> is always safe. The counterpart should have the following code, and waits
> on spin_lock(lru_lock) in isolate_lru_page().
> 
>                 if (!isolate_lru_page(page))
>                         putback_lru_page(page);
> 
> Then, even if check_move_unevictable_pages() observed the wrong page status,
> putback_lru_page() should put the page back on the right lru.
> 
> I'm very sorry for annoying you.

Far from it, thank you again for giving it serious thought.

I am not going to pretend to have thought down these paths myself,
not recently - I was just relying on not changing the behaviour.
But I am reassured to know that you have worked through it again
and are now satisfied.

> 
> 	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thank you.

> 
> Probably page_evictable() needs some additional comments, but
> I have no idea what comment would clearly explain this complex rule...

I don't know any language that can make it clear: when forced to,
one just has to think through it back and forth by oneself; and
even then, it's so quickly forgotten.

Hugh


* [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section
  2012-01-06 21:10 [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section Hugh Dickins
  2012-01-06 21:12 ` [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap Hugh Dickins
  2012-01-07  8:28 ` [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section KOSAKI Motohiro
@ 2012-01-15  0:18 ` Hugh Dickins
  2012-01-18 22:37   ` Andrew Morton
  2 siblings, 1 reply; 11+ messages in thread
From: Hugh Dickins @ 2012-01-15  0:18 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm

scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
evictable again once the shared memory is unlocked.  It does this with
pagevec_lookup()s across the whole object (which might occupy most of
memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
PAGEVEC_SIZE pages would be good.

However, KOSAKI-san points out that this is called under shmem.c's
info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
There is no strong reason for that: we need to take these pages off
the unevictable list soonish, but those locks are not required for it.

So move the call to scan_mapping_unevictable_pages() from shmem.c's
unlock handling up to shm.c's unlock handling.  Remove the recently
added barrier, not needed now we have spin_unlock() before the scan.

Use get_file(), with subsequent fput(), to make sure we have a
reference to mapping throughout scan_mapping_unevictable_pages():
that's something that was previously guaranteed by the shm_lock().

Remove shmctl's lru_add_drain_all(): we don't fault in pages at
SHM_LOCK time, and we lazily discover them to be Unevictable later,
so it serves no purpose for SHM_LOCK; and serves no purpose for
SHM_UNLOCK, since pages still on pagevec are not marked Unevictable.

The original code avoided redundant rescans by checking VM_LOCKED
flag at its level: now avoid them by checking shp's SHM_LOCKED.

The original code called scan_mapping_unevictable_pages() on a
locked area at shm_destroy() time: perhaps we once had accounting
cross-checks which required that, but not now, so skip the overhead
and just let inode eviction deal with them.

Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
more as comment than to save space; comment them used for SHM_UNLOCK.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: stable@vger.kernel.org [back to 2.6.32 but will need respins]
---
Resend in the hope that it can get into 3.3.

 ipc/shm.c   |   37 ++++++++++++++++++++++---------------
 mm/shmem.c  |    7 -------
 mm/vmscan.c |   12 +++++++++++-
 3 files changed, 33 insertions(+), 23 deletions(-)

--- mmotm.orig/ipc/shm.c	2012-01-06 10:04:54.000000000 -0800
+++ mmotm/ipc/shm.c	2012-01-06 10:06:13.937943603 -0800
@@ -870,9 +870,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int,
 	case SHM_LOCK:
 	case SHM_UNLOCK:
 	{
-		struct file *uninitialized_var(shm_file);
-
-		lru_add_drain_all();  /* drain pagevecs to lru lists */
+		struct file *shm_file;
 
 		shp = shm_lock_check(ns, shmid);
 		if (IS_ERR(shp)) {
@@ -895,22 +893,31 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int,
 		err = security_shm_shmctl(shp, cmd);
 		if (err)
 			goto out_unlock;
-		
-		if(cmd==SHM_LOCK) {
+
+		shm_file = shp->shm_file;
+		if (is_file_hugepages(shm_file))
+			goto out_unlock;
+
+		if (cmd == SHM_LOCK) {
 			struct user_struct *user = current_user();
-			if (!is_file_hugepages(shp->shm_file)) {
-				err = shmem_lock(shp->shm_file, 1, user);
-				if (!err && !(shp->shm_perm.mode & SHM_LOCKED)){
-					shp->shm_perm.mode |= SHM_LOCKED;
-					shp->mlock_user = user;
-				}
+			err = shmem_lock(shm_file, 1, user);
+			if (!err && !(shp->shm_perm.mode & SHM_LOCKED)) {
+				shp->shm_perm.mode |= SHM_LOCKED;
+				shp->mlock_user = user;
 			}
-		} else if (!is_file_hugepages(shp->shm_file)) {
-			shmem_lock(shp->shm_file, 0, shp->mlock_user);
-			shp->shm_perm.mode &= ~SHM_LOCKED;
-			shp->mlock_user = NULL;
+			goto out_unlock;
 		}
+
+		/* SHM_UNLOCK */
+		if (!(shp->shm_perm.mode & SHM_LOCKED))
+			goto out_unlock;
+		shmem_lock(shm_file, 0, shp->mlock_user);
+		shp->shm_perm.mode &= ~SHM_LOCKED;
+		shp->mlock_user = NULL;
+		get_file(shm_file);
 		shm_unlock(shp);
+		scan_mapping_unevictable_pages(shm_file->f_mapping);
+		fput(shm_file);
 		goto out;
 	}
 	case IPC_RMID:
--- mmotm.orig/mm/shmem.c	2012-01-06 10:05:00.000000000 -0800
+++ mmotm/mm/shmem.c	2012-01-06 10:08:05.505947516 -0800
@@ -1068,13 +1068,6 @@ int shmem_lock(struct file *file, int lo
 		user_shm_unlock(inode->i_size, user);
 		info->flags &= ~VM_LOCKED;
 		mapping_clear_unevictable(file->f_mapping);
-		/*
-		 * Ensure that a racing putback_lru_page() can see
-		 * the pages of this mapping are evictable when we
-		 * skip them due to !PageLRU during the scan.
-		 */
-		smp_mb__after_clear_bit();
-		scan_mapping_unevictable_pages(file->f_mapping);
 	}
 	retval = 0;
 
--- mmotm.orig/mm/vmscan.c	2012-01-06 10:04:54.000000000 -0800
+++ mmotm/mm/vmscan.c	2012-01-06 10:06:13.941943604 -0800
@@ -3499,6 +3499,7 @@ int page_evictable(struct page *page, st
 	return 1;
 }
 
+#ifdef CONFIG_SHMEM
 /**
  * check_move_unevictable_page - check page for evictability and move to appropriate zone lru list
  * @page: page to check evictability and move to appropriate lru list
@@ -3509,6 +3510,8 @@ int page_evictable(struct page *page, st
  *
  * Restrictions: zone->lru_lock must be held, page must be on LRU and must
  * have PageUnevictable set.
+ *
+ * This function is only used for SysV IPC SHM_UNLOCK.
  */
 static void check_move_unevictable_page(struct page *page, struct zone *zone)
 {
@@ -3545,6 +3548,8 @@ retry:
  *
  * Scan all pages in mapping.  Check unevictable pages for
  * evictability and move them to the appropriate zone lru list.
+ *
+ * This function is only used for SysV IPC SHM_UNLOCK.
  */
 void scan_mapping_unevictable_pages(struct address_space *mapping)
 {
@@ -3590,9 +3595,14 @@ void scan_mapping_unevictable_pages(stru
 		pagevec_release(&pvec);
 
 		count_vm_events(UNEVICTABLE_PGSCANNED, pg_scanned);
+		cond_resched();
 	}
-
 }
+#else
+void scan_mapping_unevictable_pages(struct address_space *mapping)
+{
+}
+#endif /* CONFIG_SHMEM */
 
 static void warn_scan_unevictable_pages(void)
 {


* [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap
  2012-01-06 21:12 ` [PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap Hugh Dickins
  2012-01-09 20:42   ` KOSAKI Motohiro
@ 2012-01-15  0:20   ` Hugh Dickins
  1 sibling, 0 replies; 11+ messages in thread
From: Hugh Dickins @ 2012-01-15  0:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm

Commit cc39c6a9bbde "mm: account skipped entries to avoid looping in
find_get_pages" correctly fixed an infinite loop; but left a problem
that find_get_pages() on shmem would return 0 (appearing to callers
to mean end of tree) when it meets a run of nr_pages swap entries.

The only uses of find_get_pages() on shmem are via pagevec_lookup(),
called from invalidate_mapping_pages(), and from shmctl SHM_UNLOCK's
scan_mapping_unevictable_pages().  The first is already commented,
and not worth worrying about; but the second can leave pages on the
Unevictable list after an unusual sequence of swapping and locking.

Fix that by using shmem_find_get_pages_and_swap() (then ignoring
the swap) instead of pagevec_lookup().

But I don't want to contaminate vmscan.c with shmem internals, nor
shmem.c with LRU locking.  So move scan_mapping_unevictable_pages()
into shmem.c, renaming it shmem_unlock_mapping(); and rename
check_move_unevictable_page() to check_move_unevictable_pages(),
looping down an array of pages, oftentimes under the same lock.

Leave out the "rotate unevictable list" block: that's a leftover
from when this was used for /proc/sys/vm/scan_unevictable_pages,
whose flawed handling involved looking at pages at tail of LRU.

Was there significance to the sequence first ClearPageUnevictable,
then test page_evictable, then SetPageUnevictable here?  I think
not, we're under LRU lock, and have no barriers between those.

Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: stable@vger.kernel.org [back to 3.1 but will need respins]
---
Resend in the hope that it can get into 3.3.

 include/linux/shmem_fs.h |    1 
 include/linux/swap.h     |    2 
 ipc/shm.c                |    2 
 mm/shmem.c               |   46 +++++++++++--
 mm/vmscan.c              |  128 +++++++++++--------------------------
 5 files changed, 83 insertions(+), 96 deletions(-)

--- mmotm.orig/include/linux/shmem_fs.h	2012-01-06 09:55:50.901928800 -0800
+++ mmotm/include/linux/shmem_fs.h	2012-01-06 10:10:15.645949349 -0800
@@ -48,6 +48,7 @@ extern struct file *shmem_file_setup(con
 					loff_t size, unsigned long flags);
 extern int shmem_zero_setup(struct vm_area_struct *);
 extern int shmem_lock(struct file *file, int lock, struct user_struct *user);
+extern void shmem_unlock_mapping(struct address_space *mapping);
 extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 					pgoff_t index, gfp_t gfp_mask);
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
--- mmotm.orig/include/linux/swap.h	2012-01-06 09:55:50.901928800 -0800
+++ mmotm/include/linux/swap.h	2012-01-06 10:10:15.645949349 -0800
@@ -279,7 +279,7 @@ static inline int zone_reclaim(struct zo
 #endif
 
 extern int page_evictable(struct page *page, struct vm_area_struct *vma);
-extern void scan_mapping_unevictable_pages(struct address_space *);
+extern void check_move_unevictable_pages(struct page **, int nr_pages);
 
 extern unsigned long scan_unevictable_pages;
 extern int scan_unevictable_handler(struct ctl_table *, int,
--- mmotm.orig/ipc/shm.c	2012-01-06 10:06:13.937943603 -0800
+++ mmotm/ipc/shm.c	2012-01-06 10:10:15.649949348 -0800
@@ -916,7 +916,7 @@ SYSCALL_DEFINE3(shmctl, int, shmid, int,
 		shp->mlock_user = NULL;
 		get_file(shm_file);
 		shm_unlock(shp);
-		scan_mapping_unevictable_pages(shm_file->f_mapping);
+		shmem_unlock_mapping(shm_file->f_mapping);
 		fput(shm_file);
 		goto out;
 	}
--- mmotm.orig/mm/shmem.c	2012-01-06 10:08:05.505947516 -0800
+++ mmotm/mm/shmem.c	2012-01-06 10:10:15.649949348 -0800
@@ -379,7 +379,7 @@ static int shmem_free_swap(struct addres
 /*
  * Pagevec may contain swap entries, so shuffle up pages before releasing.
  */
-static void shmem_pagevec_release(struct pagevec *pvec)
+static void shmem_deswap_pagevec(struct pagevec *pvec)
 {
 	int i, j;
 
@@ -389,7 +389,36 @@ static void shmem_pagevec_release(struct
 			pvec->pages[j++] = page;
 	}
 	pvec->nr = j;
-	pagevec_release(pvec);
+}
+
+/*
+ * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
+ */
+void shmem_unlock_mapping(struct address_space *mapping)
+{
+	struct pagevec pvec;
+	pgoff_t indices[PAGEVEC_SIZE];
+	pgoff_t index = 0;
+
+	pagevec_init(&pvec, 0);
+	/*
+	 * Minor point, but we might as well stop if someone else SHM_LOCKs it.
+	 */
+	while (!mapping_unevictable(mapping)) {
+		/*
+		 * Avoid pagevec_lookup(): find_get_pages() returns 0 as if it
+		 * has finished, if it hits a row of PAGEVEC_SIZE swap entries.
+		 */
+		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
+					PAGEVEC_SIZE, pvec.pages, indices);
+		if (!pvec.nr)
+			break;
+		index = indices[pvec.nr - 1] + 1;
+		shmem_deswap_pagevec(&pvec);
+		check_move_unevictable_pages(pvec.pages, pvec.nr);
+		pagevec_release(&pvec);
+		cond_resched();
+	}
 }
 
 /*
@@ -440,7 +469,8 @@ void shmem_truncate_range(struct inode *
 			}
 			unlock_page(page);
 		}
-		shmem_pagevec_release(&pvec);
+		shmem_deswap_pagevec(&pvec);
+		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		cond_resched();
 		index++;
@@ -470,7 +500,8 @@ void shmem_truncate_range(struct inode *
 			continue;
 		}
 		if (index == start && indices[0] > end) {
-			shmem_pagevec_release(&pvec);
+			shmem_deswap_pagevec(&pvec);
+			pagevec_release(&pvec);
 			break;
 		}
 		mem_cgroup_uncharge_start();
@@ -494,7 +525,8 @@ void shmem_truncate_range(struct inode *
 			}
 			unlock_page(page);
 		}
-		shmem_pagevec_release(&pvec);
+		shmem_deswap_pagevec(&pvec);
+		pagevec_release(&pvec);
 		mem_cgroup_uncharge_end();
 		index++;
 	}
@@ -2439,6 +2471,10 @@ int shmem_lock(struct file *file, int lo
 	return 0;
 }
 
+void shmem_unlock_mapping(struct address_space *mapping)
+{
+}
+
 void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 {
 	truncate_inode_pages_range(inode->i_mapping, lstart, lend);
--- mmotm.orig/mm/vmscan.c	2012-01-06 10:06:13.941943604 -0800
+++ mmotm/mm/vmscan.c	2012-01-06 10:10:15.649949348 -0800
@@ -26,7 +26,6 @@
 #include <linux/buffer_head.h>	/* for try_to_release_page(),
 					buffer_heads_over_limit */
 #include <linux/mm_inline.h>
-#include <linux/pagevec.h>
 #include <linux/backing-dev.h>
 #include <linux/rmap.h>
 #include <linux/topology.h>
@@ -661,7 +660,7 @@ redo:
 		 * When racing with an mlock or AS_UNEVICTABLE clearing
 		 * (page is unlocked) make sure that if the other thread
 		 * does not observe our setting of PG_lru and fails
-		 * isolation/check_move_unevictable_page,
+		 * isolation/check_move_unevictable_pages,
 		 * we see PG_mlocked/AS_UNEVICTABLE cleared below and move
 		 * the page back to the evictable list.
 		 *
@@ -3501,107 +3500,58 @@ int page_evictable(struct page *page, st
 
 #ifdef CONFIG_SHMEM
 /**
- * check_move_unevictable_page - check page for evictability and move to appropriate zone lru list
- * @page: page to check evictability and move to appropriate lru list
- * @zone: zone page is in
+ * check_move_unevictable_pages - check pages for evictability and move to appropriate zone lru list
+ * @pages:	array of pages to check
+ * @nr_pages:	number of pages to check
  *
- * Checks a page for evictability and moves the page to the appropriate
- * zone lru list.
- *
- * Restrictions: zone->lru_lock must be held, page must be on LRU and must
- * have PageUnevictable set.
+ * Checks pages for evictability and moves them to the appropriate lru list.
  *
  * This function is only used for SysV IPC SHM_UNLOCK.
  */
-static void check_move_unevictable_page(struct page *page, struct zone *zone)
+void check_move_unevictable_pages(struct page **pages, int nr_pages)
 {
 	struct lruvec *lruvec;
+	struct zone *zone = NULL;
+	int pgscanned = 0;
+	int pgrescued = 0;
+	int i;
 
-	VM_BUG_ON(PageActive(page));
-retry:
-	ClearPageUnevictable(page);
-	if (page_evictable(page, NULL)) {
-		enum lru_list l = page_lru_base_type(page);
-
-		__dec_zone_state(zone, NR_UNEVICTABLE);
-		lruvec = mem_cgroup_lru_move_lists(zone, page,
-						   LRU_UNEVICTABLE, l);
-		list_move(&page->lru, &lruvec->lists[l]);
-		__inc_zone_state(zone, NR_INACTIVE_ANON + l);
-		__count_vm_event(UNEVICTABLE_PGRESCUED);
-	} else {
-		/*
-		 * rotate unevictable list
-		 */
-		SetPageUnevictable(page);
-		lruvec = mem_cgroup_lru_move_lists(zone, page, LRU_UNEVICTABLE,
-						   LRU_UNEVICTABLE);
-		list_move(&page->lru, &lruvec->lists[LRU_UNEVICTABLE]);
-		if (page_evictable(page, NULL))
-			goto retry;
-	}
-}
-
-/**
- * scan_mapping_unevictable_pages - scan an address space for evictable pages
- * @mapping: struct address_space to scan for evictable pages
- *
- * Scan all pages in mapping.  Check unevictable pages for
- * evictability and move them to the appropriate zone lru list.
- *
- * This function is only used for SysV IPC SHM_UNLOCK.
- */
-void scan_mapping_unevictable_pages(struct address_space *mapping)
-{
-	pgoff_t next = 0;
-	pgoff_t end   = (i_size_read(mapping->host) + PAGE_CACHE_SIZE - 1) >>
-			 PAGE_CACHE_SHIFT;
-	struct zone *zone;
-	struct pagevec pvec;
+	for (i = 0; i < nr_pages; i++) {
+		struct page *page = pages[i];
+		struct zone *pagezone;
+
+		pgscanned++;
+		pagezone = page_zone(page);
+		if (pagezone != zone) {
+			if (zone)
+				spin_unlock_irq(&zone->lru_lock);
+			zone = pagezone;
+			spin_lock_irq(&zone->lru_lock);
+		}
 
-	if (mapping->nrpages == 0)
-		return;
+		if (!PageLRU(page) || !PageUnevictable(page))
+			continue;
 
-	pagevec_init(&pvec, 0);
-	while (next < end &&
-		pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
-		int i;
-		int pg_scanned = 0;
-
-		zone = NULL;
-
-		for (i = 0; i < pagevec_count(&pvec); i++) {
-			struct page *page = pvec.pages[i];
-			pgoff_t page_index = page->index;
-			struct zone *pagezone = page_zone(page);
-
-			pg_scanned++;
-			if (page_index > next)
-				next = page_index;
-			next++;
-
-			if (pagezone != zone) {
-				if (zone)
-					spin_unlock_irq(&zone->lru_lock);
-				zone = pagezone;
-				spin_lock_irq(&zone->lru_lock);
-			}
+		if (page_evictable(page, NULL)) {
+			enum lru_list lru = page_lru_base_type(page);
 
-			if (PageLRU(page) && PageUnevictable(page))
-				check_move_unevictable_page(page, zone);
+			VM_BUG_ON(PageActive(page));
+			ClearPageUnevictable(page);
+			__dec_zone_state(zone, NR_UNEVICTABLE);
+			lruvec = mem_cgroup_lru_move_lists(zone, page,
+						LRU_UNEVICTABLE, lru);
+			list_move(&page->lru, &lruvec->lists[lru]);
+			__inc_zone_state(zone, NR_INACTIVE_ANON + lru);
+			pgrescued++;
 		}
-		if (zone)
-			spin_unlock_irq(&zone->lru_lock);
-		pagevec_release(&pvec);
+	}
 
-		count_vm_events(UNEVICTABLE_PGSCANNED, pg_scanned);
-		cond_resched();
+	if (zone) {
+		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
+		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
+		spin_unlock_irq(&zone->lru_lock);
 	}
 }
-#else
-void scan_mapping_unevictable_pages(struct address_space *mapping)
-{
-}
 #endif /* CONFIG_SHMEM */
 
 static void warn_scan_unevictable_pages(void)


* Re: [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section
  2012-01-15  0:18 ` Hugh Dickins
@ 2012-01-18 22:37   ` Andrew Morton
  2012-01-18 23:26     ` Hugh Dickins
  0 siblings, 1 reply; 11+ messages in thread
From: Andrew Morton @ 2012-01-18 22:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: KOSAKI Motohiro, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm, stable

On Sat, 14 Jan 2012 16:18:43 -0800 (PST)
Hugh Dickins <hughd@google.com> wrote:

> scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
> evictable again once the shared memory is unlocked.  It does this with
> pagevec_lookup()s across the whole object (which might occupy most of
> memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
> PAGEVEC_SIZE pages would be good.
> 
> However, KOSAKI-san points out that this is called under shmem.c's
> info->lock, and it's also under shm.c's shm_lock(), both spinlocks.
> There is no strong reason for that: we need to take these pages off
> the unevictable list soonish, but those locks are not required for it.
> 
> So move the call to scan_mapping_unevictable_pages() from shmem.c's
> unlock handling up to shm.c's unlock handling.  Remove the recently
> added barrier, not needed now we have spin_unlock() before the scan.
> 
> Use get_file(), with subsequent fput(), to make sure we have a
> reference to mapping throughout scan_mapping_unevictable_pages():
> that's something that was previously guaranteed by the shm_lock().
> 
> Remove shmctl's lru_add_drain_all(): we don't fault in pages at
> SHM_LOCK time, and we lazily discover them to be Unevictable later,
> so it serves no purpose for SHM_LOCK; and serves no purpose for
> SHM_UNLOCK, since pages still on pagevec are not marked Unevictable.
> 
> The original code avoided redundant rescans by checking VM_LOCKED
> flag at its level: now avoid them by checking shp's SHM_LOCKED.
> 
> The original code called scan_mapping_unevictable_pages() on a
> locked area at shm_destroy() time: perhaps we once had accounting
> cross-checks which required that, but not now, so skip the overhead
> and just let inode eviction deal with them.
> 
> Put check_move_unevictable_page() and scan_mapping_unevictable_pages()
> under CONFIG_SHMEM (with stub for the TINY case when ramfs is used),
> more as comment than to save space; comment them used for SHM_UNLOCK.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: stable@vger.kernel.org [back to 2.6.32 but will need respins]

Is -stable backporting really warranted?  AFAICT the only thing we're
fixing here is a long latency glitch during a rare operation on large
machines.  Usually it will be on only one CPU, too.

"[PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap"
does look like -stable material, so omitting 1/2 will probably screw
things up :(


> Resend in the hope that it can get into 3.3.

That we can do ;)

>
> ...
>
> --- mmotm.orig/mm/vmscan.c	2012-01-06 10:04:54.000000000 -0800
> +++ mmotm/mm/vmscan.c	2012-01-06 10:06:13.941943604 -0800
> @@ -3499,6 +3499,7 @@ int page_evictable(struct page *page, st
>  	return 1;
>  }
>  
> +#ifdef CONFIG_SHMEM
>  /**
>   * check_move_unevictable_page - check page for evictability and move to appropriate zone lru list
>   * @page: page to check evictability and move to appropriate lru list
> @@ -3509,6 +3510,8 @@ int page_evictable(struct page *page, st
>   *
>   * Restrictions: zone->lru_lock must be held, page must be on LRU and must
>   * have PageUnevictable set.
> + *
> + * This function is only used for SysV IPC SHM_UNLOCK.
>   */
>  static void check_move_unevictable_page(struct page *page, struct zone *zone)
>  {
> @@ -3545,6 +3548,8 @@ retry:
>   *
>   * Scan all pages in mapping.  Check unevictable pages for
>   * evictability and move them to the appropriate zone lru list.
> + *
> + * This function is only used for SysV IPC SHM_UNLOCK.
>   */
>  void scan_mapping_unevictable_pages(struct address_space *mapping)
>  {
> @@ -3590,9 +3595,14 @@ void scan_mapping_unevictable_pages(stru
>  		pagevec_release(&pvec);
>  
>  		count_vm_events(UNEVICTABLE_PGSCANNED, pg_scanned);
> +		cond_resched();
>  	}
> -
>  }
> +#else
> +void scan_mapping_unevictable_pages(struct address_space *mapping)
> +{
> +}
> +#endif /* CONFIG_SHMEM */

Inlining the CONFIG_SHMEM=n stub would have been more efficient.


* Re: [PATCH 1/2] SHM_UNLOCK: fix long unpreemptible section
  2012-01-18 22:37   ` Andrew Morton
@ 2012-01-18 23:26     ` Hugh Dickins
  0 siblings, 0 replies; 11+ messages in thread
From: Hugh Dickins @ 2012-01-18 23:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, Minchan Kim, Rik van Riel, Shaohua Li,
	Eric Dumazet, Johannes Weiner, Michel Lespinasse, linux-kernel,
	linux-mm, stable

On Wed, 18 Jan 2012, Andrew Morton wrote:
> On Sat, 14 Jan 2012 16:18:43 -0800 (PST)
> Hugh Dickins <hughd@google.com> wrote:
> 
> > scan_mapping_unevictable_pages() is used to make SysV SHM_LOCKed pages
> > evictable again once the shared memory is unlocked.  It does this with
> > pagevec_lookup()s across the whole object (which might occupy most of
> > memory), and takes 300ms to unlock 7GB here.  A cond_resched() every
> > PAGEVEC_SIZE pages would be good.
>... 
> Is -stable backporting really warranted?  AFAICT the only thing we're
> fixing here is a long latency glitch during a rare operation on large
> machines.  Usually it will be on only one CPU, too.

True: I'm not sure if it amounts to -stable material or not.
I see you've taken out its Cc: stable line: that's fine by me, but...

> "[PATCH 2/2] SHM_UNLOCK: fix Unevictable pages stranded after swap"
> does loko like -stable material, so omitting 1/1 will probably screw
> things up :(

Sort of, but they both(?) needed respinning for -stable anyway.
Even against 3.2, there's some little change in vmscan.c that generates
a reject.  Greg has now closed down 3.1.N (which would have been tiresome
to port to, because it was still supporting a second caller of check_move),
and by your argument above it's not worth porting 1/2 back to 2.6.32.  So
I think 2/2 can just go into 3.2.N, dragging 1/2 along in its slipstream
(if you can have a slipstream in front of you).

I ordered them that way because 1/2 fixes an old, and 2/2 a recent, bug.

> > Resend in the hope that it can get into 3.3.
> 
> That we can do ;)

Thank you!

> > +#else
> > +void scan_mapping_unevictable_pages(struct address_space *mapping)
> > +{
> > +}
> > +#endif /* CONFIG_SHMEM */
> 
> Inlining the CONFIG_SHMEM=n stub would have been more efficient.

True, though in 2/2 it morphs into shmem_unlock_mapping() over
in shmem.c, and we seem to have the convention that TINY's !SHMEM
stubs live as non-inline functions there - probably no good reason
for that, just reflects their historical origins in tiny-shmem.c.
A grand saving to make some other time ;)

Hugh

