Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Matthew Wilcox <willy@infradead.org>,
	linux-mm@kvack.org
Subject: [PATCH 08/10] mm/khugepaged: collapse_shmem() without freezing new_page
Date: Mon, 26 Nov 2018 15:29:17 -0800 (PST)
Message-ID: <alpine.LSU.2.11.1811261527570.2275@eggly.anvils> (raw)
In-Reply-To: <alpine.LSU.2.11.1811261444420.2275@eggly.anvils>

khugepaged's collapse_shmem() does almost all of its work, to assemble
the huge new_page from 512 scattered old pages, with the new_page's
refcount frozen to 0 (and refcounts of all old pages so far also frozen
to 0).  Including shmem_getpage() to read in any which were out on swap,
memory reclaim if necessary to allocate their intermediate pages, and
copying over all the data from old to new.

Imagine the frozen refcount as a spinlock held, but without any lock
debugging to highlight the abuse: it's not good, and under serious load
heads into lockups - speculative getters of the page are not expecting
to spin while khugepaged is rescheduled.

One can get a little further under load by hacking around elsewhere;
but fortunately, freezing the new_page turns out to have been entirely
unnecessary, with no hacks needed elsewhere.

The huge new_page lock is already held throughout, and guards all its
subpages as they are brought one by one into the page cache tree; and
anything reading the data in that page, without the lock, before it has
been marked PageUptodate, would already be in the wrong.  So simply
eliminate the freezing of the new_page.

Each of the old pages remains frozen with refcount 0 after it has been
replaced by a new_page subpage in the page cache tree, until they are all
unfrozen on success or failure: just as before.  They could be unfrozen
sooner, but cause no problem once no longer visible to find_get_entry(),
filemap_map_pages() and other speculative lookups.

Fixes: f3f0e1d2150b2 ("khugepaged: add support of collapse for tmpfs/shmem pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: stable@vger.kernel.org # 4.8+
---
 mm/khugepaged.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 9d4e9ff1af95..55930cbed3fd 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1287,7 +1287,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * collapse_shmem - collapse small tmpfs/shmem pages into huge one.
  *
  * Basic scheme is simple, details are more complex:
- *  - allocate and freeze a new huge page;
+ *  - allocate and lock a new huge page;
  *  - scan page cache replacing old pages with the new one
  *    + swap in pages if necessary;
  *    + fill in gaps;
@@ -1295,11 +1295,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  *  - if replacing succeeds:
  *    + copy data over;
  *    + free old pages;
- *    + unfreeze huge page;
+ *    + unlock huge page;
  *  - if replacing failed;
  *    + put all pages back and unfreeze them;
  *    + restore gaps in the page cache;
- *    + free huge page;
+ *    + unlock and free huge page;
  */
 static void collapse_shmem(struct mm_struct *mm,
 		struct address_space *mapping, pgoff_t start,
@@ -1333,13 +1333,11 @@ static void collapse_shmem(struct mm_struct *mm,
 	__SetPageSwapBacked(new_page);
 	new_page->index = start;
 	new_page->mapping = mapping;
-	BUG_ON(!page_ref_freeze(new_page, 1));
 
 	/*
-	 * At this point the new_page is 'frozen' (page_count() is zero),
-	 * locked and not up-to-date. It's safe to insert it into the page
-	 * cache, because nobody would be able to map it or use it in other
-	 * way until we unfreeze it.
+	 * At this point the new_page is locked and not up-to-date.
+	 * It's safe to insert it into the page cache, because nobody would
+	 * be able to map it or use it in another way until we unlock it.
 	 */
 
 	/* This will be less messy when we use multi-index entries */
@@ -1491,9 +1489,8 @@ static void collapse_shmem(struct mm_struct *mm,
 			index++;
 		}
 
-		/* Everything is ready, let's unfreeze the new_page */
 		SetPageUptodate(new_page);
-		page_ref_unfreeze(new_page, HPAGE_PMD_NR);
+		page_ref_add(new_page, HPAGE_PMD_NR - 1);
 		set_page_dirty(new_page);
 		mem_cgroup_commit_charge(new_page, memcg, false, true);
 		lru_cache_add_anon(new_page);
@@ -1541,8 +1538,6 @@ static void collapse_shmem(struct mm_struct *mm,
 		VM_BUG_ON(nr_none);
 		xas_unlock_irq(&xas);
 
-		/* Unfreeze new_page, caller would take care about freeing it */
-		page_ref_unfreeze(new_page, 1);
 		mem_cgroup_cancel_charge(new_page, memcg, true);
 		new_page->mapping = NULL;
 	}
-- 
2.20.0.rc0.387.gc7a69e6b6c-goog

  parent reply index

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-26 23:13 [PATCH 00/10] huge_memory,khugepaged tmpfs split/collapse fixes Hugh Dickins
2018-11-26 23:19 ` [PATCH 02/10] mm/huge_memory: splitting set mapping+index before unfreeze Hugh Dickins
2018-11-27  7:26   ` Kirill A. Shutemov
2018-11-26 23:21 ` [PATCH 03/10] mm/huge_memory: fix lockdep complaint on 32-bit i_size_read() Hugh Dickins
2018-11-27  7:27   ` Kirill A. Shutemov
2018-11-26 23:23 ` [PATCH 04/10] mm/khugepaged: collapse_shmem() stop if punched or truncated Hugh Dickins
2018-11-27  7:31   ` Kirill A. Shutemov
2018-11-27 13:26   ` Matthew Wilcox
2018-11-26 23:25 ` [PATCH 05/10] mm/khugepaged: fix crashes due to misaccounted holes Hugh Dickins
2018-11-27  7:35   ` Kirill A. Shutemov
2018-11-26 23:26 ` [PATCH 06/10] mm/khugepaged: collapse_shmem() remember to clear holes Hugh Dickins
2018-11-27  7:38   ` Kirill A. Shutemov
2018-11-26 23:27 ` [PATCH 07/10] mm/khugepaged: minor reorderings in collapse_shmem() Hugh Dickins
2018-11-27  7:59   ` Kirill A. Shutemov
2018-11-27 20:23     ` Hugh Dickins
2018-11-28 10:59       ` Kirill A. Shutemov
2018-11-28 19:40         ` Hugh Dickins
2018-11-26 23:29 ` Hugh Dickins [this message]
2018-11-27  8:01   ` [PATCH 08/10] mm/khugepaged: collapse_shmem() without freezing new_page Kirill A. Shutemov
2018-11-26 23:31 ` [PATCH 09/10] mm/khugepaged: collapse_shmem() do not crash on Compound Hugh Dickins
2018-11-26 23:33 ` [PATCH 10/10] mm/khugepaged: fix the xas_create_range() error path Hugh Dickins
2018-11-27  8:02   ` Kirill A. Shutemov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.LSU.2.11.1811261527570.2275@eggly.anvils \
    --to=hughd@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=kirill@shutemov.name \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git