Linux-mm Archive on lore.kernel.org
From: Gerald Schaefer <gerald.schaefer@de.ibm.com>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Hugh Dickins <hughd@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Martin Schwidefsky <schwidefsky@de.ibm.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>
Subject: [BUG/REGRESSION] THP: broken page count after commit aa88b68c
Date: Thu, 2 Jun 2016 17:21:41 +0200
Message-ID: <20160602172141.75c006a9@thinkpad>

Christian Borntraeger reported a kernel panic caused by corrupted page
counts. It turned out to be a regression introduced by commit aa88b68c
"thp: keep huge zero page pinned until tlb flush", at least on s390.

With that commit, put_huge_zero_page() was moved from zap_huge_pmd() to
release_pages(), and in zap_huge_pmd() it was replaced by tlb_remove_page().
However, the (arch-specific) tlb_remove_page() does not necessarily end up
in release_pages().

On s390 we call free_page_and_swap_cache() from tlb_remove_page(), and not
tlb_flush_mmu() -> free_pages_and_swap_cache() like the generic version,
because we don't use the MMU-gather logic. Although the two functions have
very similar names, they do very different things: free_page_xxx() just
does a put_page(), while free_pages_xxx() calls release_pages().
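To make the difference concrete, here is a minimal user-space model in C. It assumes only the behaviour described above: the single-page helper does an unconditional put, while the batched helper special-cases the huge zero page. All model_* names and the separate pin counter are hypothetical stand-ins, not kernel APIs, and the sketch ignores swap cache, LRU and locking entirely.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model, not kernel code. It only mimics the
 * reference counting behaviour described in the report.
 */
struct model_page {
	int refcount;   /* stands in for the page's own reference count */
	bool huge_zero; /* true only for the (single) huge zero page */
};

/* Separate pin count held on behalf of huge zero page mappings (modeled). */
int model_huge_zero_pins;

static void model_put_page(struct model_page *page)
{
	page->refcount--;
}

static void model_put_huge_zero_page(void)
{
	model_huge_zero_pins--;
}

/* Like free_page_and_swap_cache() before the fix: an unconditional put. */
void model_free_page_and_swap_cache(struct model_page *page)
{
	model_put_page(page);
}

/* Like release_pages() after aa88b68c: the huge zero page is special-cased. */
void model_release_pages(struct model_page **pages, int nr)
{
	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (pages[i]->huge_zero) {
			model_put_huge_zero_page(); /* drop the pin, not the page ref */
			continue;
		}
		model_put_page(pages[i]);
	}
}
```

In this model, passing the huge zero page through model_release_pages() drops only the pin, while model_free_page_and_swap_cache() decrements the page's own count and leaves the pin untouched, which is the kind of mismatch that surfaces as a corrupted page count.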

This of course results in very harmful put_page() calls on the huge zero
page on architectures where tlb_remove_page() is implemented this way. It
seems to affect only s390 and sh, but sh doesn't have THP support, so the
problem probably exists (currently) only on s390.

The following quick hack fixed the issue:

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 0d457e7..c99463a 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -252,7 +252,10 @@ static inline void free_swap_cache(struct page *page)
 void free_page_and_swap_cache(struct page *page)
 {
 	free_swap_cache(page);
-	put_page(page);
+	if (is_huge_zero_page(page))
+		put_huge_zero_page();
+	else
+		put_page(page);
 }
 
 /*

But of course there might be a better solution, and there are still some
questions left:
- Why does free_page_xxx() behave so differently from free_pages_xxx()?
- Would it be OK to implement free_page_xxx() by calling free_pages_xxx()
  with nr = 1, similar to free_page() vs. free_pages()?
- Would it be OK to replace the put_page() in free_page_xxx() with a call
  to release_pages() with nr = 1?
- Would it be better to fix this in the arch-specific tlb_remove_page(),
  by calling free_pages_xxx() with nr = 1 instead of free_page_xxx()?
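The second question can be sketched in the same kind of hypothetical user-space model: the single-page helper simply forwards to the batched one with nr = 1, so only one place needs to know about the huge zero page. This assumes release_pages() handles the huge zero page as described above; the model_* names are made up, and the sketch ignores swap cache handling, LRU and locking.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space model (not kernel code) of the "nr = 1" option:
 * the single-page helper forwards to the batched helper, so both paths
 * share one huge zero page check.
 */
struct model_page {
	int refcount;   /* stands in for the page's own reference count */
	bool huge_zero; /* true only for the huge zero page */
};

/* Separate pin count held on behalf of huge zero page mappings (modeled). */
int model_huge_zero_pins;

/* Batched release, special-casing the huge zero page like release_pages()
 * after aa88b68c, as described in the report. */
void model_release_pages(struct model_page **pages, int nr)
{
	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (pages[i]->huge_zero) {
			model_huge_zero_pins--; /* drop the pin, not the page ref */
			continue;
		}
		pages[i]->refcount--;
	}
}

/* Single-page helper built on the batched one, like free_page() vs. free_pages(). */
void model_free_page_and_swap_cache(struct model_page *page)
{
	model_release_pages(&page, 1);
}
```

Forwarding with nr = 1 keeps the huge zero page check in a single helper, at the cost of running the batching loop for one page; whether that trade-off is acceptable for the real free_page_and_swap_cache() is exactly what the questions above ask.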

Regards,
Gerald



Thread overview: 10+ messages
2016-06-02 15:21 Gerald Schaefer [this message]
2016-06-02 15:51 ` Kirill A. Shutemov
2016-06-02 18:40   ` Andrew Morton
2016-06-02 18:56     ` Christian Borntraeger
2016-06-02 19:03       ` Andrew Morton
2016-06-02 19:10         ` Christian Borntraeger
2016-06-02 19:47   ` Hugh Dickins
2016-06-03 10:36     ` Kirill A. Shutemov
