From: Vlastimil Babka <vbabka@suse.cz>
To: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
    Andrew Morton <akpm@linux-foundation.org>,
    Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>, Hugh Dickins <hughd@google.com>,
    Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH, RFC 00/10] THP refcounting redesign
Date: Tue, 10 Jun 2014 10:10:56 +0200
Message-ID: <5396BD90.4060104@suse.cz>
In-Reply-To: <1402329861-7037-1-git-send-email-kirill.shutemov@linux.intel.com>

On 06/09/2014 06:04 PM, Kirill A. Shutemov wrote:
> Hello everybody,
>
> We've discussed a few times that it would be nice to allow huge pages to
> be mapped with 4k pages too. Here's my first attempt to actually implement
> this. It's an early prototype and not stabilized yet, but I want to share
> it to discuss any potential show stoppers early.
>
> The main reason why we can't map THP with 4k is how refcounting on THP is
> designed. It is built around two requirements:
>
>   - split of huge page should never fail;
>   - we can't change interface of get_user_page();
>
> To be able to split huge page at any point we have to track which tail
> page was pinned. It leads to tricky and expensive get_page() on tail pages
> and also occupies tail_page->_mapcount.
>
> Most split_huge_page*() users want PMD to be split into table of PTEs and
> don't care whether compound page is going to be split or not.
>
> The plan is:
>
>   - allow split_huge_page() to fail if the page is pinned. It's trivial to
>     split non-pinned page and it doesn't require tail page refcounting, so
>     tail_page->_mapcount is free to be reused.
>
>   - introduce new routine -- split_huge_pmd() -- to split PMD into table of
>     PTEs. It splits only one PMD, not touching other PMDs the page is
>     mapped with or underlying compound page. Unlike new split_huge_page(),
>     split_huge_pmd() never fails.
>
> Fortunately, we have only a few places where split_huge_page() is needed:
> swap out, memory failure, migration, KSM. And all of them can handle
> split_huge_page() failing.
>
> In the new scheme tail_page->_mapcount is used to account how many times
> the tail page is mapped. head_page->_mapcount is used for both PMD mapping
> of whole huge page and PTE mapping of the first 4k page of the compound
> page. It seems to work fine, except the fact that we don't have a cheap
> way to check whether the page is mapped with PMDs or not.
>
> Introducing split_huge_pmd() effectively allows THP to be mapped with 4k.
> It can break some kernel expectations. I.e. VMA now can start and end in
> middle of compound page. IIUC, it will break compaction and probably
> something else (any hints?).

I don't think compaction cares at all about VMAs, unless the underlying
page migration does. What will break is munlock, due to
VM_BUG_ON(PageTail(page)) in the PageTransHuge() check.

> Also munmap() on part of huge page will not split and free unmapped part
> immediately. We need to be careful here to keep memory footprint under
> control.

So who will take care of it, if it's not done immediately?

> As side effect we don't need to mark PMD splitting since we have
> split_huge_pmd(). get_page()/put_page() on tail of THP is cheaper (and
> cleaner) now.

But per patch 2, PageAnon() is more expensive. Also, are there no side
effects to this change?

> I will continue with stabilizing this. The patchset is also available on
> git[1].
>
> Any comments?
>
> [1] git://git.kernel.org/pub/scm/linux/kernel/git/kas/linux.git thp/refcounting/v1
>
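A rough userspace sketch of the accounting described in the quoted cover letter may help when reasoning about the questions above. It is a toy model, not kernel code: `struct toy_page`, `struct toy_thp`, and every `toy_*` function are hypothetical names invented here, the map count is 0-based rather than the kernel's -1-based `_mapcount`, the 512-subpage figure assumes 2M huge pages over 4k base pages, and returning -1 on failure is an arbitrary choice.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_HPAGE_NR 512 /* assumed: 4k subpages in one 2M huge page */

/* Toy stand-in for struct page; only the map count is modelled. */
struct toy_page {
    int mapcount; /* 0 means "not mapped" in this toy model */
};

struct toy_thp {
    struct toy_page sub[TOY_HPAGE_NR]; /* sub[0] plays the head page */
    int pins;         /* extra references, e.g. taken by a GUP-like caller */
    bool is_compound; /* false once the compound page itself is split */
};

void toy_thp_init(struct toy_thp *t)
{
    memset(t, 0, sizeof(*t));
    t->is_compound = true;
}

/* Map the whole huge page with one PMD: accounted on the head page only. */
void toy_map_pmd(struct toy_thp *t)
{
    t->sub[0].mapcount++;
}

/*
 * Turn one PMD mapping into a table of PTE mappings. This never fails and
 * leaves the compound page intact. The head count drops the PMD mapping and
 * gains the PTE mapping of the first 4k page; each tail gains one.
 */
void toy_split_huge_pmd(struct toy_thp *t)
{
    assert(t->sub[0].mapcount > 0);
    t->sub[0].mapcount--;
    for (int i = 0; i < TOY_HPAGE_NR; i++)
        t->sub[i].mapcount++;
}

/* Drop a single PTE mapping, as a partial munmap() would. */
void toy_unmap_pte(struct toy_thp *t, int i)
{
    assert(i >= 0 && i < TOY_HPAGE_NR);
    assert(t->sub[i].mapcount > 0);
    t->sub[i].mapcount--;
}

/* Split the compound page itself; allowed to fail while the page is pinned. */
int toy_split_huge_page(struct toy_thp *t)
{
    if (!t->is_compound || t->pins > 0)
        return -1;
    t->is_compound = false;
    return 0;
}
```

In this model, calling `toy_map_pmd()` and then `toy_split_huge_pmd()` leaves the head count unchanged while every tail count goes to 1, which illustrates why the head count alone cannot cheaply tell a PMD mapping from a PTE mapping of the first subpage. Calling `toy_unmap_pte()` on some subpages leaves the compound page whole, which is the memory-footprint concern raised in the reply.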