From: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
David Rientjes <rientjes@google.com>,
Andi Kleen <andi@firstfloor.org>,
Wu Fengguang <fengguang.wu@intel.com>,
KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/6] thp: optimize away unnecessary page table locking
Date: Mon, 16 Jan 2012 12:19:19 -0500
Message-ID: <4F145C17.1060009@ah.jp.nec.com>

On Sat, Jan 14, 2012 at 06:19:56PM +0100, Andrea Arcangeli wrote:
> On Thu, Jan 12, 2012 at 02:34:54PM -0500, Naoya Horiguchi wrote:
...
> > index 36b3d98..b7811df 100644
> > --- 3.2-rc5.orig/mm/huge_memory.c
> > +++ 3.2-rc5/mm/huge_memory.c
> > @@ -1001,29 +1001,21 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > {
> > int ret = 0;
> >
> > - spin_lock(&tlb->mm->page_table_lock);
> > - if (likely(pmd_trans_huge(*pmd))) {
> > - if (unlikely(pmd_trans_splitting(*pmd))) {
> > - spin_unlock(&tlb->mm->page_table_lock);
> > - wait_split_huge_page(vma->anon_vma,
> > - pmd);
> > - } else {
> > - struct page *page;
> > - pgtable_t pgtable;
> > - pgtable = get_pmd_huge_pte(tlb->mm);
> > - page = pmd_page(*pmd);
> > - pmd_clear(pmd);
> > - page_remove_rmap(page);
> > - VM_BUG_ON(page_mapcount(page) < 0);
> > - add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> > - VM_BUG_ON(!PageHead(page));
> > - spin_unlock(&tlb->mm->page_table_lock);
> > - tlb_remove_page(tlb, page);
> > - pte_free(tlb->mm, pgtable);
> > - ret = 1;
> > - }
> > - } else
> > + if (likely(pmd_trans_huge_stable(pmd, vma))) {
> > + struct page *page;
> > + pgtable_t pgtable;
> > + pgtable = get_pmd_huge_pte(tlb->mm);
> > + page = pmd_page(*pmd);
> > + pmd_clear(pmd);
> > + page_remove_rmap(page);
> > + VM_BUG_ON(page_mapcount(page) < 0);
> > + add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> > + VM_BUG_ON(!PageHead(page));
> > spin_unlock(&tlb->mm->page_table_lock);
> > + tlb_remove_page(tlb, page);
> > + pte_free(tlb->mm, pgtable);
> > + ret = 1;
> > + }
>
> This has been micro slowed down. I think you should use
> pmd_trans_huge_stable only in places where pmd_trans_huge cannot be
> set. I would back out the above as it's a micro-regression.
I guess this micro-regression happens because I failed to carry over the
likely()/unlikely() hints applied to pmd_trans_huge() and pmd_trans_splitting().
I should have kept them inside pmd_trans_huge_stable() instead of applying
likely() to pmd_trans_huge_stable() itself.
> Maybe what you could do if you want to clean it up further is to make
> a static inline in huge_mm of pmd_trans_huge_stable that only checks
> pmd_trans_huge and then calls __pmd_trans_huge_stable, and use
> __pmd_trans_huge_stable above.
OK, I agree.
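For reference, a minimal userspace model of that split, assuming toy stand-ins: a `pmd_t` with `huge`/`splitting` flags, an int for `page_table_lock`, and a hypothetical `toy_wait_split()` for wait_split_huge_page(). None of these are the real kernel types. The inline wrapper keeps the lockless pmd_trans_huge()-style test in the caller, and the likely()/unlikely() hints move into the out-of-line __pmd_trans_huge_stable(), which rechecks under the lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Branch hints defined the way the kernel does (GCC/Clang builtin). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Toy stand-ins, NOT kernel types: a pmd is either huge, huge and
 * being split, or a regular (non-huge) entry. */
typedef struct { bool huge; bool splitting; } pmd_t;
struct toy_mm { int page_table_lock; };	/* 1 = held, 0 = free */

static void toy_lock(struct toy_mm *mm)
{
	assert(!mm->page_table_lock);
	mm->page_table_lock = 1;
}

static void toy_unlock(struct toy_mm *mm)
{
	assert(mm->page_table_lock);
	mm->page_table_lock = 0;
}

/* Hypothetical model of wait_split_huge_page(): once the split is
 * done, the pmd is no longer huge. */
static void toy_wait_split(pmd_t *pmd)
{
	pmd->huge = false;
	pmd->splitting = false;
}

/* Out-of-line slow path: take the lock and recheck. Returns 1 with the
 * lock HELD for a stable huge pmd, 0 with the lock released otherwise. */
static int __pmd_trans_huge_stable(pmd_t *pmd, struct toy_mm *mm)
{
	toy_lock(mm);
	if (likely(pmd->huge)) {
		if (unlikely(pmd->splitting)) {
			toy_unlock(mm);
			toy_wait_split(pmd);
			return 0;
		}
		return 1;		/* caller must unlock */
	}
	toy_unlock(mm);
	return 0;
}

/* Inline fast path: a lockless huge test, so regular pmds never take
 * the lock or call the out-of-line function. */
static inline int pmd_trans_huge_stable(pmd_t *pmd, struct toy_mm *mm)
{
	if (pmd->huge)
		return __pmd_trans_huge_stable(pmd, mm);
	return 0;
}
```

A regular pmd returns 0 without touching the lock; a stable huge pmd returns 1 with the lock held, so the caller is responsible for releasing it.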
> > @@ -1034,21 +1026,14 @@ int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> > {
> > int ret = 0;
> >
> > - spin_lock(&vma->vm_mm->page_table_lock);
> > - if (likely(pmd_trans_huge(*pmd))) {
> > - ret = !pmd_trans_splitting(*pmd);
> > - spin_unlock(&vma->vm_mm->page_table_lock);
> > - if (unlikely(!ret))
> > - wait_split_huge_page(vma->anon_vma, pmd);
> > - else {
> > - /*
> > - * All logical pages in the range are present
> > - * if backed by a huge page.
> > - */
> > - memset(vec, 1, (end - addr) >> PAGE_SHIFT);
> > - }
> > - } else
> > + if (likely(pmd_trans_huge_stable(pmd, vma))) {
> > + /*
> > + * All logical pages in the range are present
> > + * if backed by a huge page.
> > + */
> > spin_unlock(&vma->vm_mm->page_table_lock);
> > + memset(vec, 1, (end - addr) >> PAGE_SHIFT);
> > + }
> >
> > return ret;
> > }
>
> same slowdown here. Here even __pmd_trans_huge_stable wouldn't be
> enough to optimize it as it'd still generate more .text with two
> spin_unlock (one in __pmd_trans_huge_stable and one retained above)
> instead of just 1 in the original version.
Yes, the additional spin_unlock() increases the binary size of mincore_huge_pmd().
But switching to __pmd_trans_huge_stable(), which unifies duplicated code,
also reduces the binary size. I think the reduction is larger than the growth
caused by the additional spin_unlock().
> I'd avoid the cleanup for
> the above ultra optimized version.
Anyway, if you don't like this replacement, I'll leave it as it is.
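To make the size question concrete, a toy model of mincore_huge_pmd() on top of the `__` helper, using hypothetical stand-in types and an assumed 4 KiB base page (TOY_PAGE_SHIFT). The helper has an unlock on each of its return-0 paths, and the caller keeps one more before memset(), which is the second spin_unlock() Andrea counts; which side wins on .text would have to be checked against the real compiled objects. The sketch returns 1 on the huge path, as the pre-patch `ret = !pmd_trans_splitting(*pmd)` did.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB base pages */

typedef struct { bool huge; bool splitting; } pmd_t;	/* toy pmd */
struct toy_mm { int page_table_lock; };			/* 1 = held */

/* Returns 1 with the lock held for a stable huge pmd, else 0 unlocked.
 * Each return-0 path carries its own unlock. */
static int __pmd_trans_huge_stable(pmd_t *pmd, struct toy_mm *mm)
{
	mm->page_table_lock = 1;
	if (pmd->huge) {
		if (pmd->splitting) {
			mm->page_table_lock = 0;
			/* models wait_split_huge_page() */
			pmd->huge = pmd->splitting = false;
			return 0;
		}
		return 1;
	}
	mm->page_table_lock = 0;
	return 0;
}

/* Toy mincore_huge_pmd(): one status byte per base page in [addr, end). */
static int toy_mincore_huge_pmd(pmd_t *pmd, struct toy_mm *mm,
				unsigned long addr, unsigned long end,
				unsigned char *vec)
{
	int ret = 0;

	if (__pmd_trans_huge_stable(pmd, mm) == 1) {
		mm->page_table_lock = 0;	/* the extra unlock in the caller */
		/* All logical pages in the range are present
		 * if backed by a huge page. */
		memset(vec, 1, (end - addr) >> TOY_PAGE_SHIFT);
		ret = 1;
	}
	return ret;
}
```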
> > @@ -1078,21 +1063,12 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
> > goto out;
> > }
> >
> > - spin_lock(&mm->page_table_lock);
> > - if (likely(pmd_trans_huge(*old_pmd))) {
> > - if (pmd_trans_splitting(*old_pmd)) {
> > - spin_unlock(&mm->page_table_lock);
> > - wait_split_huge_page(vma->anon_vma, old_pmd);
> > - ret = -1;
> > - } else {
> > - pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> > - VM_BUG_ON(!pmd_none(*new_pmd));
> > - set_pmd_at(mm, new_addr, new_pmd, pmd);
> > - spin_unlock(&mm->page_table_lock);
> > - ret = 1;
> > - }
> > - } else {
> > + if (likely(pmd_trans_huge_stable(old_pmd, vma))) {
> > + pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> > + VM_BUG_ON(!pmd_none(*new_pmd));
> > + set_pmd_at(mm, new_addr, new_pmd, pmd);
> > spin_unlock(&mm->page_table_lock);
> > + ret = 1;
> > }
>
> Same slowdown here, needs __pmd_trans_huge_stable as usual, but you
> are now forcing mremap to call split_huge_page even if it's not needed
> (i.e. after wait_split_huge_page).
I didn't think the behavior would change, but the additional if-check can
cause a performance regression. As you suggest below, we had better change
the return value of the wait_split_huge_page path in __pmd_trans_huge_stable().
> I'd like no-regression cleanups so
> I'd reverse the above and avoid changing already ultra-optimized code
> paths.
I agree.
> > out:
> > return ret;
> > @@ -1104,27 +1080,48 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> > struct mm_struct *mm = vma->vm_mm;
> > int ret = 0;
> >
> > - spin_lock(&mm->page_table_lock);
> > - if (likely(pmd_trans_huge(*pmd))) {
> > - if (unlikely(pmd_trans_splitting(*pmd))) {
> > - spin_unlock(&mm->page_table_lock);
> > - wait_split_huge_page(vma->anon_vma, pmd);
> > - } else {
> > - pmd_t entry;
> > + if (likely(pmd_trans_huge_stable(pmd, vma))) {
> > + pmd_t entry;
> >
> > - entry = pmdp_get_and_clear(mm, addr, pmd);
> > - entry = pmd_modify(entry, newprot);
> > - set_pmd_at(mm, addr, pmd, entry);
> > - spin_unlock(&vma->vm_mm->page_table_lock);
> > - flush_tlb_range(vma, addr, addr + HPAGE_PMD_SIZE);
> > - ret = 1;
> > - }
> > - } else
> > + entry = pmdp_get_and_clear(mm, addr, pmd);
> > + entry = pmd_modify(entry, newprot);
> > + set_pmd_at(mm, addr, pmd, entry);
> > spin_unlock(&vma->vm_mm->page_table_lock);
> > + flush_tlb_range(vma, addr, addr + HPAGE_PMD_SIZE);
> > + ret = 1;
> > + }
> >
> > return ret;
>
> Needs __pmd_trans_huge_stable. Ok to cleanup with that (no regression
> in this case with the __ version).
>
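A toy model of why the `__` variant is regression-free here, assuming (as a hypothetical stand-in for the mprotect pmd walk) that the caller has already done the unlocked huge test. toy_change_huge_pmd() then takes the lock and rechecks exactly once, as the original code did. The `toy_` names and the `prot` field are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins (assumptions, not kernel types). */
typedef struct { bool huge; bool splitting; unsigned int prot; } pmd_t;
struct toy_mm { int page_table_lock; };	/* 1 = held, 0 = free */

/* Returns 1 with the lock held for a stable huge pmd, else 0 unlocked. */
static int __pmd_trans_huge_stable(pmd_t *pmd, struct toy_mm *mm)
{
	mm->page_table_lock = 1;
	if (pmd->huge) {
		if (pmd->splitting) {
			mm->page_table_lock = 0;
			/* models wait_split_huge_page() */
			pmd->huge = pmd->splitting = false;
			return 0;
		}
		return 1;
	}
	mm->page_table_lock = 0;
	return 0;
}

/* Toy change_huge_pmd(): lock and recheck once, no extra lockless test. */
static int toy_change_huge_pmd(pmd_t *pmd, struct toy_mm *mm,
			       unsigned int newprot)
{
	int ret = 0;

	if (__pmd_trans_huge_stable(pmd, mm) == 1) {
		/* stands in for pmdp_get_and_clear + pmd_modify + set_pmd_at */
		pmd->prot = newprot;
		mm->page_table_lock = 0;
		/* flush_tlb_range() would follow here, outside the lock */
		ret = 1;
	}
	return ret;
}

/* Toy caller: does the lockless huge test before calling in. */
static int toy_change_pmd(pmd_t *pmd, struct toy_mm *mm, unsigned int newprot)
{
	if (pmd->huge && toy_change_huge_pmd(pmd, mm, newprot))
		return 1;	/* whole huge pmd handled */
	return 0;		/* fall back to the pte-level path */
}
```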
> > diff --git 3.2-rc5.orig/mm/mremap.c 3.2-rc5/mm/mremap.c
> > index d6959cb..d534668 100644
> > --- 3.2-rc5.orig/mm/mremap.c
> > +++ 3.2-rc5/mm/mremap.c
> > @@ -155,9 +155,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> > if (err > 0) {
> > need_flush = true;
> > continue;
> > - } else if (!err) {
> > - split_huge_page_pmd(vma->vm_mm, old_pmd);
> > }
> > + split_huge_page_pmd(vma->vm_mm, old_pmd);
> > VM_BUG_ON(pmd_trans_huge(*old_pmd));
> > }
> > if (pmd_none(*new_pmd) && __pte_alloc(new_vma->vm_mm, new_vma,
>
> regression. If you really want to optimize this and cleanup you could
> make __pmd_trans_huge_stable return -1 if wait_split_huge_page path
> was taken, then you just change the other checks to == 1 and behave
> the same if it's 0 or -1, except in move_huge_pmd where you'll return
> -1 if __pmd_trans_huge_stable returned -1 to retain the above
> optimization.
All right.
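The tri-state return Andrea describes can be sketched in a small toy model (hypothetical types and `toy_` names, not kernel code): 1 means a stable huge pmd with the lock held, 0 means not huge, and -1 means we waited for a split. move_huge_pmd() passes -1 through, and the move_page_tables()-style caller only requests a split on 0, so the superfluous split_huge_page_pmd() after wait_split_huge_page() is avoided. Other callers would test `== 1` and treat 0 and -1 alike.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { bool huge; bool splitting; } pmd_t;	/* toy pmd */
struct toy_mm { int page_table_lock; int splits_requested; };

/* 1: stable huge pmd, lock held.  0: not huge, lock released.
 * -1: was being split, we waited, lock released. */
static int __pmd_trans_huge_stable(pmd_t *pmd, struct toy_mm *mm)
{
	mm->page_table_lock = 1;
	if (pmd->huge) {
		if (pmd->splitting) {
			mm->page_table_lock = 0;
			/* models wait_split_huge_page() */
			pmd->huge = pmd->splitting = false;
			return -1;
		}
		return 1;
	}
	mm->page_table_lock = 0;
	return 0;
}

/* Toy move_huge_pmd(): moves the entry and returns 1, or passes the
 * helper's 0 / -1 straight through to the caller. */
static int toy_move_huge_pmd(pmd_t *old_pmd, pmd_t *new_pmd,
			     struct toy_mm *mm)
{
	int ret = __pmd_trans_huge_stable(old_pmd, mm);

	if (ret == 1) {
		*new_pmd = *old_pmd;
		old_pmd->huge = false;	/* pmdp_get_and_clear() stand-in */
		mm->page_table_lock = 0;
	}
	return ret;
}

/* Toy slice of move_page_tables(): split only if nobody waited on one. */
static bool toy_move_one(pmd_t *old_pmd, pmd_t *new_pmd, struct toy_mm *mm)
{
	int err = toy_move_huge_pmd(old_pmd, new_pmd, mm);

	if (err > 0)
		return true;			/* need_flush = true; continue */
	else if (!err)
		mm->splits_requested++;		/* split_huge_page_pmd() stand-in */
	assert(!old_pmd->huge);			/* VM_BUG_ON(pmd_trans_huge(*old_pmd)) */
	return false;
}
```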
> Maybe it's not much of an optimization anyway because we trade one
> branch for another, and both should be in L1 cache (though the retval
> is even guaranteed to be in a register, not only in L1 cache, so it's even
> better to check that for a branch), but to me it's more about keeping
> the code strict, which kind of self-documents it, because conceptually
> calling split_huge_page_pmd if wait_split_huge_page was called is
> superfluous (even if at runtime it won't make any difference).
OK, I'll drop this change.
> Thanks for cleaning up this, especially where pmd_trans_huge_stable is
> a perfect fit, this is a nice cleanup.
Thank you for your valuable feedback!
Naoya