linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/6 v4] pagemap handles transparent hugepage
@ 2012-01-27 23:02 Naoya Horiguchi
  2012-01-27 23:02 ` [PATCH 1/6] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
                   ` (5 more replies)
  0 siblings, 6 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2012-01-27 23:02 UTC
  To: linux-mm
  Cc: Andrew Morton, David Rientjes, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel, Naoya Horiguchi

Hi,

I rebased the patchset onto 3.3-rc1 and made some fixes to the thp
optimization patch based on Andrea's feedback.

Naoya Horiguchi (6):
  pagemap: avoid splitting thp when reading /proc/pid/pagemap
  thp: optimize away unnecessary page table locking
  pagemap: export KPF_THP
  pagemap: document KPF_THP and make page-types aware of it
  introduce thp_ptep_get()
  pagemap: introduce data structure for pagemap entry

 Documentation/vm/page-types.c     |    2 +
 Documentation/vm/pagemap.txt      |    4 +
 arch/x86/include/asm/pgtable.h    |    5 ++
 fs/proc/page.c                    |    2 +
 fs/proc/task_mmu.c                |  135 +++++++++++++++++++++----------------
 include/asm-generic/pgtable.h     |    4 +
 include/linux/huge_mm.h           |   17 +++++
 include/linux/kernel-page-flags.h |    1 +
 mm/huge_memory.c                  |  120 +++++++++++++++-----------------
 9 files changed, 169 insertions(+), 121 deletions(-)

Thanks,
Naoya

^ permalink raw reply	[flat|nested] 30+ messages in thread
* [PATCH 0/6 v5] pagemap handles transparent hugepage
@ 2012-02-08 15:51 Naoya Horiguchi
  2012-02-08 15:51 ` [PATCH 2/6] thp: optimize away unnecessary page table locking Naoya Horiguchi
  0 siblings, 1 reply; 30+ messages in thread
From: Naoya Horiguchi @ 2012-02-08 15:51 UTC
  To: linux-mm
  Cc: Andrew Morton, David Rientjes, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel, Naoya Horiguchi

Hi,

In this version, I applied the feedback about the return value of
__pmd_trans_huge_lock() and about renaming the newly added components.
I hope these patches go into mainline.

Naoya Horiguchi (6):
  pagemap: avoid splitting thp when reading /proc/pid/pagemap
  thp: optimize away unnecessary page table locking
  pagemap: export KPF_THP
  pagemap: document KPF_THP and make page-types aware of it
  introduce pmd_to_pte_t()
  pagemap: introduce data structure for pagemap entry

 Documentation/vm/page-types.c     |    2 +
 Documentation/vm/pagemap.txt      |    4 +
 arch/x86/include/asm/pgtable.h    |    5 ++
 fs/proc/page.c                    |    2 +
 fs/proc/task_mmu.c                |  138 ++++++++++++++++++++++---------------
 include/asm-generic/pgtable.h     |    4 +
 include/linux/huge_mm.h           |   17 +++++
 include/linux/kernel-page-flags.h |    1 +
 mm/huge_memory.c                  |  122 +++++++++++++++-----------------
 mm/mremap.c                       |    2 -
 10 files changed, 174 insertions(+), 123 deletions(-)

Thanks,
Naoya

* Re: [PATCH 2/6] thp: optimize away unnecessary page table locking
@ 2012-01-16 17:19 Naoya Horiguchi
  0 siblings, 0 replies; 30+ messages in thread
From: Naoya Horiguchi @ 2012-01-16 17:19 UTC
  To: Andrea Arcangeli
  Cc: Naoya Horiguchi, linux-mm, Andrew Morton, David Rientjes,
	Andi Kleen, Wu Fengguang, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel

On Sat, Jan 14, 2012 at 06:19:56PM +0100, Andrea Arcangeli wrote:
> On Thu, Jan 12, 2012 at 02:34:54PM -0500, Naoya Horiguchi wrote:
...
> > index 36b3d98..b7811df 100644
> > --- 3.2-rc5.orig/mm/huge_memory.c
> > +++ 3.2-rc5/mm/huge_memory.c
> > @@ -1001,29 +1001,21 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  {
> >  	int ret = 0;
> >  
> > -	spin_lock(&tlb->mm->page_table_lock);
> > -	if (likely(pmd_trans_huge(*pmd))) {
> > -		if (unlikely(pmd_trans_splitting(*pmd))) {
> > -			spin_unlock(&tlb->mm->page_table_lock);
> > -			wait_split_huge_page(vma->anon_vma,
> > -					     pmd);
> > -		} else {
> > -			struct page *page;
> > -			pgtable_t pgtable;
> > -			pgtable = get_pmd_huge_pte(tlb->mm);
> > -			page = pmd_page(*pmd);
> > -			pmd_clear(pmd);
> > -			page_remove_rmap(page);
> > -			VM_BUG_ON(page_mapcount(page) < 0);
> > -			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> > -			VM_BUG_ON(!PageHead(page));
> > -			spin_unlock(&tlb->mm->page_table_lock);
> > -			tlb_remove_page(tlb, page);
> > -			pte_free(tlb->mm, pgtable);
> > -			ret = 1;
> > -		}
> > -	} else
> > +	if (likely(pmd_trans_huge_stable(pmd, vma))) {
> > +		struct page *page;
> > +		pgtable_t pgtable;
> > +		pgtable = get_pmd_huge_pte(tlb->mm);
> > +		page = pmd_page(*pmd);
> > +		pmd_clear(pmd);
> > +		page_remove_rmap(page);
> > +		VM_BUG_ON(page_mapcount(page) < 0);
> > +		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> > +		VM_BUG_ON(!PageHead(page));
> >  		spin_unlock(&tlb->mm->page_table_lock);
> > +		tlb_remove_page(tlb, page);
> > +		pte_free(tlb->mm, pgtable);
> > +		ret = 1;
> > +	}
> 
> This has been micro slowed down. I think you should use
> pmd_trans_huge_stable only in places where pmd_trans_huge cannot be
> set. I would back out the above as it's a micro-regression.

I guess this micro-regression happens because I failed to correctly carry
over the likely()/unlikely() annotations applied to pmd_trans_huge() and
pmd_trans_splitting(). I should have kept them inside pmd_trans_huge_stable()
instead of applying likely() to pmd_trans_huge_stable() itself.

> Maybe what you could do if you want to clean it up further is to make
> a static inline in huge_mm of pmd_trans_huge_stable that only checks
> pmd_trans_huge and then calls __pmd_trans_huge_stable, and use
> __pmd_trans_huge_stable above.

OK, I agree.
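Andrea's suggested split can be modeled in userspace as below. This is a sketch under stated assumptions, not the merged kernel code: pmd_t is reduced to a flag word, PMD_HUGE/PMD_SPLITTING and the lock_held/split_waits counters are hypothetical stand-ins, and spin_lock()/wait_split_huge_page() are no-argument mock-ups of the kernel helpers. It shows the unlocked pmd_trans_huge() test staying inline while the locked recheck, with its likely()/unlikely() hints, moves out of line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins: a pmd is just a flag word here. */
typedef struct { unsigned long val; } pmd_t;
#define PMD_HUGE      0x1UL	/* models pmd_trans_huge() being true */
#define PMD_SPLITTING 0x2UL	/* models pmd_trans_splitting() being true */

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

static int lock_held;		/* models mm->page_table_lock */
static int split_waits;		/* counts wait_split_huge_page() calls */

static void spin_lock(void)   { assert(!lock_held); lock_held = 1; }
static void spin_unlock(void) { assert(lock_held); lock_held = 0; }
static void wait_split_huge_page(void) { split_waits++; }

static bool pmd_trans_huge(pmd_t pmd)      { return pmd.val & PMD_HUGE; }
static bool pmd_trans_splitting(pmd_t pmd) { return pmd.val & PMD_SPLITTING; }

/*
 * Out-of-line slow path: take the lock, recheck, and wait out a split.
 * Returns 1 with the lock held if *pmd is a stable huge pmd; otherwise
 * returns 0 with the lock released. The branch hints live here, where
 * the original open-coded versions had them.
 */
static int __pmd_trans_huge_stable(pmd_t *pmd)
{
	spin_lock();
	if (likely(pmd_trans_huge(*pmd))) {
		if (unlikely(pmd_trans_splitting(*pmd))) {
			spin_unlock();
			wait_split_huge_page();
			return 0;
		}
		return 1;	/* caller must spin_unlock() */
	}
	spin_unlock();
	return 0;
}

/*
 * Inline fast path: the unlocked pmd_trans_huge() test is done at the
 * call site, so regular pmds never take the lock or make a call.
 */
static inline int pmd_trans_huge_stable(pmd_t *pmd)
{
	if (pmd_trans_huge(*pmd))
		return __pmd_trans_huge_stable(pmd);
	return 0;
}
```

As suggested in the review, call sites such as zap_huge_pmd(), whose callers have already tested pmd_trans_huge(), would call __pmd_trans_huge_stable() directly and so keep the single locked test of the original code.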

> > @@ -1034,21 +1026,14 @@ int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> >  {
> >  	int ret = 0;
> >  
> > -	spin_lock(&vma->vm_mm->page_table_lock);
> > -	if (likely(pmd_trans_huge(*pmd))) {
> > -		ret = !pmd_trans_splitting(*pmd);
> > -		spin_unlock(&vma->vm_mm->page_table_lock);
> > -		if (unlikely(!ret))
> > -			wait_split_huge_page(vma->anon_vma, pmd);
> > -		else {
> > -			/*
> > -			 * All logical pages in the range are present
> > -			 * if backed by a huge page.
> > -			 */
> > -			memset(vec, 1, (end - addr) >> PAGE_SHIFT);
> > -		}
> > -	} else
> > +	if (likely(pmd_trans_huge_stable(pmd, vma))) {
> > +		/*
> > +		 * All logical pages in the range are present
> > +		 * if backed by a huge page.
> > +		 */
> >  		spin_unlock(&vma->vm_mm->page_table_lock);
> > +		memset(vec, 1, (end - addr) >> PAGE_SHIFT);
> > +	}
> >  
> >  	return ret;
> >  }
> 
> same slowdown here. Here even __pmd_trans_huge_stable wouldn't be
> enough to optimize it as it'd still generate more .text with two
> spin_unlock (one in __pmd_trans_huge_stable and one retained above)
> instead of just 1 in the original version.

Yes, the additional spin_unlock() increases the binary size of
mincore_huge_pmd(). But replacing the open-coded logic with
__pmd_trans_huge_stable(), which unifies the duplicated code, also reduces
the binary size, and I think that reduction is larger than the growth
caused by the additional spin_unlock().

> I'd avoid the cleanup for
> the above ultra optimized version.

Anyway, if you don't like this replacement, I'll leave it as it is.

> > @@ -1078,21 +1063,12 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
> >  		goto out;
> >  	}
> >  
> > -	spin_lock(&mm->page_table_lock);
> > -	if (likely(pmd_trans_huge(*old_pmd))) {
> > -		if (pmd_trans_splitting(*old_pmd)) {
> > -			spin_unlock(&mm->page_table_lock);
> > -			wait_split_huge_page(vma->anon_vma, old_pmd);
> > -			ret = -1;
> > -		} else {
> > -			pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> > -			VM_BUG_ON(!pmd_none(*new_pmd));
> > -			set_pmd_at(mm, new_addr, new_pmd, pmd);
> > -			spin_unlock(&mm->page_table_lock);
> > -			ret = 1;
> > -		}
> > -	} else {
> > +	if (likely(pmd_trans_huge_stable(old_pmd, vma))) {
> > +		pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> > +		VM_BUG_ON(!pmd_none(*new_pmd));
> > +		set_pmd_at(mm, new_addr, new_pmd, pmd);
> >  		spin_unlock(&mm->page_table_lock);
> > +		ret = 1;
> >  	}
> 
> Same slowdown here, needs __pmd_trans_huge_stable as usual, but you
> are now forcing mremap to call split_huge_page even if it's not needed
> (i.e. after wait_split_huge_page).

I didn't think the behavior would change, but the additional if-check
could be a performance regression. As you comment below, we had better
change the return value of the wait_split_huge_page path in
__pmd_trans_huge_stable().

> I'd like no-regression cleanups so
> I'd reverse the above and avoid changing already ultra-optimized code
> paths.

I agree.

> >  out:
> >  	return ret;
> > @@ -1104,27 +1080,48 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
> >  	struct mm_struct *mm = vma->vm_mm;
> >  	int ret = 0;
> >  
> > -	spin_lock(&mm->page_table_lock);
> > -	if (likely(pmd_trans_huge(*pmd))) {
> > -		if (unlikely(pmd_trans_splitting(*pmd))) {
> > -			spin_unlock(&mm->page_table_lock);
> > -			wait_split_huge_page(vma->anon_vma, pmd);
> > -		} else {
> > -			pmd_t entry;
> > +	if (likely(pmd_trans_huge_stable(pmd, vma))) {
> > +		pmd_t entry;
> >  
> > -			entry = pmdp_get_and_clear(mm, addr, pmd);
> > -			entry = pmd_modify(entry, newprot);
> > -			set_pmd_at(mm, addr, pmd, entry);
> > -			spin_unlock(&vma->vm_mm->page_table_lock);
> > -			flush_tlb_range(vma, addr, addr + HPAGE_PMD_SIZE);
> > -			ret = 1;
> > -		}
> > -	} else
> > +		entry = pmdp_get_and_clear(mm, addr, pmd);
> > +		entry = pmd_modify(entry, newprot);
> > +		set_pmd_at(mm, addr, pmd, entry);
> >  		spin_unlock(&vma->vm_mm->page_table_lock);
> > +		flush_tlb_range(vma, addr, addr + HPAGE_PMD_SIZE);
> > +		ret = 1;
> > +	}
> >  
> >  	return ret;
> 
> Needs __pmd_trans_huge_stable. Ok to cleanup with that (no regression
> in this case with the __ version).
> 
> > diff --git 3.2-rc5.orig/mm/mremap.c 3.2-rc5/mm/mremap.c
> > index d6959cb..d534668 100644
> > --- 3.2-rc5.orig/mm/mremap.c
> > +++ 3.2-rc5/mm/mremap.c
> > @@ -155,9 +155,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  			if (err > 0) {
> >  				need_flush = true;
> >  				continue;
> > -			} else if (!err) {
> > -				split_huge_page_pmd(vma->vm_mm, old_pmd);
> >  			}
> > +			split_huge_page_pmd(vma->vm_mm, old_pmd);
> >  			VM_BUG_ON(pmd_trans_huge(*old_pmd));
> >  		}
> >  		if (pmd_none(*new_pmd) && __pte_alloc(new_vma->vm_mm, new_vma,
> 
> regression. If you really want to optimize this and clean it up, you could
> make __pmd_trans_huge_stable return -1 if the wait_split_huge_page path
> was taken; then you just change the other checks to == 1 and behave
> the same if it's 0 or -1, except in move_huge_pmd where you'll return
> -1 if __pmd_trans_huge_stable returned -1 to retain the above
> optimization.

All right.

> Maybe it's not much of an optimization anyway because we trade one
> branch for another, and both should be in l1 cache (though the retval
> is even guaranteed in a register, not only in l1 cache, so it's even
> better to check that for a branch), but to me it is more about keeping
> the code strict, which kind of self-documents it, because conceptually
> calling split_huge_page_pmd if wait_split_huge_page was called is
> superfluous (even if at runtime it won't make any difference).

OK, I'll drop this change.
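The tri-state convention proposed above can be sketched as a userspace model: 1 means a stable huge pmd with the lock held, 0 means not huge, and -1 means the wait_split_huge_page() path was taken. move_huge_pmd_model() and move_one_pmd() are hypothetical simplifications of move_huge_pmd() and the move_page_tables() hunk quoted earlier; the pmd and lock types are mock-ups, and completing a wait on a split is modeled as the pmd becoming non-huge.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for kernel types and helpers. */
typedef struct { unsigned long val; } pmd_t;
#define PMD_HUGE      0x1UL
#define PMD_SPLITTING 0x2UL

static int lock_held;		/* models mm->page_table_lock */
static int splits_requested;	/* counts split_huge_page_pmd() calls */

static void spin_lock(void)   { assert(!lock_held); lock_held = 1; }
static void spin_unlock(void) { assert(lock_held); lock_held = 0; }

/* Once the wait returns, the split is done: the pmd is no longer huge. */
static void wait_split_huge_page(pmd_t *pmd) { pmd->val = 0; }
static void split_huge_page_pmd(pmd_t *pmd)  { splits_requested++; pmd->val = 0; }

/*
 *  1: stable huge pmd, lock held
 *  0: not a huge pmd, lock released
 * -1: pmd was being split; waited for it, lock released
 */
static int __pmd_trans_huge_stable(pmd_t *pmd)
{
	spin_lock();
	if (pmd->val & PMD_HUGE) {
		if (pmd->val & PMD_SPLITTING) {
			spin_unlock();
			wait_split_huge_page(pmd);
			return -1;
		}
		return 1;
	}
	spin_unlock();
	return 0;
}

/* Modeled on move_huge_pmd(): propagate -1 instead of folding it into 0. */
static int move_huge_pmd_model(pmd_t *old_pmd, pmd_t *new_pmd)
{
	int ret = __pmd_trans_huge_stable(old_pmd);

	if (ret == 1) {
		*new_pmd = *old_pmd;	/* pmdp_get_and_clear + set_pmd_at */
		old_pmd->val = 0;
		spin_unlock();
	}
	return ret;
}

/* Modeled on the mremap caller: split only when err == 0. */
static bool move_one_pmd(pmd_t *old_pmd, pmd_t *new_pmd)
{
	int err = move_huge_pmd_model(old_pmd, new_pmd);

	if (err > 0)
		return true;		/* moved as a huge pmd: need_flush */
	if (!err)
		split_huge_page_pmd(old_pmd);
	/* err < 0: the split already completed inside the wait */
	assert(!(old_pmd->val & PMD_HUGE));
	return false;
}
```

Because -1 reaches the mremap side, split_huge_page_pmd() is called only when err == 0, so no redundant split follows a completed wait.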

> Thanks for cleaning this up, especially where pmd_trans_huge_stable is
> a perfect fit; this is a nice cleanup.

Thank you for your valuable feedback!

Naoya

* [PATCH 0/6 v3] pagemap handles transparent hugepage
@ 2012-01-12 19:34 Naoya Horiguchi
  2012-01-12 19:34 ` [PATCH 2/6] thp: optimize away unnecessary page table locking Naoya Horiguchi
  0 siblings, 1 reply; 30+ messages in thread
From: Naoya Horiguchi @ 2012-01-12 19:34 UTC
  To: linux-mm
  Cc: Andrew Morton, David Rientjes, Andi Kleen, Wu Fengguang,
	Andrea Arcangeli, KOSAKI Motohiro, KAMEZAWA Hiroyuki,
	linux-kernel, Naoya Horiguchi

Thank you for all your reviews and comments on the previous posts.

In this version, I added two new patches. One separates out the arch
dependency, as KOSAKI-san commented, and the other introduces a new
type pme_t, as Andrew commented.
I also changed the "export KPF_THP" patch to fix a previously unnoticed
bug where both KPF_THP and KPF_HUGE were set for hugetlbfs hugepages.
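The two user-visible pieces mentioned here can be illustrated from userspace. The sketch below assumes the bit numbers KPF_HUGE = 17 and KPF_THP = 22 from include/linux/kernel-page-flags.h, and the pagemap entry layout documented in Documentation/vm/pagemap.txt for kernels of this era (bit 63 present, bit 62 swapped, bits 55-60 page shift, bits 0-54 PFN). The pme_t wrapper and its helpers are hypothetical and only show the idea of giving pagemap entries their own type; they are not the patch's definitions. classify_kpageflags() reports the KPF_HUGE plus KPF_THP combination that the fix rules out for hugetlbfs pages.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers in /proc/kpageflags, per include/linux/kernel-page-flags.h. */
#define KPF_HUGE 17
#define KPF_THP  22

/* Hypothetical distinct type for a /proc/pid/pagemap entry. */
typedef struct { uint64_t pme; } pme_t;

/* Layout from Documentation/vm/pagemap.txt (3.x era). */
#define PM_PRESENT      (1ULL << 63)
#define PM_SWAP         (1ULL << 62)
#define PM_PSHIFT_SHIFT 55
#define PM_PSHIFT_MASK  0x3fULL			/* bits 55-60 */
#define PM_PFN_MASK     ((1ULL << 55) - 1)	/* bits 0-54 */

static inline pme_t make_pme(uint64_t val) { return (pme_t){ .pme = val }; }
static inline bool pme_present(pme_t e) { return e.pme & PM_PRESENT; }

static inline unsigned int pme_page_shift(pme_t e)
{
	return (e.pme >> PM_PSHIFT_SHIFT) & PM_PSHIFT_MASK;
}

/* The PFN field is meaningful only for present pages. */
static inline uint64_t pme_pfn(pme_t e)
{
	return pme_present(e) ? (e.pme & PM_PFN_MASK) : 0;
}

enum huge_kind { NOT_HUGE, HUGETLBFS_HUGE, THP_HUGE, BOTH_SET };

static enum huge_kind classify_kpageflags(uint64_t flags)
{
	bool huge = flags & (1ULL << KPF_HUGE);
	bool thp = flags & (1ULL << KPF_THP);

	if (huge && thp)
		return BOTH_SET;	/* should not happen after the fix */
	if (huge)
		return HUGETLBFS_HUGE;
	if (thp)
		return THP_HUGE;
	return NOT_HUGE;
}
```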

Thanks,
Naoya


end of thread (newest message: 2012-02-20 11:54 UTC)

Thread overview: 30+ messages
2012-01-27 23:02 [PATCH 0/6 v4] pagemap handles transparent hugepage Naoya Horiguchi
2012-01-27 23:02 ` [PATCH 1/6] pagemap: avoid splitting thp when reading /proc/pid/pagemap Naoya Horiguchi
2012-01-29 13:17   ` Hillf Danton
2012-01-30 19:23     ` Naoya Horiguchi
2012-01-27 23:02 ` [PATCH 2/6] thp: optimize away unnecessary page table locking Naoya Horiguchi
2012-01-28 11:23   ` Hillf Danton
2012-01-28 22:33     ` Naoya Horiguchi
2012-01-30  6:22   ` KAMEZAWA Hiroyuki
2012-02-02  5:27     ` Naoya Horiguchi
2012-02-02  8:32       ` KAMEZAWA Hiroyuki
2012-01-27 23:02 ` [PATCH 3/6] pagemap: export KPF_THP Naoya Horiguchi
2012-01-27 23:02 ` [PATCH 4/6] pagemap: document KPF_THP and make page-types aware of it Naoya Horiguchi
2012-01-27 23:02 ` [PATCH 5/6] introduce thp_ptep_get() Naoya Horiguchi
2012-01-30  6:26   ` KAMEZAWA Hiroyuki
2012-01-30 19:24     ` Naoya Horiguchi
2012-01-27 23:02 ` [PATCH 6/6] pagemap: introduce data structure for pagemap entry Naoya Horiguchi
2012-01-30  6:31   ` KAMEZAWA Hiroyuki
2012-01-30 19:27     ` Naoya Horiguchi
  -- strict thread matches above, loose matches on Subject: below --
2012-02-08 15:51 [PATCH 0/6 v5] pagemap handles transparent hugepage Naoya Horiguchi
2012-02-08 15:51 ` [PATCH 2/6] thp: optimize away unnecessary page table locking Naoya Horiguchi
2012-02-09  2:19   ` KAMEZAWA Hiroyuki
2012-02-19 21:21   ` Hugh Dickins
2012-02-20  7:28     ` Naoya Horiguchi
2012-02-20 11:38       ` Hugh Dickins
2012-02-20 11:54         ` Jiri Slaby
2012-01-16 17:19 Naoya Horiguchi
2012-01-12 19:34 [PATCH 0/6 v3] pagemap handles transparent hugepage Naoya Horiguchi
2012-01-12 19:34 ` [PATCH 2/6] thp: optimize away unnecessary page table locking Naoya Horiguchi
2012-01-13 12:04   ` Hillf Danton
2012-01-13 15:14     ` Naoya Horiguchi
2012-01-14  3:24       ` Hillf Danton
2012-01-14 17:19   ` Andrea Arcangeli
