From: Andrea Arcangeli <aarcange@redhat.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
	David Rientjes <rientjes@google.com>,
	Andi Kleen <andi@firstfloor.org>,
	Wu Fengguang <fengguang.wu@intel.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/6] thp: optimize away unnecessary page table locking
Date: Sat, 14 Jan 2012 18:19:56 +0100
Message-ID: <20120114171955.GG3236@redhat.com>
In-Reply-To: <1326396898-5579-3-git-send-email-n-horiguchi@ah.jp.nec.com>

On Thu, Jan 12, 2012 at 02:34:54PM -0500, Naoya Horiguchi wrote:
> @@ -694,26 +686,18 @@ static int pagemap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>  	/* find the first VMA at or above 'addr' */
>  	vma = find_vma(walk->mm, addr);
>  
> -	spin_lock(&walk->mm->page_table_lock);
> -	if (pmd_trans_huge(*pmd)) {
> -		if (pmd_trans_splitting(*pmd)) {
> -			spin_unlock(&walk->mm->page_table_lock);
> -			wait_split_huge_page(vma->anon_vma, pmd);
> -		} else {
> -			for (; addr != end; addr += PAGE_SIZE) {
> -				unsigned long offset = (addr & ~PAGEMAP_WALK_MASK)
> -					>> PAGE_SHIFT;
> -				pfn = thp_pte_to_pagemap_entry(*(pte_t *)pmd,
> -							       offset);
> -				err = add_to_pagemap(addr, pfn, pm);
> -				if (err)
> -					break;
> -			}
> -			spin_unlock(&walk->mm->page_table_lock);
> -			return err;
> +	if (pmd_trans_huge_stable(pmd, vma)) {
> +		for (; addr != end; addr += PAGE_SIZE) {
> +			unsigned long offset = (addr & ~PAGEMAP_WALK_MASK)
> +				>> PAGE_SHIFT;
> +			pfn = thp_pte_to_pagemap_entry(*(pte_t *)pmd,
> +						       offset);
> +			err = add_to_pagemap(addr, pfn, pm);
> +			if (err)
> +				break;
>  		}
> -	} else {
>  		spin_unlock(&walk->mm->page_table_lock);

This was already pointed out by Hillf; I didn't see a new submission
yet, but I guess it's in the works. Thanks.

> index 36b3d98..b7811df 100644
> --- 3.2-rc5.orig/mm/huge_memory.c
> +++ 3.2-rc5/mm/huge_memory.c
> @@ -1001,29 +1001,21 @@ int zap_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>  {
>  	int ret = 0;
>  
> -	spin_lock(&tlb->mm->page_table_lock);
> -	if (likely(pmd_trans_huge(*pmd))) {
> -		if (unlikely(pmd_trans_splitting(*pmd))) {
> -			spin_unlock(&tlb->mm->page_table_lock);
> -			wait_split_huge_page(vma->anon_vma,
> -					     pmd);
> -		} else {
> -			struct page *page;
> -			pgtable_t pgtable;
> -			pgtable = get_pmd_huge_pte(tlb->mm);
> -			page = pmd_page(*pmd);
> -			pmd_clear(pmd);
> -			page_remove_rmap(page);
> -			VM_BUG_ON(page_mapcount(page) < 0);
> -			add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> -			VM_BUG_ON(!PageHead(page));
> -			spin_unlock(&tlb->mm->page_table_lock);
> -			tlb_remove_page(tlb, page);
> -			pte_free(tlb->mm, pgtable);
> -			ret = 1;
> -		}
> -	} else
> +	if (likely(pmd_trans_huge_stable(pmd, vma))) {
> +		struct page *page;
> +		pgtable_t pgtable;
> +		pgtable = get_pmd_huge_pte(tlb->mm);
> +		page = pmd_page(*pmd);
> +		pmd_clear(pmd);
> +		page_remove_rmap(page);
> +		VM_BUG_ON(page_mapcount(page) < 0);
> +		add_mm_counter(tlb->mm, MM_ANONPAGES, -HPAGE_PMD_NR);
> +		VM_BUG_ON(!PageHead(page));
>  		spin_unlock(&tlb->mm->page_table_lock);
> +		tlb_remove_page(tlb, page);
> +		pte_free(tlb->mm, pgtable);
> +		ret = 1;
> +	}

This has been micro-slowed down. I think you should use
pmd_trans_huge_stable only in the places where pmd_trans_huge is not
already expected to be set. I would back out the above as it's a
micro-regression.

Maybe what you could do, if you want to clean it up further, is to
make pmd_trans_huge_stable a static inline in huge_mm.h that only
checks pmd_trans_huge and then calls __pmd_trans_huge_stable, and use
__pmd_trans_huge_stable directly in the code above.
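
For illustration, that split could look something like the sketch
below. The exact signatures and the "return 1 with page_table_lock
still held" convention are assumptions inferred from how the helper is
used in the quoted hunks, not code taken from the patch:

	/* include/linux/huge_mm.h */
	extern int __pmd_trans_huge_stable(pmd_t *pmd,
					   struct vm_area_struct *vma);

	static inline int pmd_trans_huge_stable(pmd_t *pmd,
						struct vm_area_struct *vma)
	{
		/*
		 * Cheap unlocked check first: in the common case the
		 * pmd isn't huge, so we skip both the call and the lock.
		 */
		if (pmd_trans_huge(*pmd))
			return __pmd_trans_huge_stable(pmd, vma);
		return 0;
	}

	/* mm/huge_memory.c */
	int __pmd_trans_huge_stable(pmd_t *pmd, struct vm_area_struct *vma)
	{
		spin_lock(&vma->vm_mm->page_table_lock);
		if (likely(pmd_trans_huge(*pmd))) {
			if (unlikely(pmd_trans_splitting(*pmd))) {
				spin_unlock(&vma->vm_mm->page_table_lock);
				wait_split_huge_page(vma->anon_vma, pmd);
				return 0;
			}
			/* stable huge pmd, return with the lock held */
			return 1;
		}
		spin_unlock(&vma->vm_mm->page_table_lock);
		return 0;
	}

That way the already-optimized callers discussed above can call
__pmd_trans_huge_stable directly and skip the extra unlocked check.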

> @@ -1034,21 +1026,14 @@ int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  {
>  	int ret = 0;
>  
> -	spin_lock(&vma->vm_mm->page_table_lock);
> -	if (likely(pmd_trans_huge(*pmd))) {
> -		ret = !pmd_trans_splitting(*pmd);
> -		spin_unlock(&vma->vm_mm->page_table_lock);
> -		if (unlikely(!ret))
> -			wait_split_huge_page(vma->anon_vma, pmd);
> -		else {
> -			/*
> -			 * All logical pages in the range are present
> -			 * if backed by a huge page.
> -			 */
> -			memset(vec, 1, (end - addr) >> PAGE_SHIFT);
> -		}
> -	} else
> +	if (likely(pmd_trans_huge_stable(pmd, vma))) {
> +		/*
> +		 * All logical pages in the range are present
> +		 * if backed by a huge page.
> +		 */
>  		spin_unlock(&vma->vm_mm->page_table_lock);
> +		memset(vec, 1, (end - addr) >> PAGE_SHIFT);
> +	}
>  
>  	return ret;
>  }

Same slowdown here. Here even __pmd_trans_huge_stable wouldn't be
enough to optimize it, as it would still generate more .text, with two
spin_unlock calls (one in __pmd_trans_huge_stable and one retained
above) instead of just one in the original version. I'd avoid the
cleanup for the above ultra-optimized version.
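
To make the "two spin_unlock in the .text" point concrete, even with
the __ helper the function would end up roughly as in this sketch
(same assumed helper as above, returning 1 with page_table_lock held):

	int mincore_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
			     unsigned long addr, unsigned long end,
			     unsigned char *vec)
	{
		int ret = 0;

		if (likely(__pmd_trans_huge_stable(pmd, vma))) {
			/*
			 * Second spin_unlock in the .text: the failure
			 * paths already unlock inside the helper, while
			 * the original code needed only one unlock.
			 */
			spin_unlock(&vma->vm_mm->page_table_lock);
			/*
			 * All logical pages in the range are present
			 * if backed by a huge page.
			 */
			memset(vec, 1, (end - addr) >> PAGE_SHIFT);
			ret = 1;
		}

		return ret;
	}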

> @@ -1078,21 +1063,12 @@ int move_huge_pmd(struct vm_area_struct *vma, struct vm_area_struct *new_vma,
>  		goto out;
>  	}
>  
> -	spin_lock(&mm->page_table_lock);
> -	if (likely(pmd_trans_huge(*old_pmd))) {
> -		if (pmd_trans_splitting(*old_pmd)) {
> -			spin_unlock(&mm->page_table_lock);
> -			wait_split_huge_page(vma->anon_vma, old_pmd);
> -			ret = -1;
> -		} else {
> -			pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> -			VM_BUG_ON(!pmd_none(*new_pmd));
> -			set_pmd_at(mm, new_addr, new_pmd, pmd);
> -			spin_unlock(&mm->page_table_lock);
> -			ret = 1;
> -		}
> -	} else {
> +	if (likely(pmd_trans_huge_stable(old_pmd, vma))) {
> +		pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
> +		VM_BUG_ON(!pmd_none(*new_pmd));
> +		set_pmd_at(mm, new_addr, new_pmd, pmd);
>  		spin_unlock(&mm->page_table_lock);
> +		ret = 1;
>  	}

Same slowdown here, and it needs __pmd_trans_huge_stable as usual, but
you are now also forcing mremap to call split_huge_page even when it's
not needed (i.e. after wait_split_huge_page). I'd like cleanups that
introduce no regressions, so I'd revert the above and avoid changing
already ultra-optimized code paths.

>  out:
>  	return ret;
> @@ -1104,27 +1080,48 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
>  	struct mm_struct *mm = vma->vm_mm;
>  	int ret = 0;
>  
> -	spin_lock(&mm->page_table_lock);
> -	if (likely(pmd_trans_huge(*pmd))) {
> -		if (unlikely(pmd_trans_splitting(*pmd))) {
> -			spin_unlock(&mm->page_table_lock);
> -			wait_split_huge_page(vma->anon_vma, pmd);
> -		} else {
> -			pmd_t entry;
> +	if (likely(pmd_trans_huge_stable(pmd, vma))) {
> +		pmd_t entry;
>  
> -			entry = pmdp_get_and_clear(mm, addr, pmd);
> -			entry = pmd_modify(entry, newprot);
> -			set_pmd_at(mm, addr, pmd, entry);
> -			spin_unlock(&vma->vm_mm->page_table_lock);
> -			flush_tlb_range(vma, addr, addr + HPAGE_PMD_SIZE);
> -			ret = 1;
> -		}
> -	} else
> +		entry = pmdp_get_and_clear(mm, addr, pmd);
> +		entry = pmd_modify(entry, newprot);
> +		set_pmd_at(mm, addr, pmd, entry);
>  		spin_unlock(&vma->vm_mm->page_table_lock);
> +		flush_tlb_range(vma, addr, addr + HPAGE_PMD_SIZE);
> +		ret = 1;
> +	}
>  
>  	return ret;

This one needs __pmd_trans_huge_stable as well, and it's OK to clean
it up with that (no regression in this case with the __ version).

> diff --git 3.2-rc5.orig/mm/mremap.c 3.2-rc5/mm/mremap.c
> index d6959cb..d534668 100644
> --- 3.2-rc5.orig/mm/mremap.c
> +++ 3.2-rc5/mm/mremap.c
> @@ -155,9 +155,8 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
>  			if (err > 0) {
>  				need_flush = true;
>  				continue;
> -			} else if (!err) {
> -				split_huge_page_pmd(vma->vm_mm, old_pmd);
>  			}
> +			split_huge_page_pmd(vma->vm_mm, old_pmd);
>  			VM_BUG_ON(pmd_trans_huge(*old_pmd));
>  		}
>  		if (pmd_none(*new_pmd) && __pte_alloc(new_vma->vm_mm, new_vma,

Regression. If you really want to optimize this and clean it up, you
could make __pmd_trans_huge_stable return -1 if the
wait_split_huge_page path was taken; then you just change the other
checks to == 1 and behave the same for 0 or -1, except in
move_huge_pmd, where you return -1 whenever __pmd_trans_huge_stable
returned -1, to retain the above optimization.
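
A sketch of that variant, reusing the assumed helper names from above:
__pmd_trans_huge_stable would return -1 after waiting for the split,
move_huge_pmd would propagate it, and move_page_tables would only
split when the return value is exactly 0:

	/* in __pmd_trans_huge_stable(), the splitting branch becomes: */
	if (unlikely(pmd_trans_splitting(*pmd))) {
		spin_unlock(&vma->vm_mm->page_table_lock);
		wait_split_huge_page(vma->anon_vma, pmd);
		return -1;	/* no longer huge, split already waited for */
	}

	/* in move_huge_pmd(): 1 = moved, 0 = not huge, -1 = split waited */
	ret = __pmd_trans_huge_stable(old_pmd, vma);
	if (ret == 1) {
		pmd = pmdp_get_and_clear(mm, old_addr, old_pmd);
		VM_BUG_ON(!pmd_none(*new_pmd));
		set_pmd_at(mm, new_addr, new_pmd, pmd);
		spin_unlock(&mm->page_table_lock);
	}

	/* in move_page_tables(): only err == 0 still needs the split */
	if (err > 0) {
		need_flush = true;
		continue;
	} else if (!err)
		split_huge_page_pmd(vma->vm_mm, old_pmd);
	VM_BUG_ON(pmd_trans_huge(*old_pmd));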

Maybe it's not much of an optimization anyway, because we trade one
branch for another and both should be in L1 cache (though the return
value is even guaranteed to be in a register, not just in L1 cache, so
it's even better to branch on that). To me it's more about keeping the
code strict, which kind of self-documents it: conceptually, calling
split_huge_page_pmd after wait_split_huge_page has already run is
superfluous (even if at runtime it makes no difference).

Thanks for cleaning this up; where pmd_trans_huge_stable is a perfect
fit, it's a nice cleanup.

