From: Jerome Marchand <jmarchan@redhat.com>
To: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: linux-mm@kvack.org, Dave Hansen <dave.hansen@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH -mm v2 04/11] pagewalk: move pmd_trans_huge_lock() from callbacks to common code
Date: Wed, 18 Jun 2014 17:13:53 +0200
Message-ID: <53A1ACB1.3050102@redhat.com>
In-Reply-To: <20140617150159.GA8524@nhori.redhat.com>

On 06/17/2014 05:01 PM, Naoya Horiguchi wrote:
> On Tue, Jun 17, 2014 at 04:27:56PM +0200, Jerome Marchand wrote:
>> On 06/12/2014 11:48 PM, Naoya Horiguchi wrote:
>>> Now all current users of the page table walker are canonicalized, i.e.
>>> pmd_entry() handles only trans_pmd entries, and pte_entry() handles pte entries.
>>> So we can factor out more common code.
>>> This patch moves pmd_trans_huge_lock() from each pmd_entry() into the pagewalk core.
>>>
>>> ChangeLog v2:
>>> - add null check walk->vma in walk_pmd_range()
>>
>> An older version of this patch already made it to linux-next (commit
>> b0e08c5) and I've actually hit the NULL pointer dereference.
>>
>> Moreover, that patch (or maybe another recent pagewalk patch) breaks
>> /proc/<pid>/smaps. All fields that should have been filled by
>> smaps_pte() are almost always zero (and when it isn't, it's always a
>> multiple of 2MB). It seems to me that the page walk never goes below
>> pmd level.
> 
> Agreed. I now think that forcing pte_entry() on every user is not a
> good idea, so I'll go back to the starting point and make only the
> necessary changes (i.e. only iron out the vma handling problem for hugepages).
> 
> Thanks,
> Naoya Horiguchi
> 
>> Jerome
>>
>>> - move comment update into a separate patch
>>>
>>> Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
>>> ---


>>> diff --git mmotm-2014-05-21-16-57.orig/mm/pagewalk.c mmotm-2014-05-21-16-57/mm/pagewalk.c
>>> index 24311d6f5c20..f1a3417d0b51 100644
>>> --- mmotm-2014-05-21-16-57.orig/mm/pagewalk.c
>>> +++ mmotm-2014-05-21-16-57/mm/pagewalk.c
>>> @@ -73,8 +73,22 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr,
>>>  			continue;
>>>  		}
>>>  
>>> -		if (walk->pmd_entry) {
>>> -			err = walk->pmd_entry(pmd, addr, next, walk);
>>> +		/*
>>> +		 * We don't take compound_lock() here but no race with splitting
>>> +		 * thp happens because:
>>> +		 *  - if pmd_trans_huge_lock() returns 1, the relevant thp is
>>> +		 *    not under splitting, which means there's no concurrent
>>> +		 *    thp split,
>>> +		 *  - if another thread runs into split_huge_page() just after
>>> +		 *    we entered this if-block, the thread must wait for page
>>> +		 *    table lock to be unlocked in __split_huge_page_splitting(),
>>> +		 *    where the main part of thp split is not executed yet.
>>> +		 */
>>> +		if (walk->pmd_entry && walk->vma) {
>>> +			if (pmd_trans_huge_lock(pmd, walk->vma, &walk->ptl) == 1) {
>>> +				err = walk->pmd_entry(pmd, addr, next, walk);
>>> +				spin_unlock(walk->ptl);
>>> +			}
>>>  			if (skip_lower_level_walking(walk))
>>>  				continue;
>>>  			if (err)

This is the cause of the smaps trouble: this code modifies walk->control
whenever pmd_entry() is present, even when pmd_entry() is not called
(i.e. when pmd_trans_huge_lock() does not return 1). All the control
code should depend on pmd_trans_huge_lock() == 1 too.
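To make the failure mode concrete, here is a minimal user-space model of the loop above. It is a sketch, not kernel code. Every name in it is hypothetical: enum model_control, struct model_walk, model_trans_huge_lock(), model_pmd_entry(), model_skip_lower_level(), walk_pmds_buggy() and walk_pmds_fixed(). Locking is reduced to the return value of the stand-in lock function, error handling is left out, and pmd_entry and vma are assumed to be present. The model also assumes, as the smaps symptom suggests, that the default control value means "do not descend". Under that assumption, the pte level is reached only when something explicitly asks for it.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the kernel's pagewalk types. */
enum model_control {
	MODEL_NEXT = 0,	/* assumed default: skip the lower level */
	MODEL_DOWN,	/* descend to the pte level */
};

struct model_walk {
	enum model_control control;
	int pmd_entry_calls;	/* how often the pmd callback ran */
	int pte_descents;	/* how often the walk reached the pte level */
};

/* Stand-in for pmd_trans_huge_lock(): returns 1 only for a huge pmd. */
static int model_trans_huge_lock(bool pmd_is_huge)
{
	return pmd_is_huge ? 1 : 0;
}

/* Canonicalized pmd_entry(): handles a huge pmd and does not descend. */
static int model_pmd_entry(struct model_walk *walk)
{
	walk->pmd_entry_calls++;
	walk->control = MODEL_NEXT;
	return 0;
}

static bool model_skip_lower_level(const struct model_walk *walk)
{
	return walk->control == MODEL_NEXT;
}

/* Mirrors the quoted hunk: control is checked even if the lock failed. */
static void walk_pmds_buggy(struct model_walk *walk, const bool *huge, int n)
{
	for (int i = 0; i < n; i++) {
		if (model_trans_huge_lock(huge[i]) == 1)
			model_pmd_entry(walk);
		/* Bug: nothing sets MODEL_DOWN for a normal pmd, so it is skipped. */
		if (model_skip_lower_level(walk))
			continue;
		walk->pte_descents++;
	}
}

/* Suggested fix: all control code depends on the lock returning 1. */
static void walk_pmds_fixed(struct model_walk *walk, const bool *huge, int n)
{
	for (int i = 0; i < n; i++) {
		if (model_trans_huge_lock(huge[i]) == 1) {
			model_pmd_entry(walk);
			if (model_skip_lower_level(walk))
				continue;
		}
		walk->pte_descents++;	/* normal pmd: walk its ptes */
	}
}
```

Consider the pmd sequence {false, true, false, false}, where only the second pmd is huge. Both variants call model_pmd_entry() once. However, walk_pmds_buggy() never reaches the pte level, while walk_pmds_fixed() descends for the three normal pmds. This matches smaps reporting only THP-sized (multiple of 2MB) amounts.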

Jerome
