All of lore.kernel.org
 help / color / mirror / Atom feed
From: Michael Ellerman <mpe@ellerman.id.au>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	benh@kernel.crashing.org, paulus@samba.org,
	akpm@linux-foundation.org,
	Mel Gorman <mgorman@techsingularity.net>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org,
	"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Subject: Re: [V3] powerpc/mm: Fix Multi hit ERAT cause by recent THP update
Date: Tue,  9 Feb 2016 23:16:06 +1100 (AEDT)	[thread overview]
Message-ID: <20160209121606.EA46C140B97@ozlabs.org> (raw)
In-Reply-To: <1454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, 2016-09-02 at 01:20:31 UTC, "Aneesh Kumar K.V" wrote:
> With ppc64 we use the deposited pgtable_t to store the hash pte slot
> information. We should not withdraw the deposited pgtable_t without
> marking the pmd none. This ensure that low level hash fault handling
> will skip this huge pte and we will handle them at upper levels.
> 
> Recent change to pmd splitting changed the above in order to handle the
> race between pmd split and exit_mmap. The race is explained below.
> 
> Consider following race:
> 
> 		CPU0				CPU1
> shrink_page_list()
>   add_to_swap()
>     split_huge_page_to_list()
>       __split_huge_pmd_locked()
>         pmdp_huge_clear_flush_notify()
> 	// pmd_none() == true
> 					exit_mmap()
> 					  unmap_vmas()
> 					    zap_pmd_range()
> 					      // no action on pmd since pmd_none() == true
> 	pmd_populate()
> 
> As result the THP will not be freed. The leak is detected by check_mm():
> 
> 	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> 
> The above required us to not mark pmd none during a pmd split.
> 
> The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
> level fault handling code skip this pte. At higher level we do take ptl
> lock. That should serialze us against the pmd split. Once the lock is
> acquired we do check the pmd again using pmd_same. That should always
> return false for us and hence we should retry the access. We do the
> pmd_same check in all case after taking plt with
> THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
> huge_pmd_set_accessed)
> 
> Also make sure we wait for irq disable section in other cpus to finish
> before flipping a huge pte entry with a regular pmd entry. Code paths
> like find_linux_pte_or_hugepte depend on irq disable to get
> a stable pte_t pointer. A parallel thp split need to make sure we
> don't convert a pmd pte to a regular pmd entry without waiting for the
> irq disable section to finish.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/9db4cd6c21535a4846b38808f3

cheers

WARNING: multiple messages have this Message-ID (diff)
From: Michael Ellerman <mpe@ellerman.id.au>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>,
	benh@kernel.crashing.org, paulus@samba.org,
	akpm@linux-foundation.org,
	Mel Gorman <mgorman@techsingularity.net>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: linux-mm@kvack.org, linuxppc-dev@lists.ozlabs.org,
	linux-kernel@vger.kernel.org
Subject: Re: [V3] powerpc/mm: Fix Multi hit ERAT cause by recent THP update
Date: Tue,  9 Feb 2016 23:16:06 +1100 (AEDT)	[thread overview]
Message-ID: <20160209121606.EA46C140B97@ozlabs.org> (raw)
In-Reply-To: <1454980831-16631-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>

On Tue, 2016-09-02 at 01:20:31 UTC, "Aneesh Kumar K.V" wrote:
> With ppc64 we use the deposited pgtable_t to store the hash pte slot
> information. We should not withdraw the deposited pgtable_t without
> marking the pmd none. This ensure that low level hash fault handling
> will skip this huge pte and we will handle them at upper levels.
> 
> Recent change to pmd splitting changed the above in order to handle the
> race between pmd split and exit_mmap. The race is explained below.
> 
> Consider following race:
> 
> 		CPU0				CPU1
> shrink_page_list()
>   add_to_swap()
>     split_huge_page_to_list()
>       __split_huge_pmd_locked()
>         pmdp_huge_clear_flush_notify()
> 	// pmd_none() == true
> 					exit_mmap()
> 					  unmap_vmas()
> 					    zap_pmd_range()
> 					      // no action on pmd since pmd_none() == true
> 	pmd_populate()
> 
> As result the THP will not be freed. The leak is detected by check_mm():
> 
> 	BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512
> 
> The above required us to not mark pmd none during a pmd split.
> 
> The fix for ppc is to clear the huge pte of _PAGE_USER, so that low
> level fault handling code skip this pte. At higher level we do take ptl
> lock. That should serialze us against the pmd split. Once the lock is
> acquired we do check the pmd again using pmd_same. That should always
> return false for us and hence we should retry the access. We do the
> pmd_same check in all case after taking plt with
> THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
> huge_pmd_set_accessed)
> 
> Also make sure we wait for irq disable section in other cpus to finish
> before flipping a huge pte entry with a regular pmd entry. Code paths
> like find_linux_pte_or_hugepte depend on irq disable to get
> a stable pte_t pointer. A parallel thp split need to make sure we
> don't convert a pmd pte to a regular pmd entry without waiting for the
> irq disable section to finish.
> 
> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/9db4cd6c21535a4846b38808f3

cheers

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2016-02-09 12:16 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-09  1:20 [PATCH V3] powerpc/mm: Fix Multi hit ERAT cause by recent THP update Aneesh Kumar K.V
2016-02-09  1:20 ` Aneesh Kumar K.V
2016-02-09 12:16 ` Michael Ellerman [this message]
2016-02-09 12:16   ` [V3] " Michael Ellerman
2016-02-14  5:32   ` Aneesh Kumar K.V
2016-02-14  5:32     ` Aneesh Kumar K.V
2016-02-15  2:44 ` [PATCH V3] " Balbir Singh
2016-02-15  2:44   ` Balbir Singh
2016-02-15  4:37   ` Aneesh Kumar K.V
2016-02-15  4:37     ` Aneesh Kumar K.V
2016-02-15  4:37     ` Aneesh Kumar K.V
2016-02-15  5:09     ` Balbir Singh
2016-02-15  5:09       ` Balbir Singh
2016-02-15 11:01       ` Aneesh Kumar K.V
2016-02-15 11:01         ` Aneesh Kumar K.V
2016-02-15 11:01         ` Aneesh Kumar K.V
2016-02-16  5:20         ` Balbir Singh
2016-02-16  5:20           ` Balbir Singh

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160209121606.EA46C140B97@ozlabs.org \
    --to=mpe@ellerman.id.au \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.vnet.ibm.com \
    --cc=benh@kernel.crashing.org \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=mgorman@techsingularity.net \
    --cc=paulus@samba.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.