All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-14  9:29 [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Anthony Iliopoulos
@ 2014-05-13 22:44 ` Dave Hansen
  2014-05-15 17:00     ` Anthony Iliopoulos
  2014-05-13 23:36 ` [tip:x86/urgent] x86, mm, hugetlb: Add " tip-bot for Anthony Iliopoulos
  1 sibling, 1 reply; 9+ messages in thread
From: Dave Hansen @ 2014-05-13 22:44 UTC (permalink / raw)
  To: Anthony Iliopoulos, Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt,
	Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion,
	Gerald Schaefer, David Gibson, linux-arch

On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote:
> The invalidation is required in order to maintain proper semantics
> under CoW conditions. In scenarios where a process clones several
> threads, a thread operating on a core whose DTLB entry for a
> particular hugepage has not been invalidated, will be reading from
> the hugepage that belongs to the forked child process, even after
> hugetlb_cow().
> 
> The thread will not see the updated page as long as the stale DTLB
> entry remains cached, the thread attempts to write into the page,
> the child process exits, or the thread gets migrated to a different
> processor.

No to be too nitpicky, but this applies to ITLB too, right?

I believe this bug came all the way back from:

> commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> Author: David Gibson <david@gibson.dropbear.id.au>
> Date:   Fri Jan 6 00:10:44 2006 -0800
> 
>     [PATCH] Hugetlb: Copy on Write support

It was probably the first time that we ever changed an _existing_
hugetlbfs pte, and that patch probably just missed the TLB flush because
none of the other pte-setting hugetlb.c code needed TLB flushes.

The bogus x86 version of huge_ptep_clear_flush() came from the s390
guys, so double-shame on IBM! :P

> commit 8fe627ec5b7c47b1654dff50536d9709863295a3
> Author: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> Date:   Mon Apr 28 02:13:28 2008 -0700
> 
>     hugetlbfs: add missing TLB flush to hugetlb_cow()

This is probably an opportunity for all the other architecture
maintainers to make sure that they have proper copies of
huge_ptep_clear_flush().

I went through the hugetlb code on x86 and couldn't find another TLB
flush that fixes this issue, and I believe this is correct, so:

Acked-by: Dave Hansen <dave.hansen@intel.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip:x86/urgent] x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()
  2014-05-14  9:29 [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Anthony Iliopoulos
  2014-05-13 22:44 ` Dave Hansen
@ 2014-05-13 23:36 ` tip-bot for Anthony Iliopoulos
  1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Anthony Iliopoulos @ 2014-05-13 23:36 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, dave.hansen, shay.goikhman, tglx,
	anthony.iliopoulos, hpa

Commit-ID:  9844f5462392b53824e8b86726e7c33b5ecbb676
Gitweb:     http://git.kernel.org/tip/9844f5462392b53824e8b86726e7c33b5ecbb676
Author:     Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
AuthorDate: Wed, 14 May 2014 11:29:48 +0200
Committer:  H. Peter Anvin <hpa@linux.intel.com>
CommitDate: Tue, 13 May 2014 16:34:09 -0700

x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow()

The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.

Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
---
 arch/x86/include/asm/hugetlb.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index a809121..68c0539 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -52,6 +52,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 					 unsigned long addr, pte_t *ptep)
 {
+	ptep_clear_flush(vma, addr, ptep);
 }
 
 static inline int huge_pte_none(pte_t pte)

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
@ 2014-05-14  9:29 Anthony Iliopoulos
  2014-05-13 22:44 ` Dave Hansen
  2014-05-13 23:36 ` [tip:x86/urgent] x86, mm, hugetlb: Add " tip-bot for Anthony Iliopoulos
  0 siblings, 2 replies; 9+ messages in thread
From: Anthony Iliopoulos @ 2014-05-14  9:29 UTC (permalink / raw)
  To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin
  Cc: x86, linux-kernel, Kirill A. Shutemov, Dave Hansen,
	Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro,
	Avi Mendelson, Yoav Etsion

The invalidation is required in order to maintain proper semantics
under CoW conditions. In scenarios where a process clones several
threads, a thread operating on a core whose DTLB entry for a
particular hugepage has not been invalidated, will be reading from
the hugepage that belongs to the forked child process, even after
hugetlb_cow().

The thread will not see the updated page as long as the stale DTLB
entry remains cached, the thread attempts to write into the page,
the child process exits, or the thread gets migrated to a different
processor.

Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
Suggested-by: Shay Goikhman <shay.goikhman@huawei.com>
---
 arch/x86/include/asm/hugetlb.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index a809121..68c0539 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -52,6 +52,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 					 unsigned long addr, pte_t *ptep)
 {
+	ptep_clear_flush(vma, addr, ptep);
 }

 static inline int huge_pte_none(pte_t pte)
--
1.8.1.2

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-15 17:00     ` Anthony Iliopoulos
  (?)
@ 2014-05-14 17:24     ` Dave Hansen
  -1 siblings, 0 replies; 9+ messages in thread
From: Dave Hansen @ 2014-05-14 17:24 UTC (permalink / raw)
  To: Anthony Iliopoulos
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja,
	Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer,
	David Gibson, linux-arch

On 05/15/2014 10:00 AM, Anthony Iliopoulos wrote:
> I have actually also wondered about another related thing:
> even when we actually do invalidate the page, there may still be a
> race between a thread on a core that reads the page in some tight
> loop (e.g. on a spinlock), and the page fault handler running on
> a different core, at the point where the pte is set. Since we
> invalidate the page via the TLB shootdowns *before* we update
> the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page()
> as well as hugetlb_cow()), there may be some tiny window where the
> thread might re-read the page before the pte is set.

Don't forget about the "clear" part.  ptep_clear_flush() does:

        pte = ptep_get_and_clear(mm, address, ptep);
        if (pte_accessible(mm, pte))
                flush_tlb_page(vma, address);

so it makes the pte !present and guarantees that any other CPUs looking
at it after the flush but before the set_pte() will also end up in the
page fault handler, and they'll wait until the first fault has finished
with the page tables.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-15 17:00     ` Anthony Iliopoulos
  (?)
  (?)
@ 2014-05-15  7:05     ` Oren Twaig
  2014-05-17  9:24         ` Anthony Iliopoulos
  -1 siblings, 1 reply; 9+ messages in thread
From: Oren Twaig @ 2014-05-15  7:05 UTC (permalink / raw)
  To: Anthony Iliopoulos, Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja,
	Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer,
	David Gibson, linux-arch


On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote:
> I have dismissed this case, since I assume that there are many more
> cycles spent in servicing the TLB invalidation IPI, walking the pgtable
> plus other related overhead (e.g. sched) than in updating the pte/pmd
> so I am not sure how possible it would be to hit this condition.

Hi Anthony,

I have a question about the above statement. What will happen with multi
cpu VMs ? won't the race described above can happen ? I.e one virtual CPU
can will visit the host and the other will continue to encounter your race ?

Thanks,
Oren.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-13 22:44 ` Dave Hansen
@ 2014-05-15 17:00     ` Anthony Iliopoulos
  0 siblings, 0 replies; 9+ messages in thread
From: Anthony Iliopoulos @ 2014-05-15 17:00 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja,
	Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer,
	David Gibson, linux-arch

On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote:
> On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote:
> > The invalidation is required in order to maintain proper semantics
> > under CoW conditions. In scenarios where a process clones several
> > threads, a thread operating on a core whose DTLB entry for a
> > particular hugepage has not been invalidated, will be reading from
> > the hugepage that belongs to the forked child process, even after
> > hugetlb_cow().
> > 
> > The thread will not see the updated page as long as the stale DTLB
> > entry remains cached, the thread attempts to write into the page,
> > the child process exits, or the thread gets migrated to a different
> > processor.
> 
> No to be too nitpicky, but this applies to ITLB too, right?

Quite true, this does apply to ITLB too. I suppose there might be cases
like self-modifying code that touches the private text pages, or some
JIT compiler writing on mmap-ed executable regions that would observe
the same behavior under similar conditions.

> I believe this bug came all the way back from:
> 
> > commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri Jan 6 00:10:44 2006 -0800
> > 
> >     [PATCH] Hugetlb: Copy on Write support
> 
> It was probably the first time that we ever changed an _existing_
> hugetlbfs pte, and that patch probably just missed the TLB flush because
> none of the other pte-setting hugetlb.c code needed TLB flushes.

This seems to be the case, I assume that our internal use case was
also probably the first that needed to rely on proper semantics under
a multithreaded CoW scenario with hugetlb pages, so this came into
the surface.

I have actually also wondered about another related thing:
even when we actually do invalidate the page, there may still be a
race between a thread on a core that reads the page in some tight
loop (e.g. on a spinlock), and the page fault handler running on
a different core, at the point where the pte is set. Since we
invalidate the page via the TLB shootdowns *before* we update
the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page()
as well as hugetlb_cow()), there may be some tiny window where the
thread might re-read the page before the pte is set.

I have dismissed this case, since I assume that there are many more
cycles spent in servicing the TLB invalidation IPI, walking the pgtable
plus other related overhead (e.g. sched) than in updating the pte/pmd
so I am not sure how possible it would be to hit this condition.

I am also hesitant to simply submit a patch that reverses the order
of flushing and setting the pte, due to the following commit:

    commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f
    Author: Siddha, Suresh B <suresh.b.siddha@intel.com>
    Date:   Fri Sep 29 01:58:42 2006 -0700

    [PATCH] mm: fix a race condition under SMC + COW

> The bogus x86 version of huge_ptep_clear_flush() came from the s390
> guys, so double-shame on IBM! :P
> 
> > commit 8fe627ec5b7c47b1654dff50536d9709863295a3
> > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> > Date:   Mon Apr 28 02:13:28 2008 -0700
> > 
> >     hugetlbfs: add missing TLB flush to hugetlb_cow()
> 
> This is probably an opportunity for all the other architecture
> maintainers to make sure that they have proper copies of
> huge_ptep_clear_flush().
> 
> I went through the hugetlb code on x86 and couldn't find another TLB
> flush that fixes this issue, and I believe this is correct, so:
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>

Many thanks for confirming, and for your prompt response.

Regards,
Anthony

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
@ 2014-05-15 17:00     ` Anthony Iliopoulos
  0 siblings, 0 replies; 9+ messages in thread
From: Anthony Iliopoulos @ 2014-05-15 17:00 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel,
	Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja,
	Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer,
	David Gibson, linux-arch

On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote:
> On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote:
> > The invalidation is required in order to maintain proper semantics
> > under CoW conditions. In scenarios where a process clones several
> > threads, a thread operating on a core whose DTLB entry for a
> > particular hugepage has not been invalidated, will be reading from
> > the hugepage that belongs to the forked child process, even after
> > hugetlb_cow().
> > 
> > The thread will not see the updated page as long as the stale DTLB
> > entry remains cached, the thread attempts to write into the page,
> > the child process exits, or the thread gets migrated to a different
> > processor.
> 
> No to be too nitpicky, but this applies to ITLB too, right?

Quite true, this does apply to ITLB too. I suppose there might be cases
like self-modifying code that touches the private text pages, or some
JIT compiler writing on mmap-ed executable regions that would observe
the same behavior under similar conditions.

> I believe this bug came all the way back from:
> 
> > commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri Jan 6 00:10:44 2006 -0800
> > 
> >     [PATCH] Hugetlb: Copy on Write support
> 
> It was probably the first time that we ever changed an _existing_
> hugetlbfs pte, and that patch probably just missed the TLB flush because
> none of the other pte-setting hugetlb.c code needed TLB flushes.

This seems to be the case, I assume that our internal use case was
also probably the first that needed to rely on proper semantics under
a multithreaded CoW scenario with hugetlb pages, so this came into
the surface.

I have actually also wondered about another related thing:
even when we actually do invalidate the page, there may still be a
race between a thread on a core that reads the page in some tight
loop (e.g. on a spinlock), and the page fault handler running on
a different core, at the point where the pte is set. Since we
invalidate the page via the TLB shootdowns *before* we update
the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page()
as well as hugetlb_cow()), there may be some tiny window where the
thread might re-read the page before the pte is set.

I have dismissed this case, since I assume that there are many more
cycles spent in servicing the TLB invalidation IPI, walking the pgtable
plus other related overhead (e.g. sched) than in updating the pte/pmd
so I am not sure how possible it would be to hit this condition.

I am also hesitant to simply submit a patch that reverses the order
of flushing and setting the pte, due to the following commit:

    commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f
    Author: Siddha, Suresh B <suresh.b.siddha@intel.com>
    Date:   Fri Sep 29 01:58:42 2006 -0700

    [PATCH] mm: fix a race condition under SMC + COW

> The bogus x86 version of huge_ptep_clear_flush() came from the s390
> guys, so double-shame on IBM! :P
> 
> > commit 8fe627ec5b7c47b1654dff50536d9709863295a3
> > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> > Date:   Mon Apr 28 02:13:28 2008 -0700
> > 
> >     hugetlbfs: add missing TLB flush to hugetlb_cow()
> 
> This is probably an opportunity for all the other architecture
> maintainers to make sure that they have proper copies of
> huge_ptep_clear_flush().
> 
> I went through the hugetlb code on x86 and couldn't find another TLB
> flush that fixes this issue, and I believe this is correct, so:
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>

Many thanks for confirming, and for your prompt response.

Regards,
Anthony

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
  2014-05-15  7:05     ` Oren Twaig
@ 2014-05-17  9:24         ` Anthony Iliopoulos
  0 siblings, 0 replies; 9+ messages in thread
From: Anthony Iliopoulos @ 2014-05-17  9:24 UTC (permalink / raw)
  To: Oren Twaig
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt,
	Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion,
	Gerald Schaefer, David Gibson, linux-arch

Hi Oren,

On Thu, May 15, 2014 at 10:05:05AM +0300, Oren Twaig wrote:
> 
> On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote:
> > I have dismissed this case, since I assume that there are many more
> > cycles spent in servicing the TLB invalidation IPI, walking the pgtable
> > plus other related overhead (e.g. sched) than in updating the pte/pmd
> > so I am not sure how possible it would be to hit this condition.
> 
> Hi Anthony,
> 
> I have a question about the above statement. What will happen with multi
> cpu VMs ? won't the race described above can happen ? I.e one virtual CPU
> can will visit the host and the other will continue to encounter your race ?

I don't think there will be any race for the vcpu cases, since the sptes
will be cleared via the mmu_notifier, so this should take care of it
in the same manner as ptep_get_and_clear() does, as described by Dave
earlier in this thread.

Regards,
Anthony

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
@ 2014-05-17  9:24         ` Anthony Iliopoulos
  0 siblings, 0 replies; 9+ messages in thread
From: Anthony Iliopoulos @ 2014-05-17  9:24 UTC (permalink / raw)
  To: Oren Twaig
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86,
	linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt,
	Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion,
	Gerald Schaefer, David Gibson, linux-arch

Hi Oren,

On Thu, May 15, 2014 at 10:05:05AM +0300, Oren Twaig wrote:
> 
> On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote:
> > I have dismissed this case, since I assume that there are many more
> > cycles spent in servicing the TLB invalidation IPI, walking the pgtable
> > plus other related overhead (e.g. sched) than in updating the pte/pmd
> > so I am not sure how possible it would be to hit this condition.
> 
> Hi Anthony,
> 
> I have a question about the above statement. What will happen with multi
> cpu VMs ? won't the race described above can happen ? I.e one virtual CPU
> can will visit the host and the other will continue to encounter your race ?

I don't think there will be any race for the vcpu cases, since the sptes
will be cleared via the mmu_notifier, so this should take care of it
in the same manner as ptep_get_and_clear() does, as described by Dave
earlier in this thread.

Regards,
Anthony

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-05-17  9:25 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-14  9:29 [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Anthony Iliopoulos
2014-05-13 22:44 ` Dave Hansen
2014-05-15 17:00   ` Anthony Iliopoulos
2014-05-15 17:00     ` Anthony Iliopoulos
2014-05-14 17:24     ` Dave Hansen
2014-05-15  7:05     ` Oren Twaig
2014-05-17  9:24       ` Anthony Iliopoulos
2014-05-17  9:24         ` Anthony Iliopoulos
2014-05-13 23:36 ` [tip:x86/urgent] x86, mm, hugetlb: Add " tip-bot for Anthony Iliopoulos

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.