All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
To: Dave Hansen <dave@sr71.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	<x86@kernel.org>, <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Shay Goikhman <shay.goikhman@huawei.com>,
	Paul Mundt <paul.mundt@huawei.com>,
	Carlos Villavieja <villavieja@hps.utexas.edu>,
	Nacho Navarro <nacho.navarro@bsc.es>,
	"Avi Mendelson" <avi.mendelson@tce.technion.ac.il>,
	Yoav Etsion <yetsion@tce.technion.ac.il>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
Date: Thu, 15 May 2014 19:00:35 +0200	[thread overview]
Message-ID: <20140515170035.GA15779@server-36.huawei.corp> (raw)
In-Reply-To: <5372A067.9010808@sr71.net>

On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote:
> On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote:
> > The invalidation is required in order to maintain proper semantics
> > under CoW conditions. In scenarios where a process clones several
> > threads, a thread operating on a core whose DTLB entry for a
> > particular hugepage has not been invalidated, will be reading from
> > the hugepage that belongs to the forked child process, even after
> > hugetlb_cow().
> > 
> > The thread will not see the updated page as long as the stale DTLB
> > entry remains cached, the thread attempts to write into the page,
> > the child process exits, or the thread gets migrated to a different
> > processor.
> 
> No to be too nitpicky, but this applies to ITLB too, right?

Quite true, this does apply to ITLB too. I suppose there might be cases
like self-modifying code that touches the private text pages, or some
JIT compiler writing on mmap-ed executable regions that would observe
the same behavior under similar conditions.

> I believe this bug came all the way back from:
> 
> > commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri Jan 6 00:10:44 2006 -0800
> > 
> >     [PATCH] Hugetlb: Copy on Write support
> 
> It was probably the first time that we ever changed an _existing_
> hugetlbfs pte, and that patch probably just missed the TLB flush because
> none of the other pte-setting hugetlb.c code needed TLB flushes.

This seems to be the case, I assume that our internal use case was
also probably the first that needed to rely on proper semantics under
a multithreaded CoW scenario with hugetlb pages, so this came into
the surface.

I have actually also wondered about another related thing:
even when we actually do invalidate the page, there may still be a
race between a thread on a core that reads the page in some tight
loop (e.g. on a spinlock), and the page fault handler running on
a different core, at the point where the pte is set. Since we
invalidate the page via the TLB shootdowns *before* we update
the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page()
as well as hugetlb_cow()), there may be some tiny window where the
thread might re-read the page before the pte is set.

I have dismissed this case, since I assume that there are many more
cycles spent in servicing the TLB invalidation IPI, walking the pgtable
plus other related overhead (e.g. sched) than in updating the pte/pmd
so I am not sure how possible it would be to hit this condition.

I am also hesitant to simply submit a patch that reverses the order
of flushing and setting the pte, due to the following commit:

    commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f
    Author: Siddha, Suresh B <suresh.b.siddha@intel.com>
    Date:   Fri Sep 29 01:58:42 2006 -0700

    [PATCH] mm: fix a race condition under SMC + COW

> The bogus x86 version of huge_ptep_clear_flush() came from the s390
> guys, so double-shame on IBM! :P
> 
> > commit 8fe627ec5b7c47b1654dff50536d9709863295a3
> > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> > Date:   Mon Apr 28 02:13:28 2008 -0700
> > 
> >     hugetlbfs: add missing TLB flush to hugetlb_cow()
> 
> This is probably an opportunity for all the other architecture
> maintainers to make sure that they have proper copies of
> huge_ptep_clear_flush().
> 
> I went through the hugetlb code on x86 and couldn't find another TLB
> flush that fixes this issue, and I believe this is correct, so:
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>

Many thanks for confirming, and for your prompt response.

Regards,
Anthony

WARNING: multiple messages have this Message-ID (diff)
From: Anthony Iliopoulos <anthony.iliopoulos@huawei.com>
To: Dave Hansen <dave@sr71.net>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>,
	x86@kernel.org, linux-kernel@vger.kernel.org,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Shay Goikhman <shay.goikhman@huawei.com>,
	Paul Mundt <paul.mundt@huawei.com>,
	Carlos Villavieja <villavieja@hps.utexas.edu>,
	Nacho Navarro <nacho.navarro@bsc.es>,
	Avi Mendelson <avi.mendelson@tce.technion.ac.il>,
	Yoav Etsion <yetsion@tce.technion.ac.il>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	David Gibson <david@gibson.dropbear.id.au>,
	linux-arch <linux-arch@vger.kernel.org>
Subject: Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow()
Date: Thu, 15 May 2014 19:00:35 +0200	[thread overview]
Message-ID: <20140515170035.GA15779@server-36.huawei.corp> (raw)
In-Reply-To: <5372A067.9010808@sr71.net>

On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote:
> On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote:
> > The invalidation is required in order to maintain proper semantics
> > under CoW conditions. In scenarios where a process clones several
> > threads, a thread operating on a core whose DTLB entry for a
> > particular hugepage has not been invalidated, will be reading from
> > the hugepage that belongs to the forked child process, even after
> > hugetlb_cow().
> > 
> > The thread will not see the updated page as long as the stale DTLB
> > entry remains cached, the thread attempts to write into the page,
> > the child process exits, or the thread gets migrated to a different
> > processor.
> 
> No to be too nitpicky, but this applies to ITLB too, right?

Quite true, this does apply to ITLB too. I suppose there might be cases
like self-modifying code that touches the private text pages, or some
JIT compiler writing on mmap-ed executable regions that would observe
the same behavior under similar conditions.

> I believe this bug came all the way back from:
> 
> > commit 1e8f889b10d8d2223105719e36ce45688fedbd59
> > Author: David Gibson <david@gibson.dropbear.id.au>
> > Date:   Fri Jan 6 00:10:44 2006 -0800
> > 
> >     [PATCH] Hugetlb: Copy on Write support
> 
> It was probably the first time that we ever changed an _existing_
> hugetlbfs pte, and that patch probably just missed the TLB flush because
> none of the other pte-setting hugetlb.c code needed TLB flushes.

This seems to be the case, I assume that our internal use case was
also probably the first that needed to rely on proper semantics under
a multithreaded CoW scenario with hugetlb pages, so this came into
the surface.

I have actually also wondered about another related thing:
even when we actually do invalidate the page, there may still be a
race between a thread on a core that reads the page in some tight
loop (e.g. on a spinlock), and the page fault handler running on
a different core, at the point where the pte is set. Since we
invalidate the page via the TLB shootdowns *before* we update
the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page()
as well as hugetlb_cow()), there may be some tiny window where the
thread might re-read the page before the pte is set.

I have dismissed this case, since I assume that there are many more
cycles spent in servicing the TLB invalidation IPI, walking the pgtable
plus other related overhead (e.g. sched) than in updating the pte/pmd
so I am not sure how possible it would be to hit this condition.

I am also hesitant to simply submit a patch that reverses the order
of flushing and setting the pte, due to the following commit:

    commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f
    Author: Siddha, Suresh B <suresh.b.siddha@intel.com>
    Date:   Fri Sep 29 01:58:42 2006 -0700

    [PATCH] mm: fix a race condition under SMC + COW

> The bogus x86 version of huge_ptep_clear_flush() came from the s390
> guys, so double-shame on IBM! :P
> 
> > commit 8fe627ec5b7c47b1654dff50536d9709863295a3
> > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com>
> > Date:   Mon Apr 28 02:13:28 2008 -0700
> > 
> >     hugetlbfs: add missing TLB flush to hugetlb_cow()
> 
> This is probably an opportunity for all the other architecture
> maintainers to make sure that they have proper copies of
> huge_ptep_clear_flush().
> 
> I went through the hugetlb code on x86 and couldn't find another TLB
> flush that fixes this issue, and I believe this is correct, so:
> 
> Acked-by: Dave Hansen <dave.hansen@intel.com>

Many thanks for confirming, and for your prompt response.

Regards,
Anthony

  reply	other threads:[~2014-05-14 17:09 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-14  9:29 [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Anthony Iliopoulos
2014-05-13 22:44 ` Dave Hansen
2014-05-15 17:00   ` Anthony Iliopoulos [this message]
2014-05-15 17:00     ` Anthony Iliopoulos
2014-05-14 17:24     ` Dave Hansen
2014-05-15  7:05     ` Oren Twaig
2014-05-17  9:24       ` Anthony Iliopoulos
2014-05-17  9:24         ` Anthony Iliopoulos
2014-05-13 23:36 ` [tip:x86/urgent] x86, mm, hugetlb: Add " tip-bot for Anthony Iliopoulos

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140515170035.GA15779@server-36.huawei.corp \
    --to=anthony.iliopoulos@huawei.com \
    --cc=avi.mendelson@tce.technion.ac.il \
    --cc=dave@sr71.net \
    --cc=david@gibson.dropbear.id.au \
    --cc=gerald.schaefer@de.ibm.com \
    --cc=hpa@zytor.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=nacho.navarro@bsc.es \
    --cc=paul.mundt@huawei.com \
    --cc=shay.goikhman@huawei.com \
    --cc=tglx@linutronix.de \
    --cc=villavieja@hps.utexas.edu \
    --cc=x86@kernel.org \
    --cc=yetsion@tce.technion.ac.il \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.