From: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> To: Dave Hansen <dave@sr71.net> Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, <x86@kernel.org>, <linux-kernel@vger.kernel.org>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Shay Goikhman <shay.goikhman@huawei.com>, Paul Mundt <paul.mundt@huawei.com>, Carlos Villavieja <villavieja@hps.utexas.edu>, Nacho Navarro <nacho.navarro@bsc.es>, "Avi Mendelson" <avi.mendelson@tce.technion.ac.il>, Yoav Etsion <yetsion@tce.technion.ac.il>, Gerald Schaefer <gerald.schaefer@de.ibm.com>, David Gibson <david@gibson.dropbear.id.au>, linux-arch <linux-arch@vger.kernel.org> Subject: Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Date: Thu, 15 May 2014 19:00:35 +0200 [thread overview] Message-ID: <20140515170035.GA15779@server-36.huawei.corp> (raw) In-Reply-To: <5372A067.9010808@sr71.net> On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote: > On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > > The invalidation is required in order to maintain proper semantics > > under CoW conditions. In scenarios where a process clones several > > threads, a thread operating on a core whose DTLB entry for a > > particular hugepage has not been invalidated, will be reading from > > the hugepage that belongs to the forked child process, even after > > hugetlb_cow(). > > > > The thread will not see the updated page as long as the stale DTLB > > entry remains cached, the thread attempts to write into the page, > > the child process exits, or the thread gets migrated to a different > > processor. > > No to be too nitpicky, but this applies to ITLB too, right? Quite true, this does apply to ITLB too. I suppose there might be cases like self-modifying code that touches the private text pages, or some JIT compiler writing on mmap-ed executable regions that would observe the same behavior under similar conditions. > I believe this bug came all the way back from: > > > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > > Author: David Gibson <david@gibson.dropbear.id.au> > > Date: Fri Jan 6 00:10:44 2006 -0800 > > > > [PATCH] Hugetlb: Copy on Write support > > It was probably the first time that we ever changed an _existing_ > hugetlbfs pte, and that patch probably just missed the TLB flush because > none of the other pte-setting hugetlb.c code needed TLB flushes. This seems to be the case, I assume that our internal use case was also probably the first that needed to rely on proper semantics under a multithreaded CoW scenario with hugetlb pages, so this came into the surface. I have actually also wondered about another related thing: even when we actually do invalidate the page, there may still be a race between a thread on a core that reads the page in some tight loop (e.g. on a spinlock), and the page fault handler running on a different core, at the point where the pte is set. Since we invalidate the page via the TLB shootdowns *before* we update the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() as well as hugetlb_cow()), there may be some tiny window where the thread might re-read the page before the pte is set. I have dismissed this case, since I assume that there are many more cycles spent in servicing the TLB invalidation IPI, walking the pgtable plus other related overhead (e.g. sched) than in updating the pte/pmd so I am not sure how possible it would be to hit this condition. I am also hesitant to simply submit a patch that reverses the order of flushing and setting the pte, due to the following commit: commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f Author: Siddha, Suresh B <suresh.b.siddha@intel.com> Date: Fri Sep 29 01:58:42 2006 -0700 [PATCH] mm: fix a race condition under SMC + COW > The bogus x86 version of huge_ptep_clear_flush() came from the s390 > guys, so double-shame on IBM! :P > > > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com> > > Date: Mon Apr 28 02:13:28 2008 -0700 > > > > hugetlbfs: add missing TLB flush to hugetlb_cow() > > This is probably an opportunity for all the other architecture > maintainers to make sure that they have proper copies of > huge_ptep_clear_flush(). > > I went through the hugetlb code on x86 and couldn't find another TLB > flush that fixes this issue, and I believe this is correct, so: > > Acked-by: Dave Hansen <dave.hansen@intel.com> Many thanks for confirming, and for your prompt response. Regards, Anthony
WARNING: multiple messages have this Message-ID (diff)
From: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> To: Dave Hansen <dave@sr71.net> Cc: Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, "H. Peter Anvin" <hpa@zytor.com>, x86@kernel.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Shay Goikhman <shay.goikhman@huawei.com>, Paul Mundt <paul.mundt@huawei.com>, Carlos Villavieja <villavieja@hps.utexas.edu>, Nacho Navarro <nacho.navarro@bsc.es>, Avi Mendelson <avi.mendelson@tce.technion.ac.il>, Yoav Etsion <yetsion@tce.technion.ac.il>, Gerald Schaefer <gerald.schaefer@de.ibm.com>, David Gibson <david@gibson.dropbear.id.au>, linux-arch <linux-arch@vger.kernel.org> Subject: Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Date: Thu, 15 May 2014 19:00:35 +0200 [thread overview] Message-ID: <20140515170035.GA15779@server-36.huawei.corp> (raw) In-Reply-To: <5372A067.9010808@sr71.net> On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote: > On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > > The invalidation is required in order to maintain proper semantics > > under CoW conditions. In scenarios where a process clones several > > threads, a thread operating on a core whose DTLB entry for a > > particular hugepage has not been invalidated, will be reading from > > the hugepage that belongs to the forked child process, even after > > hugetlb_cow(). > > > > The thread will not see the updated page as long as the stale DTLB > > entry remains cached, the thread attempts to write into the page, > > the child process exits, or the thread gets migrated to a different > > processor. > > No to be too nitpicky, but this applies to ITLB too, right? Quite true, this does apply to ITLB too. I suppose there might be cases like self-modifying code that touches the private text pages, or some JIT compiler writing on mmap-ed executable regions that would observe the same behavior under similar conditions. > I believe this bug came all the way back from: > > > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > > Author: David Gibson <david@gibson.dropbear.id.au> > > Date: Fri Jan 6 00:10:44 2006 -0800 > > > > [PATCH] Hugetlb: Copy on Write support > > It was probably the first time that we ever changed an _existing_ > hugetlbfs pte, and that patch probably just missed the TLB flush because > none of the other pte-setting hugetlb.c code needed TLB flushes. This seems to be the case, I assume that our internal use case was also probably the first that needed to rely on proper semantics under a multithreaded CoW scenario with hugetlb pages, so this came into the surface. I have actually also wondered about another related thing: even when we actually do invalidate the page, there may still be a race between a thread on a core that reads the page in some tight loop (e.g. on a spinlock), and the page fault handler running on a different core, at the point where the pte is set. Since we invalidate the page via the TLB shootdowns *before* we update the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() as well as hugetlb_cow()), there may be some tiny window where the thread might re-read the page before the pte is set. I have dismissed this case, since I assume that there are many more cycles spent in servicing the TLB invalidation IPI, walking the pgtable plus other related overhead (e.g. sched) than in updating the pte/pmd so I am not sure how possible it would be to hit this condition. I am also hesitant to simply submit a patch that reverses the order of flushing and setting the pte, due to the following commit: commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f Author: Siddha, Suresh B <suresh.b.siddha@intel.com> Date: Fri Sep 29 01:58:42 2006 -0700 [PATCH] mm: fix a race condition under SMC + COW > The bogus x86 version of huge_ptep_clear_flush() came from the s390 > guys, so double-shame on IBM! :P > > > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com> > > Date: Mon Apr 28 02:13:28 2008 -0700 > > > > hugetlbfs: add missing TLB flush to hugetlb_cow() > > This is probably an opportunity for all the other architecture > maintainers to make sure that they have proper copies of > huge_ptep_clear_flush(). > > I went through the hugetlb code on x86 and couldn't find another TLB > flush that fixes this issue, and I believe this is correct, so: > > Acked-by: Dave Hansen <dave.hansen@intel.com> Many thanks for confirming, and for your prompt response. Regards, Anthony
next prev parent reply other threads:[~2014-05-14 17:09 UTC|newest] Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top 2014-05-14 9:29 [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Anthony Iliopoulos 2014-05-13 22:44 ` Dave Hansen 2014-05-15 17:00 ` Anthony Iliopoulos [this message] 2014-05-15 17:00 ` Anthony Iliopoulos 2014-05-14 17:24 ` Dave Hansen 2014-05-15 7:05 ` Oren Twaig 2014-05-17 9:24 ` Anthony Iliopoulos 2014-05-17 9:24 ` Anthony Iliopoulos 2014-05-13 23:36 ` [tip:x86/urgent] x86, mm, hugetlb: Add " tip-bot for Anthony Iliopoulos
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20140515170035.GA15779@server-36.huawei.corp \ --to=anthony.iliopoulos@huawei.com \ --cc=avi.mendelson@tce.technion.ac.il \ --cc=dave@sr71.net \ --cc=david@gibson.dropbear.id.au \ --cc=gerald.schaefer@de.ibm.com \ --cc=hpa@zytor.com \ --cc=kirill.shutemov@linux.intel.com \ --cc=linux-arch@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=mingo@redhat.com \ --cc=nacho.navarro@bsc.es \ --cc=paul.mundt@huawei.com \ --cc=shay.goikhman@huawei.com \ --cc=tglx@linutronix.de \ --cc=villavieja@hps.utexas.edu \ --cc=x86@kernel.org \ --cc=yetsion@tce.technion.ac.il \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.