* [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() @ 2014-05-14 9:29 Anthony Iliopoulos 2014-05-13 22:44 ` Dave Hansen 2014-05-13 23:36 ` [tip:x86/urgent] x86, mm, hugetlb: Add " tip-bot for Anthony Iliopoulos 0 siblings, 2 replies; 9+ messages in thread From: Anthony Iliopoulos @ 2014-05-14 9:29 UTC (permalink / raw) To: Thomas Gleixner, Ingo Molnar, H. Peter Anvin Cc: x86, linux-kernel, Kirill A. Shutemov, Dave Hansen, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Suggested-by: Shay Goikhman <shay.goikhman@huawei.com> --- arch/x86/include/asm/hugetlb.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h index a809121..68c0539 100644 --- a/arch/x86/include/asm/hugetlb.h +++ b/arch/x86/include/asm/hugetlb.h @@ -52,6 +52,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { + ptep_clear_flush(vma, addr, ptep); } static inline int huge_pte_none(pte_t pte) -- 1.8.1.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-14 9:29 [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Anthony Iliopoulos @ 2014-05-13 22:44 ` Dave Hansen 2014-05-15 17:00 ` Anthony Iliopoulos 2014-05-13 23:36 ` [tip:x86/urgent] x86, mm, hugetlb: Add " tip-bot for Anthony Iliopoulos 1 sibling, 1 reply; 9+ messages in thread From: Dave Hansen @ 2014-05-13 22:44 UTC (permalink / raw) To: Anthony Iliopoulos, Thomas Gleixner, Ingo Molnar, H. Peter Anvin Cc: x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > The invalidation is required in order to maintain proper semantics > under CoW conditions. In scenarios where a process clones several > threads, a thread operating on a core whose DTLB entry for a > particular hugepage has not been invalidated, will be reading from > the hugepage that belongs to the forked child process, even after > hugetlb_cow(). > > The thread will not see the updated page as long as the stale DTLB > entry remains cached, the thread attempts to write into the page, > the child process exits, or the thread gets migrated to a different > processor. No to be too nitpicky, but this applies to ITLB too, right? I believe this bug came all the way back from: > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > Author: David Gibson <david@gibson.dropbear.id.au> > Date: Fri Jan 6 00:10:44 2006 -0800 > > [PATCH] Hugetlb: Copy on Write support It was probably the first time that we ever changed an _existing_ hugetlbfs pte, and that patch probably just missed the TLB flush because none of the other pte-setting hugetlb.c code needed TLB flushes. The bogus x86 version of huge_ptep_clear_flush() came from the s390 guys, so double-shame on IBM! :P > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com> > Date: Mon Apr 28 02:13:28 2008 -0700 > > hugetlbfs: add missing TLB flush to hugetlb_cow() This is probably an opportunity for all the other architecture maintainers to make sure that they have proper copies of huge_ptep_clear_flush(). I went through the hugetlb code on x86 and couldn't find another TLB flush that fixes this issue, and I believe this is correct, so: Acked-by: Dave Hansen <dave.hansen@intel.com> ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-13 22:44 ` Dave Hansen @ 2014-05-15 17:00 ` Anthony Iliopoulos 0 siblings, 0 replies; 9+ messages in thread From: Anthony Iliopoulos @ 2014-05-15 17:00 UTC (permalink / raw) To: Dave Hansen Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote: > On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > > The invalidation is required in order to maintain proper semantics > > under CoW conditions. In scenarios where a process clones several > > threads, a thread operating on a core whose DTLB entry for a > > particular hugepage has not been invalidated, will be reading from > > the hugepage that belongs to the forked child process, even after > > hugetlb_cow(). > > > > The thread will not see the updated page as long as the stale DTLB > > entry remains cached, the thread attempts to write into the page, > > the child process exits, or the thread gets migrated to a different > > processor. > > No to be too nitpicky, but this applies to ITLB too, right? Quite true, this does apply to ITLB too. I suppose there might be cases like self-modifying code that touches the private text pages, or some JIT compiler writing on mmap-ed executable regions that would observe the same behavior under similar conditions. > I believe this bug came all the way back from: > > > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > > Author: David Gibson <david@gibson.dropbear.id.au> > > Date: Fri Jan 6 00:10:44 2006 -0800 > > > > [PATCH] Hugetlb: Copy on Write support > > It was probably the first time that we ever changed an _existing_ > hugetlbfs pte, and that patch probably just missed the TLB flush because > none of the other pte-setting hugetlb.c code needed TLB flushes. This seems to be the case, I assume that our internal use case was also probably the first that needed to rely on proper semantics under a multithreaded CoW scenario with hugetlb pages, so this came into the surface. I have actually also wondered about another related thing: even when we actually do invalidate the page, there may still be a race between a thread on a core that reads the page in some tight loop (e.g. on a spinlock), and the page fault handler running on a different core, at the point where the pte is set. Since we invalidate the page via the TLB shootdowns *before* we update the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() as well as hugetlb_cow()), there may be some tiny window where the thread might re-read the page before the pte is set. I have dismissed this case, since I assume that there are many more cycles spent in servicing the TLB invalidation IPI, walking the pgtable plus other related overhead (e.g. sched) than in updating the pte/pmd so I am not sure how possible it would be to hit this condition. I am also hesitant to simply submit a patch that reverses the order of flushing and setting the pte, due to the following commit: commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f Author: Siddha, Suresh B <suresh.b.siddha@intel.com> Date: Fri Sep 29 01:58:42 2006 -0700 [PATCH] mm: fix a race condition under SMC + COW > The bogus x86 version of huge_ptep_clear_flush() came from the s390 > guys, so double-shame on IBM! :P > > > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com> > > Date: Mon Apr 28 02:13:28 2008 -0700 > > > > hugetlbfs: add missing TLB flush to hugetlb_cow() > > This is probably an opportunity for all the other architecture > maintainers to make sure that they have proper copies of > huge_ptep_clear_flush(). > > I went through the hugetlb code on x86 and couldn't find another TLB > flush that fixes this issue, and I believe this is correct, so: > > Acked-by: Dave Hansen <dave.hansen@intel.com> Many thanks for confirming, and for your prompt response. Regards, Anthony ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() @ 2014-05-15 17:00 ` Anthony Iliopoulos 0 siblings, 0 replies; 9+ messages in thread From: Anthony Iliopoulos @ 2014-05-15 17:00 UTC (permalink / raw) To: Dave Hansen Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On Tue, May 13, 2014 at 03:44:55PM -0700, Dave Hansen wrote: > On 05/14/2014 02:29 AM, Anthony Iliopoulos wrote: > > The invalidation is required in order to maintain proper semantics > > under CoW conditions. In scenarios where a process clones several > > threads, a thread operating on a core whose DTLB entry for a > > particular hugepage has not been invalidated, will be reading from > > the hugepage that belongs to the forked child process, even after > > hugetlb_cow(). > > > > The thread will not see the updated page as long as the stale DTLB > > entry remains cached, the thread attempts to write into the page, > > the child process exits, or the thread gets migrated to a different > > processor. > > No to be too nitpicky, but this applies to ITLB too, right? Quite true, this does apply to ITLB too. I suppose there might be cases like self-modifying code that touches the private text pages, or some JIT compiler writing on mmap-ed executable regions that would observe the same behavior under similar conditions. > I believe this bug came all the way back from: > > > commit 1e8f889b10d8d2223105719e36ce45688fedbd59 > > Author: David Gibson <david@gibson.dropbear.id.au> > > Date: Fri Jan 6 00:10:44 2006 -0800 > > > > [PATCH] Hugetlb: Copy on Write support > > It was probably the first time that we ever changed an _existing_ > hugetlbfs pte, and that patch probably just missed the TLB flush because > none of the other pte-setting hugetlb.c code needed TLB flushes. This seems to be the case, I assume that our internal use case was also probably the first that needed to rely on proper semantics under a multithreaded CoW scenario with hugetlb pages, so this came into the surface. I have actually also wondered about another related thing: even when we actually do invalidate the page, there may still be a race between a thread on a core that reads the page in some tight loop (e.g. on a spinlock), and the page fault handler running on a different core, at the point where the pte is set. Since we invalidate the page via the TLB shootdowns *before* we update the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() as well as hugetlb_cow()), there may be some tiny window where the thread might re-read the page before the pte is set. I have dismissed this case, since I assume that there are many more cycles spent in servicing the TLB invalidation IPI, walking the pgtable plus other related overhead (e.g. sched) than in updating the pte/pmd so I am not sure how possible it would be to hit this condition. I am also hesitant to simply submit a patch that reverses the order of flushing and setting the pte, due to the following commit: commit 4ce072f1faf29d24df4600f53db8cdd62d400a8f Author: Siddha, Suresh B <suresh.b.siddha@intel.com> Date: Fri Sep 29 01:58:42 2006 -0700 [PATCH] mm: fix a race condition under SMC + COW > The bogus x86 version of huge_ptep_clear_flush() came from the s390 > guys, so double-shame on IBM! :P > > > commit 8fe627ec5b7c47b1654dff50536d9709863295a3 > > Author: Gerald Schaefer <gerald.schaefer@de.ibm.com> > > Date: Mon Apr 28 02:13:28 2008 -0700 > > > > hugetlbfs: add missing TLB flush to hugetlb_cow() > > This is probably an opportunity for all the other architecture > maintainers to make sure that they have proper copies of > huge_ptep_clear_flush(). > > I went through the hugetlb code on x86 and couldn't find another TLB > flush that fixes this issue, and I believe this is correct, so: > > Acked-by: Dave Hansen <dave.hansen@intel.com> Many thanks for confirming, and for your prompt response. Regards, Anthony ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-15 17:00 ` Anthony Iliopoulos (?) @ 2014-05-14 17:24 ` Dave Hansen -1 siblings, 0 replies; 9+ messages in thread From: Dave Hansen @ 2014-05-14 17:24 UTC (permalink / raw) To: Anthony Iliopoulos Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On 05/15/2014 10:00 AM, Anthony Iliopoulos wrote: > I have actually also wondered about another related thing: > even when we actually do invalidate the page, there may still be a > race between a thread on a core that reads the page in some tight > loop (e.g. on a spinlock), and the page fault handler running on > a different core, at the point where the pte is set. Since we > invalidate the page via the TLB shootdowns *before* we update > the pte (this is true for all do_wp_page(), do_huge_pmd_wp_page() > as well as hugetlb_cow()), there may be some tiny window where the > thread might re-read the page before the pte is set. Don't forget about the "clear" part. ptep_clear_flush() does: pte = ptep_get_and_clear(mm, address, ptep); if (pte_accessible(mm, pte)) flush_tlb_page(vma, address); so it makes the pte !present and guarantees that any other CPUs looking at it after the flush but before the set_pte() will also end up in the page fault handler, and they'll wait until the first fault has finished with the page tables. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-15 17:00 ` Anthony Iliopoulos (?) (?) @ 2014-05-15 7:05 ` Oren Twaig 2014-05-17 9:24 ` Anthony Iliopoulos -1 siblings, 1 reply; 9+ messages in thread From: Oren Twaig @ 2014-05-15 7:05 UTC (permalink / raw) To: Anthony Iliopoulos, Dave Hansen Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote: > I have dismissed this case, since I assume that there are many more > cycles spent in servicing the TLB invalidation IPI, walking the pgtable > plus other related overhead (e.g. sched) than in updating the pte/pmd > so I am not sure how possible it would be to hit this condition. Hi Anthony, I have a question about the above statement. What will happen with multi cpu VMs ? won't the race described above can happen ? I.e one virtual CPU can will visit the host and the other will continue to encounter your race ? Thanks, Oren. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() 2014-05-15 7:05 ` Oren Twaig @ 2014-05-17 9:24 ` Anthony Iliopoulos 0 siblings, 0 replies; 9+ messages in thread From: Anthony Iliopoulos @ 2014-05-17 9:24 UTC (permalink / raw) To: Oren Twaig Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch Hi Oren, On Thu, May 15, 2014 at 10:05:05AM +0300, Oren Twaig wrote: > > On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote: > > I have dismissed this case, since I assume that there are many more > > cycles spent in servicing the TLB invalidation IPI, walking the pgtable > > plus other related overhead (e.g. sched) than in updating the pte/pmd > > so I am not sure how possible it would be to hit this condition. > > Hi Anthony, > > I have a question about the above statement. What will happen with multi > cpu VMs ? won't the race described above can happen ? I.e one virtual CPU > can will visit the host and the other will continue to encounter your race ? I don't think there will be any race for the vcpu cases, since the sptes will be cleared via the mmu_notifier, so this should take care of it in the same manner as ptep_get_and_clear() does, as described by Dave earlier in this thread. Regards, Anthony ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() @ 2014-05-17 9:24 ` Anthony Iliopoulos 0 siblings, 0 replies; 9+ messages in thread From: Anthony Iliopoulos @ 2014-05-17 9:24 UTC (permalink / raw) To: Oren Twaig Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, x86, linux-kernel, Kirill A. Shutemov, Shay Goikhman, Paul Mundt, Carlos Villavieja, Nacho Navarro, Avi Mendelson, Yoav Etsion, Gerald Schaefer, David Gibson, linux-arch Hi Oren, On Thu, May 15, 2014 at 10:05:05AM +0300, Oren Twaig wrote: > > On 05/15/2014 08:00 PM, Anthony Iliopoulos wrote: > > I have dismissed this case, since I assume that there are many more > > cycles spent in servicing the TLB invalidation IPI, walking the pgtable > > plus other related overhead (e.g. sched) than in updating the pte/pmd > > so I am not sure how possible it would be to hit this condition. > > Hi Anthony, > > I have a question about the above statement. What will happen with multi > cpu VMs ? won't the race described above can happen ? I.e one virtual CPU > can will visit the host and the other will continue to encounter your race ? I don't think there will be any race for the vcpu cases, since the sptes will be cleared via the mmu_notifier, so this should take care of it in the same manner as ptep_get_and_clear() does, as described by Dave earlier in this thread. Regards, Anthony ^ permalink raw reply [flat|nested] 9+ messages in thread
* [tip:x86/urgent] x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow() 2014-05-14 9:29 [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Anthony Iliopoulos 2014-05-13 22:44 ` Dave Hansen @ 2014-05-13 23:36 ` tip-bot for Anthony Iliopoulos 1 sibling, 0 replies; 9+ messages in thread From: tip-bot for Anthony Iliopoulos @ 2014-05-13 23:36 UTC (permalink / raw) To: linux-tip-commits Cc: linux-kernel, hpa, mingo, dave.hansen, shay.goikhman, tglx, anthony.iliopoulos, hpa Commit-ID: 9844f5462392b53824e8b86726e7c33b5ecbb676 Gitweb: http://git.kernel.org/tip/9844f5462392b53824e8b86726e7c33b5ecbb676 Author: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> AuthorDate: Wed, 14 May 2014 11:29:48 +0200 Committer: H. Peter Anvin <hpa@linux.intel.com> CommitDate: Tue, 13 May 2014 16:34:09 -0700 x86, mm, hugetlb: Add missing TLB page invalidation for hugetlb_cow() The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by: Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp Suggested-by: Shay Goikhman <shay.goikhman@huawei.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v2.6.16+ (!) --- arch/x86/include/asm/hugetlb.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h index a809121..68c0539 100644 --- a/arch/x86/include/asm/hugetlb.h +++ b/arch/x86/include/asm/hugetlb.h @@ -52,6 +52,7 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { + ptep_clear_flush(vma, addr, ptep); } static inline int huge_pte_none(pte_t pte) ^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-05-17 9:25 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2014-05-14 9:29 [PATCH] x86, hugetlb: add missing TLB page invalidation for hugetlb_cow() Anthony Iliopoulos 2014-05-13 22:44 ` Dave Hansen 2014-05-15 17:00 ` Anthony Iliopoulos 2014-05-15 17:00 ` Anthony Iliopoulos 2014-05-14 17:24 ` Dave Hansen 2014-05-15 7:05 ` Oren Twaig 2014-05-17 9:24 ` Anthony Iliopoulos 2014-05-17 9:24 ` Anthony Iliopoulos 2014-05-13 23:36 ` [tip:x86/urgent] x86, mm, hugetlb: Add " tip-bot for Anthony Iliopoulos
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.