From: Max Filippov <jcmvbkbc@gmail.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: linux-arch@vger.kernel.org, linux-mm@kvack.org,
	linux-xtensa@linux-xtensa.org, Chris Zankel <chris@zankel.net>,
	Marc Gauthier <Marc.Gauthier@tensilica.com>
Subject: Re: TLB and PTE coherency during munmap
Date: Wed, 29 May 2013 07:23:52 +0400
Message-ID: <CAMo8BfLNt07PM87eV-xT+VnLVvmxrryWw4QBX6G4p-gy1Wb70w@mail.gmail.com>
In-Reply-To: <20130528143459.GN724@phenom.dumpdata.com>

On Tue, May 28, 2013 at 6:34 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Sun, May 26, 2013 at 06:50:46AM +0400, Max Filippov wrote:
>> Hello arch and mm people.
>>
>> Is it intentional that threads of a process that invoked the munmap syscall
>> can see TLB entries pointing to already freed pages, or is it a bug?
>>
>> I'm talking about zap_pmd_range and zap_pte_range:
>>
>>       zap_pmd_range
>>         zap_pte_range
>>           arch_enter_lazy_mmu_mode
>>             ptep_get_and_clear_full
>>             tlb_remove_tlb_entry
>>             __tlb_remove_page
>>           arch_leave_lazy_mmu_mode
>>         cond_resched
>>
>> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>> and __tlb_remove_page, the loop in zap_pte_range clears PTEs and frees
>> the corresponding pages but doesn't flush the TLB, and the surrounding
>> loop in zap_pmd_range calls cond_resched. If a thread of the same
>> process gets scheduled at that point, it is able to see TLB entries
>> pointing to already freed physical pages.
>
> The idea behind the lazy MMU subsystem is that it does not need to flush
> the TLB all the time and allows one to do PTE manipulations in a "batch mode".
> Meaning there are stray entries - and one has to be diligent about not using them.

Yes, I got it. IOW, TLB entries must either be flushed before userspace can
see them, or the underlying pages must not be freed.
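
As a rough illustration of that rule, here is a toy userspace model in C.
It is not kernel code: struct toy_mm and the toy_* helpers are made up for
this example, and a single PTE plus a single TLB slot stand in for the real
structures. It only shows that freeing the page before the flush leaves a
window where a stale TLB entry points at a freed page.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model, NOT kernel code: one PTE and one TLB slot. */
enum { TOY_NO_PAGE = -1, TOY_NPAGES = 4 };

struct toy_mm {
	int pte;                     /* page mapped by the page table */
	int tlb;                     /* page cached in the TLB slot */
	bool page_free[TOY_NPAGES];  /* true once a page was handed back */
};

/* Map a page and pretend a thread already touched it (TLB is filled). */
static void toy_map(struct toy_mm *mm, int page)
{
	assert(page >= 0 && page < TOY_NPAGES);
	mm->pte = page;
	mm->tlb = page;
	mm->page_free[page] = false;
}

/* True if a thread scheduled right now could reach a freed page via the TLB. */
static bool toy_stale_tlb_hits_freed_page(const struct toy_mm *mm)
{
	return mm->tlb != TOY_NO_PAGE && mm->page_free[mm->tlb];
}

/* Unmap: clear the PTE, optionally flush the TLB, then free the page. */
static void toy_zap(struct toy_mm *mm, bool flush_before_free)
{
	int page = mm->pte;

	if (page == TOY_NO_PAGE)
		return;
	mm->pte = TOY_NO_PAGE;           /* stands in for ptep_get_and_clear_full */
	if (flush_before_free)
		mm->tlb = TOY_NO_PAGE;   /* stands in for a TLB flush */
	mm->page_free[page] = true;      /* stands in for freeing the page */
}
```

Calling toy_zap(&mm, false) after toy_map() leaves
toy_stale_tlb_hits_freed_page() returning true, while toy_zap(&mm, true)
does not.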

> Here is the relevant comment from the Linux header:
>
> /*
>  * A facility to provide lazy MMU batching.  This allows PTE updates and
>  * page invalidations to be delayed until a call to leave lazy MMU mode
>  * is issued.  Some architectures may benefit from doing this, and it is
>  * beneficial for both shadow and direct mode hypervisors, which may batch
>  * the PTE updates which happen during this window.  Note that using this
>  * interface requires that read hazards be removed from the code.  A read
>  * hazard could result in the direct mode hypervisor case, since the actual
>  * write to the page tables may not yet have taken place, so reads though
>  * a raw PTE pointer after it has been modified are not guaranteed to be
>  * up to date.  This mode can only be entered and left under the protection of
>  * the page table locks for all page tables which may be modified.  In the UP
>  * case, this is required so that preemption is disabled, and in the SMP case,
>  * it must synchronize the delayed page table writes properly on other CPUs.
>  */
>
> This means that eventually when arch_leave_lazy_mmu_mode or
> arch_flush_lazy_mmu_mode is called, the PTE updates _should_ be flushed
> (aka, TLB flush if needed on the altered PTE entries).

Should (: But I only see powerpc, sparc and x86 defining
__HAVE_ARCH_ENTER_LAZY_MMU_MODE, so this does not apply to any of the
remaining arches.
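
A minimal sketch of why that matters, assuming the generic fallback defines
the lazy MMU hooks as empty macros when __HAVE_ARCH_ENTER_LAZY_MMU_MODE is
not set. The counter toy_arch_flushes and the function toy_zap_pte_range are
hypothetical stand-ins for this example, and the opt-in branch is only a
model: a real arch hook is not necessarily a TLB flush.

```c
#include <assert.h>

/* Toy stand-in, NOT kernel code: counts work done by the arch hooks. */
static int toy_arch_flushes;

#ifdef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
/* Model of an arch that opts in: leaving lazy mode does some work. */
#define arch_enter_lazy_mmu_mode()	do { } while (0)
#define arch_leave_lazy_mmu_mode()	do { toy_arch_flushes++; } while (0)
#else
/* Assumed generic fallback: the hooks compile away to nothing. */
#define arch_enter_lazy_mmu_mode()	do { } while (0)
#define arch_leave_lazy_mmu_mode()	do { } while (0)
#endif

/* Mimics the shape of the zap_pte_range loop; returns the PTEs "cleared". */
static int toy_zap_pte_range(int nr)
{
	int cleared = 0;

	arch_enter_lazy_mmu_mode();
	for (int i = 0; i < nr; i++)
		cleared++;	/* clear PTE and queue page, no flush here */
	arch_leave_lazy_mmu_mode();
	return cleared;
}
```

Built without the define, toy_zap_pte_range() clears every PTE it is asked
to and toy_arch_flushes stays at zero, which is the situation on an arch
that does not provide its own hooks.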

-- 
Thanks.
-- Max



