From mboxrd@z Thu Jan  1 00:00:00 1970
From: Max Filippov
Subject: Re: TLB and PTE coherency during munmap
Date: Wed, 29 May 2013 08:15:28 +0400
Message-ID: <51A580E0.10300@gmail.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Return-path:
Received: from mail-lb0-f169.google.com ([209.85.217.169]:39550 "EHLO mail-lb0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751979Ab3E2EPh (ORCPT ); Wed, 29 May 2013 00:15:37 -0400
Received: by mail-lb0-f169.google.com with SMTP id 10so8583689lbf.0 for ; Tue, 28 May 2013 21:15:35 -0700 (PDT)
In-Reply-To:
Sender: linux-arch-owner@vger.kernel.org
List-ID:
To: Catalin Marinas
Cc: "linux-arch@vger.kernel.org", linux-mm@kvack.org, linux-xtensa@linux-xtensa.org, Chris Zankel, Marc Gauthier

Hi Catalin,

On Tue, May 28, 2013 at 6:35 PM, Catalin Marinas wrote:
> Max,
>
> On 26 May 2013 03:42, Max Filippov wrote:
>> Hello arch and mm people.
>>
>> Is it intentional that threads of a process that invoked the munmap
>> syscall can see TLB entries pointing to already freed pages, or is it
>> a bug?
>
> If it happens, this would be a bug. It means that a process can access
> a physical page that has been allocated to something else, possibly
> kernel data.
>
>> I'm talking about zap_pmd_range and zap_pte_range:
>>
>>   zap_pmd_range
>>     zap_pte_range
>>       arch_enter_lazy_mmu_mode
>>       ptep_get_and_clear_full
>>       tlb_remove_tlb_entry
>>       __tlb_remove_page
>>       arch_leave_lazy_mmu_mode
>>     cond_resched
>>
>> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>> and __tlb_remove_page there is a loop in zap_pte_range that clears PTEs
>> and frees the corresponding pages but doesn't flush the TLB, and a
>> surrounding loop in zap_pmd_range that calls cond_resched. If a thread
>> of the same process gets scheduled in, it is able to see TLB entries
>> pointing to already freed physical pages.
>
> It looks to me like cond_resched() here introduces a possible bug, but
> it depends on the actual arch code, especially the
> __tlb_remove_tlb_entry() function. On ARM we record the range in
> tlb_remove_tlb_entry() and queue the pages to be removed in
> __tlb_remove_page(). It pretty much acts like tlb_fast_mode() == 0
> even for the UP case (which is also needed for hardware speculative
> TLB loads). The tlb_finish_mmu() takes care of whatever pages are left
> to be freed.
>
> With a dummy __tlb_remove_tlb_entry() and tlb_fast_mode() == 1,
> cond_resched() in zap_pmd_range() would cause problems.

So it looks like most architectures in the UP configuration should have
this issue (unless they flush the TLB in switch_mm, even when switching
to the same mm):

            tlb_remove_tlb_entry /
            __tlb_remove_tlb_entry    __tlb_remove_page   __HAVE_ARCH_ENTER_LAZY_MMU_MODE
            non-default/non-trivial   non-default         defined

alpha
arc
arm         yes                       yes
arm64       yes                       yes
avr32
blackfin
c6x
cris
frv
h8300
hexagon
ia64        yes                       yes                 yes (Kconfig)
m32r
m68k
metag
microblaze
mips
mn10300
openrisc
parisc
powerpc     yes                                           yes
s390        yes                       yes (a)
score
sh          yes                       yes (a)
sparc                                                     yes
tile
um          yes                       yes                 yes
unicore32
x86                                                       yes
xtensa

(a) __tlb_remove_page == free_page_and_swap_cache

> I think possible workarounds are:
>
> 1. tlb_fast_mode() always returning 0.
> 2. add a tlb_flush_mmu(tlb) before cond_resched() in zap_pmd_range().
> 3. implement __tlb_remove_tlb_entry() on xtensa to always flush the
>    TLB (which is probably costly).
> 4. drop the cond_resched() (not sure about preemptible kernels though).
>
> I would vote for 1 but let's see what the mm people say.

--
Thanks.
-- Max
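To make workaround 1 concrete, below is a rough sketch of the decision that
tlb_fast_mode() controls on the generic mmu_gather path. It shows only the
shape of the code, not the actual mm/memory.c implementation, and
queue_page_for_tlb_flush() is a hypothetical stand-in for the real
page-batching logic:

    /* Hypothetical helper: queue @page to be freed after the next TLB flush. */
    static int queue_page_for_tlb_flush(struct mmu_gather *tlb, struct page *page);

    /*
     * Rough sketch only, not the real generic mmu_gather code.  In "fast
     * mode" the page is freed immediately, before the TLB entry mapping
     * it has been flushed; with fast mode off the page is merely queued
     * and freed in tlb_flush_mmu(), after the flush.
     */
    static int sketch_tlb_remove_page(struct mmu_gather *tlb, struct page *page)
    {
    	if (tlb_fast_mode(tlb)) {
    		/* Freed right away; a stale TLB entry may still map it. */
    		free_page_and_swap_cache(page);
    		return 1;
    	}

    	/* Queued; freed only after the TLB flush in tlb_flush_mmu(). */
    	return queue_page_for_tlb_flush(tlb, page);
    }

Forcing tlb_fast_mode() to return 0 therefore guarantees that a page outlives
every TLB entry that maps it, which is why option 1 closes the window opened
by cond_resched().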
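Similarly, a rough sketch of what workaround 2 would look like in the
zap_pmd_range() loop; the real function in mm/memory.c also handles
transparent huge pages and threads a zap_details argument through, both
glossed over here:

    /*
     * Rough sketch of workaround 2, not the real mm/memory.c code: flush
     * the pending TLB batch before cond_resched(), so that a sibling
     * thread scheduled in cannot use a stale TLB entry for a page that
     * has already been queued for freeing.
     */
    static unsigned long sketch_zap_pmd_range(struct mmu_gather *tlb,
    					  struct vm_area_struct *vma,
    					  pud_t *pud,
    					  unsigned long addr,
    					  unsigned long end)
    {
    	pmd_t *pmd = pmd_offset(pud, addr);
    	unsigned long next;

    	do {
    		next = pmd_addr_end(addr, end);
    		if (pmd_none_or_clear_bad(pmd))
    			continue;

    		/*
    		 * Clears PTEs and disposes of the pages, possibly without
    		 * flushing the TLB (see the discussion above).
    		 */
    		next = zap_pte_range(tlb, vma, pmd, addr, next, NULL);

    		/*
    		 * Workaround 2: make the page table and the TLB coherent
    		 * before this task can be preempted.
    		 */
    		tlb_flush_mmu(tlb);

    		cond_resched();
    	} while (pmd++, addr = next, addr != end);

    	return addr;
    }

The cost is one extra TLB flush per PMD-sized chunk, which may be why the
quoted reply votes for option 1 instead.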