* TLB and PTE coherency during munmap
@ 2013-05-26  2:42 Max Filippov
  2013-05-26  2:50   ` Max Filippov
  2013-05-28 14:35 ` Catalin Marinas
  0 siblings, 2 replies; 70+ messages in thread
From: Max Filippov @ 2013-05-26  2:42 UTC (permalink / raw)
  To: linux-arch, linux-mm, linux-xtensa; +Cc: Chris Zankel, Marc Gauthier

Hello arch and mm people.

Is it intentional that threads of a process that invoked the munmap syscall
can see TLB entries pointing to already freed pages, or is it a bug?

I'm talking about zap_pmd_range and zap_pte_range:

      zap_pmd_range
        zap_pte_range
          arch_enter_lazy_mmu_mode
            ptep_get_and_clear_full
            tlb_remove_tlb_entry
            __tlb_remove_page
          arch_leave_lazy_mmu_mode
        cond_resched

With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
and __tlb_remove_page there is a loop in zap_pte_range that clears
PTEs and frees the corresponding pages but doesn't flush the TLB, and a
surrounding loop in zap_pmd_range that calls cond_resched. If a thread
of the same process gets scheduled then it is able to see TLB entries
pointing to already freed physical pages.

I noticed this on the xtensa arch when I added a check, run before returning
to userspace, that the TLB contents agree with the page tables of the
current mm. This check reliably fires with the LTP test mtest05, which
maps, unmaps and accesses memory from multiple threads.

Is there anything wrong in my description, maybe something specific to
my arch, or does this issue really exist?

I've also noticed that a lot of arches use the default implementations
of the involved functions; does that mean that any/all of them have this
issue too?

-- 
Thanks.
-- Max

* Re: TLB and PTE coherency during munmap
  2013-05-26  2:50   ` Max Filippov
@ 2013-05-28  7:10     ` Max Filippov
  -1 siblings, 0 replies; 70+ messages in thread
From: Max Filippov @ 2013-05-28  7:10 UTC (permalink / raw)
  To: Peter Zijlstra, KAMEZAWA Hiroyuki, linux-arch, linux-mm
  Cc: Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa, Hugh Dickins

On Sun, May 26, 2013 at 6:50 AM, Max Filippov <jcmvbkbc@gmail.com> wrote:
> Hello arch and mm people.
>
> Is it intentional that threads of a process that invoked munmap syscall
> can see TLB entries pointing to already freed pages, or it is a bug?
>
> I'm talking about zap_pmd_range and zap_pte_range:
>
>       zap_pmd_range
>         zap_pte_range
>           arch_enter_lazy_mmu_mode
>             ptep_get_and_clear_full
>             tlb_remove_tlb_entry
>             __tlb_remove_page
>           arch_leave_lazy_mmu_mode
>         cond_resched
>
> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
> and __tlb_remove_page there is a loop in the zap_pte_range that clears
> PTEs and frees corresponding pages, but doesn't flush TLB, and
> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
> of the same process gets scheduled then it is able to see TLB entries
> pointing to already freed physical pages.
>
> I've noticed that with xtensa arch when I added a test before returning to
> userspace checking that TLB contents agrees with page tables of the
> current mm. This check reliably fires with the LTP test mtest05 that
> maps, unmaps and accesses memory from multiple threads.
>
> Is there anything wrong in my description, maybe something specific to
> my arch, or this issue really exists?

Hi,

I've made a similar checking function for MIPS (because qemu is my only choice
and it simulates the MIPS TLB) and ran my tests on the mips-malta machine in qemu.
With MIPS I can also see this issue. I hope I did it right; the patch at the
bottom is included for reference. The test I ran and the diagnostic output are as
follows:

# ./runltp -p -q -T 100 -s mtest05
...
mmstress    0  TINFO  :  test2: Test case tests the race condition between simultaneous write faults in the same address space.
[  439.010000] 14: 70d68000: 03178000/00000000
mmstress    2  TPASS  :  TEST 2 Passed
...
mmstress    0  TINFO  :  test2: Test case tests the race condition between simultaneous write faults in the same address space.
[  947.390000] 10: 6f9d2000: 03639000/00000000
[  947.390000] 10: 6f9d3000: 03638000/00000000
mmstress    2  TPASS  :  TEST 2 Passed
...
mmstress    0  TINFO  :  test1: Test case tests the race condition between simultaneous read faults in the same address space.
[ 1922.680000] 10: 68e12000: 03b59000/00000000
[ 1922.680000] 10: 68e13000: 03b58000/00000000
mmstress    1  TPASS  :  TEST 1 Passed
...

To me it looks like the cond_resched in zap_pmd_range is the root cause
of this issue (leaving the SMP case aside for now). It was introduced in the following commit:

commit 97a894136f29802da19a15541de3c019e1ca147e
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Tue May 24 17:12:04 2011 -0700

    mm: Remove i_mmap_lock lockbreak

Peter, Kamezawa, other reviewers of that commit, could you please comment?


------8<------
From 29324d01c0c95b11fbfe509d9b0eae42cab6d380 Mon Sep 17 00:00:00 2001
From: Max Filippov <jcmvbkbc@gmail.com>
Date: Tue, 28 May 2013 08:06:03 +0400
Subject: [PATCH] mips: check TLB sanity on return to userspace

Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
---
 arch/mips/kernel/entry.S |    1 +
 arch/mips/lib/dump_tlb.c |   81 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 82 insertions(+), 0 deletions(-)

diff --git a/arch/mips/kernel/entry.S b/arch/mips/kernel/entry.S
index e578685..82881ac 100644
--- a/arch/mips/kernel/entry.S
+++ b/arch/mips/kernel/entry.S
@@ -49,6 +49,7 @@ resume_userspace:
 	local_irq_disable		# make sure we dont miss an
 					# interrupt setting need_resched
 					# between sampling and return
+	jal	check_tlb_all
 	LONG_L	a2, TI_FLAGS($28)	# current->work
 	andi	t0, a2, _TIF_WORK_MASK	# (ignoring syscall_trace)
 	bnez	t0, work_pending
diff --git a/arch/mips/lib/dump_tlb.c b/arch/mips/lib/dump_tlb.c
index 32b9f21..629e38b 100644
--- a/arch/mips/lib/dump_tlb.c
+++ b/arch/mips/lib/dump_tlb.c
@@ -6,6 +6,8 @@
  */
 #include <linux/kernel.h>
 #include <linux/mm.h>
+#include <linux/sched.h>
+#include <linux/mm_types.h>

 #include <asm/mipsregs.h>
 #include <asm/page.h>
@@ -111,3 +113,82 @@ void dump_tlb_all(void)
 {
 	dump_tlb(0, current_cpu_data.tlbsize - 1);
 }
+
+static inline pte_t *get_pte_for_vaddr(unsigned vaddr)
+{
+	struct task_struct *task = get_current();
+	struct mm_struct *mm = task->mm;
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+
+	if (!mm)
+		mm = task->active_mm;
+	pgd = pgd_offset(mm, vaddr);
+	if (pgd_none_or_clear_bad(pgd))
+		return NULL;
+	pud = pud_offset(pgd, vaddr);
+	if (pud_none_or_clear_bad(pud))
+		return NULL;
+	pmd = pmd_offset(pud, vaddr);
+	if (pmd_none_or_clear_bad(pmd))
+		return NULL;
+	pte = pte_offset_map(pmd, vaddr);
+	return pte;
+}
+
+static inline void check_tlb_entry(unsigned i, unsigned long va,
+		unsigned long long entrylo)
+{
+	if (entrylo & 2) {
+		pte_t *pte = get_pte_for_vaddr(va);
+		unsigned long pte_pa = pte ? (pte_pfn(*pte) << PAGE_SHIFT) : 0;
+		unsigned long tlb_pa = (entrylo << 6) & PAGE_MASK;
+		if (pte_pa != tlb_pa)
+			pr_err("%d: %08lx: %08lx/%08lx\n",
+					i, va, tlb_pa, pte_pa);
+	}
+}
+static void check_tlb(unsigned last)
+{
+	unsigned long s_entryhi, asid;
+	unsigned int s_index, s_pagemask, i;
+	struct {
+		unsigned long entryhi;
+		unsigned long entrylo0, entrylo1;
+	} c[64];
+
+	BUG_ON(last > ARRAY_SIZE(c));
+	s_pagemask = read_c0_pagemask();
+	s_entryhi = read_c0_entryhi();
+	s_index = read_c0_index();
+	asid = s_entryhi & 0xff;
+
+	for (i = 0; i < last; ++i) {
+		write_c0_index(i);
+		BARRIER();
+		tlb_read();
+		BARRIER();
+		c[i].entryhi = read_c0_entryhi();
+		c[i].entrylo0 = read_c0_entrylo0();
+		c[i].entrylo1 = read_c0_entrylo1();
+	}
+	for (i = 0; i < last; ++i) {
+		/* Unused entries have a virtual address of CKSEG0.  */
+		if ((c[i].entryhi & ~0x1ffffUL) != CKSEG0
+		    && (c[i].entryhi & 0xff) == asid) {
+			unsigned long va = c[i].entryhi & ~0x1fffUL;
+			check_tlb_entry(i, va, c[i].entrylo0);
+			check_tlb_entry(i, va + PAGE_SIZE, c[i].entrylo1);
+		}
+	}
+	write_c0_entryhi(s_entryhi);
+	write_c0_index(s_index);
+	write_c0_pagemask(s_pagemask);
+}
+
+void check_tlb_all(void)
+{
+	check_tlb(current_cpu_data.tlbsize);
+}
-- 
1.7.7.6


-- 
Thanks.
-- Max

* Re: TLB and PTE coherency during munmap
@ 2013-05-28 14:34     ` Konrad Rzeszutek Wilk
  0 siblings, 0 replies; 70+ messages in thread
From: Konrad Rzeszutek Wilk @ 2013-05-28 14:34 UTC (permalink / raw)
  To: Max Filippov
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

On Sun, May 26, 2013 at 06:50:46AM +0400, Max Filippov wrote:
> Hello arch and mm people.
> 
> Is it intentional that threads of a process that invoked munmap syscall
> can see TLB entries pointing to already freed pages, or it is a bug?
> 
> I'm talking about zap_pmd_range and zap_pte_range:
> 
>       zap_pmd_range
>         zap_pte_range
>           arch_enter_lazy_mmu_mode
>             ptep_get_and_clear_full
>             tlb_remove_tlb_entry
>             __tlb_remove_page
>           arch_leave_lazy_mmu_mode
>         cond_resched
> 
> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
> and __tlb_remove_page there is a loop in the zap_pte_range that clears
> PTEs and frees corresponding pages, but doesn't flush TLB, and
> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
> of the same process gets scheduled then it is able to see TLB entries
> pointing to already freed physical pages.

The idea behind the lazy MMU subsystem is that it does not need to flush
the TLB all the time and allows one to do PTE manipulations in a "batch mode".
Meaning there are stray entries - and one has to be diligent about not using them.

Here is the relevant comment from the Linux header:
                                                                            
/*                                                                              
 * A facility to provide lazy MMU batching.  This allows PTE updates and        
 * page invalidations to be delayed until a call to leave lazy MMU mode         
 * is issued.  Some architectures may benefit from doing this, and it is        
 * beneficial for both shadow and direct mode hypervisors, which may batch      
 * the PTE updates which happen during this window.  Note that using this       
 * interface requires that read hazards be removed from the code.  A read       
 * hazard could result in the direct mode hypervisor case, since the actual     
 * write to the page tables may not yet have taken place, so reads though       
 * a raw PTE pointer after it has been modified are not guaranteed to be        
 * up to date.  This mode can only be entered and left under the protection of  
 * the page table locks for all page tables which may be modified.  In the UP   
 * case, this is required so that preemption is disabled, and in the SMP case,  
 * it must synchronize the delayed page table writes properly on other CPUs.    
 */                                      

This means that eventually when arch_leave_lazy_mmu_mode or
arch_flush_lazy_mmu_mode is called, the PTE updates _should_ be flushed
(aka, TLB flush if needed on the altered PTE entries).
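
As a purely illustrative sketch (not any real architecture's code), an arch
that defines these hooks might batch invalidations roughly like this, with
lazy_mmu_pending set by the arch's own PTE update paths while batching:

/* Purely illustrative sketch, not any real architecture's code: batch
 * deferred invalidations while lazy mode is active and push them out
 * when leaving lazy mode. */
static DEFINE_PER_CPU(int, lazy_mmu_active);
static DEFINE_PER_CPU(int, lazy_mmu_pending);

#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE

static inline void arch_enter_lazy_mmu_mode(void)
{
	__this_cpu_write(lazy_mmu_active, 1);		/* start batching */
}

static inline void arch_leave_lazy_mmu_mode(void)
{
	__this_cpu_write(lazy_mmu_active, 0);
	if (__this_cpu_read(lazy_mmu_pending)) {	/* anything deferred? */
		flush_tlb_mm(current->mm);		/* make it visible now */
		__this_cpu_write(lazy_mmu_pending, 0);
	}
}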

> 
> I've noticed that with xtensa arch when I added a test before returning to
> userspace checking that TLB contents agrees with page tables of the
> current mm. This check reliably fires with the LTP test mtest05 that
> maps, unmaps and accesses memory from multiple threads.
> 
> Is there anything wrong in my description, maybe something specific to
> my arch, or this issue really exists?
> 
> I've also noticed that there are a lot of arches with default implementations
> of the involved functions, does that mean that any/all of them have this
> issue too?
> 
> -- 
> Thanks.
> -- Max

* Re: TLB and PTE coherency during munmap
  2013-05-26  2:42 TLB and PTE coherency during munmap Max Filippov
  2013-05-26  2:50   ` Max Filippov
@ 2013-05-28 14:35 ` Catalin Marinas
  2013-05-29  4:15     ` Max Filippov
                     ` (2 more replies)
  1 sibling, 3 replies; 70+ messages in thread
From: Catalin Marinas @ 2013-05-28 14:35 UTC (permalink / raw)
  To: Max Filippov
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

Max,

On 26 May 2013 03:42, Max Filippov <jcmvbkbc@gmail.com> wrote:
> Hello arch and mm people.
>
> Is it intentional that threads of a process that invoked munmap syscall
> can see TLB entries pointing to already freed pages, or it is a bug?

If it happens, this would be a bug. It means that a process can access
a physical page that has been allocated to something else, possibly
kernel data.

> I'm talking about zap_pmd_range and zap_pte_range:
>
>       zap_pmd_range
>         zap_pte_range
>           arch_enter_lazy_mmu_mode
>             ptep_get_and_clear_full
>             tlb_remove_tlb_entry
>             __tlb_remove_page
>           arch_leave_lazy_mmu_mode
>         cond_resched
>
> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
> and __tlb_remove_page there is a loop in the zap_pte_range that clears
> PTEs and frees corresponding pages, but doesn't flush TLB, and
> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
> of the same process gets scheduled then it is able to see TLB entries
> pointing to already freed physical pages.

It looks to me like cond_resched() here introduces a possible bug but
it depends on the actual arch code, especially the
__tlb_remove_tlb_entry() function. On ARM we record the range in
tlb_remove_tlb_entry() and queue the pages to be removed in
__tlb_remove_page(). It pretty much acts like tlb_fast_mode() == 0
even for the UP case (which is also needed for hardware speculative
TLB loads). The tlb_finish_mmu() takes care of whatever pages are left
to be freed.

With a dummy __tlb_remove_tlb_entry() and tlb_fast_mode() == 1,
cond_resched() in zap_pmd_range() would cause problems.

I think possible workarounds:

1. tlb_fast_mode() always returning 0.
2. add a tlb_flush_mmu(tlb) before cond_resched() in zap_pmd_range().
3. implement __tlb_remove_tlb_entry() on xtensa to always flush the
tlb (which is probably costly).
4. drop the cond_resched() (not sure about preemptible kernels though).

I would vote for 1 but let's see what the mm people say.
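
For illustration only, option 2 would amount to something like the following
inside zap_pmd_range()'s loop (an untested sketch of the idea, not a patch;
option 1 would simply make tlb_fast_mode() return 0 unconditionally):

		/* ... after zap_pte_range() has processed this pmd ... */
		tlb_flush_mmu(tlb);	/* invalidate the TLB and free the
					 * gathered pages before we may
					 * give up the CPU below */
		cond_resched();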

--
Catalin

* Re: TLB and PTE coherency during munmap
@ 2013-05-28 15:16     ` Michal Hocko
  0 siblings, 0 replies; 70+ messages in thread
From: Michal Hocko @ 2013-05-28 15:16 UTC (permalink / raw)
  To: Max Filippov
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

On Sun 26-05-13 06:50:46, Max Filippov wrote:
> Hello arch and mm people.
> 
> Is it intentional that threads of a process that invoked munmap syscall
> can see TLB entries pointing to already freed pages, or it is a bug?
> 
> I'm talking about zap_pmd_range and zap_pte_range:
> 
>       zap_pmd_range
>         zap_pte_range
>           arch_enter_lazy_mmu_mode
>             ptep_get_and_clear_full
>             tlb_remove_tlb_entry
>             __tlb_remove_page
>           arch_leave_lazy_mmu_mode
>         cond_resched
> 
> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
> and __tlb_remove_page there is a loop in the zap_pte_range that clears
> PTEs and frees corresponding pages,

The page is not freed at that time (at least not for the generic
mmu_gather implementation). It is stored into mmu_gather and then freed
along with the tlb flush in tlb_flush_mmu.
[...]
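
For readers not familiar with the generic mmu_gather, the batching path
described above looks roughly like this (a heavily simplified sketch with
made-up names, not the actual asm-generic/tlb.h code):

/* Heavily simplified sketch with made-up names, not the actual
 * asm-generic/tlb.h code: pages are queued, and only freed after the
 * TLB has been invalidated. */
struct mmu_gather_sketch {
	struct mm_struct *mm;
	unsigned int	  nr;			/* pages queued so far */
	struct page	 *pages[512];		/* batch to free later */
};

static inline int sketch_remove_page(struct mmu_gather_sketch *tlb,
				     struct page *page)
{
	tlb->pages[tlb->nr++] = page;		/* queued, NOT freed yet */
	return tlb->nr == ARRAY_SIZE(tlb->pages);
}

static void sketch_flush_mmu(struct mmu_gather_sketch *tlb)
{
	flush_tlb_mm(tlb->mm);			/* kill stale TLB entries */
	free_pages_and_swap_cache(tlb->pages, tlb->nr);	/* now safe to free */
	tlb->nr = 0;
}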
-- 
Michal Hocko
SUSE Labs

* Re: TLB and PTE coherency during munmap
  2013-05-28 15:16     ` Michal Hocko
@ 2013-05-28 15:23       ` Catalin Marinas
  -1 siblings, 0 replies; 70+ messages in thread
From: Catalin Marinas @ 2013-05-28 15:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Max Filippov, linux-arch, linux-mm, linux-xtensa, Chris Zankel,
	Marc Gauthier

On 28 May 2013 16:16, Michal Hocko <mhocko@suse.cz> wrote:
> On Sun 26-05-13 06:50:46, Max Filippov wrote:
>> Is it intentional that threads of a process that invoked munmap syscall
>> can see TLB entries pointing to already freed pages, or it is a bug?
>>
>> I'm talking about zap_pmd_range and zap_pte_range:
>>
>>       zap_pmd_range
>>         zap_pte_range
>>           arch_enter_lazy_mmu_mode
>>             ptep_get_and_clear_full
>>             tlb_remove_tlb_entry
>>             __tlb_remove_page
>>           arch_leave_lazy_mmu_mode
>>         cond_resched
>>
>> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>> and __tlb_remove_page there is a loop in the zap_pte_range that clears
>> PTEs and frees corresponding pages,
>
> The page is not freed at that time (at least not for the generic
> mmu_gather implementation). It is stored into mmu_gather and then freed
> along with the tlb flush in tlb_flush_mmu.

Actually for the UP case, the page gets freed in __tlb_remove_page()
since tlb_fast_mode() is 1.
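
Roughly (a sketch of the distinction, not the verbatim generic code):

/* Sketch of the distinction above, not the verbatim generic code: in
 * fast mode the page goes straight back to the allocator with no TLB
 * flush in between - exactly the window being discussed. */
static inline int sketch_tlb_remove_page(struct mmu_gather *tlb,
					 struct page *page)
{
	if (tlb_fast_mode(tlb)) {
		free_page_and_swap_cache(page);	/* freed immediately */
		return 1;			/* nothing left to batch */
	}
	return sketch_queue_for_flush(tlb, page); /* hypothetical helper:
						   * freed later, after the
						   * flush in tlb_flush_mmu() */
}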

--
Catalin

* Re: TLB and PTE coherency during munmap
  2013-05-28 14:34     ` Konrad Rzeszutek Wilk
@ 2013-05-29  3:23       ` Max Filippov
  -1 siblings, 0 replies; 70+ messages in thread
From: Max Filippov @ 2013-05-29  3:23 UTC (permalink / raw)
  To: Konrad Rzeszutek Wilk
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

On Tue, May 28, 2013 at 6:34 PM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
> On Sun, May 26, 2013 at 06:50:46AM +0400, Max Filippov wrote:
>> Hello arch and mm people.
>>
>> Is it intentional that threads of a process that invoked munmap syscall
>> can see TLB entries pointing to already freed pages, or it is a bug?
>>
>> I'm talking about zap_pmd_range and zap_pte_range:
>>
>>       zap_pmd_range
>>         zap_pte_range
>>           arch_enter_lazy_mmu_mode
>>             ptep_get_and_clear_full
>>             tlb_remove_tlb_entry
>>             __tlb_remove_page
>>           arch_leave_lazy_mmu_mode
>>         cond_resched
>>
>> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>> and __tlb_remove_page there is a loop in the zap_pte_range that clears
>> PTEs and frees corresponding pages, but doesn't flush TLB, and
>> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
>> of the same process gets scheduled then it is able to see TLB entries
>> pointing to already freed physical pages.
>
> The idea behind the lazy MMU subsystem is that it does not need to flush
> the TLB all the time and allow one to do PTE manipulations in a "batch mode".
> Meaning there are stray entries - and one has to be diligient about not using them.

Yes, I got it, IOW TLB entries must either be flushed before userspace can
see them, or the underlying pages must not be freed.

> Here is the relvant comment from the Linux header:
>
> /*
>  * A facility to provide lazy MMU batching.  This allows PTE updates and
>  * page invalidations to be delayed until a call to leave lazy MMU mode
>  * is issued.  Some architectures may benefit from doing this, and it is
>  * beneficial for both shadow and direct mode hypervisors, which may batch
>  * the PTE updates which happen during this window.  Note that using this
>  * interface requires that read hazards be removed from the code.  A read
>  * hazard could result in the direct mode hypervisor case, since the actual
>  * write to the page tables may not yet have taken place, so reads though
>  * a raw PTE pointer after it has been modified are not guaranteed to be
>  * up to date.  This mode can only be entered and left under the protection of
>  * the page table locks for all page tables which may be modified.  In the UP
>  * case, this is required so that preemption is disabled, and in the SMP case,
>  * it must synchronize the delayed page table writes properly on other CPUs.
>  */
>
> This means that eventually when arch_leave_lazy_mmu_mode or
> arch_flush_lazy_mmu_mode is called, the PTE updates _should_ be flushed
> (aka, TLB flush if needed on the altered PTE entries).

Should (: But I only see powerpc, sparc and x86 defining
__HAVE_ARCH_ENTER_LAZY_MMU_MODE, so this does not apply to all
remaining arches.
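
For reference, the generic fallbacks are plain no-ops, roughly (from memory,
include/asm-generic/pgtable.h):

#ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
#define arch_enter_lazy_mmu_mode()	do {} while (0)
#define arch_leave_lazy_mmu_mode()	do {} while (0)
#define arch_flush_lazy_mmu_mode()	do {} while (0)
#endif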

-- 
Thanks.
-- Max

* Re: TLB and PTE coherency during munmap
  2013-05-28 14:35 ` Catalin Marinas
@ 2013-05-29  4:15     ` Max Filippov
  2013-05-29 11:53   ` Vineet Gupta
  2013-05-29 12:00     ` Vineet Gupta
  2 siblings, 0 replies; 70+ messages in thread
From: Max Filippov @ 2013-05-29  4:15 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

Hi Catalin,

On Tue, May 28, 2013 at 6:35 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> Max,
>
> On 26 May 2013 03:42, Max Filippov <jcmvbkbc@gmail.com> wrote:
>> Hello arch and mm people.
>>
>> Is it intentional that threads of a process that invoked munmap syscall
>> can see TLB entries pointing to already freed pages, or it is a bug?
>
> If it happens, this would be a bug. It means that a process can access
> a physical page that has been allocated to something else, possibly
> kernel data.
>
>> I'm talking about zap_pmd_range and zap_pte_range:
>>
>>       zap_pmd_range
>>         zap_pte_range
>>           arch_enter_lazy_mmu_mode
>>             ptep_get_and_clear_full
>>             tlb_remove_tlb_entry
>>             __tlb_remove_page
>>           arch_leave_lazy_mmu_mode
>>         cond_resched
>>
>> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>> and __tlb_remove_page there is a loop in the zap_pte_range that clears
>> PTEs and frees corresponding pages, but doesn't flush TLB, and
>> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
>> of the same process gets scheduled then it is able to see TLB entries
>> pointing to already freed physical pages.
>
> It looks to me like cond_resched() here introduces a possible bug but
> it depends on the actual arch code, especially the
> __tlb_remove_tlb_entry() function. On ARM we record the range in
> tlb_remove_tlb_entry() and queue the pages to be removed in
> __tlb_remove_page(). It pretty much acts like tlb_fast_mode() == 0
> even for the UP case (which is also needed for hardware speculative
> TLB loads). The tlb_finish_mmu() takes care of whatever pages are left
> to be freed.
>
> With a dummy __tlb_remove_tlb_entry() and tlb_fast_mode() == 1,
> cond_resched() in zap_pmd_range() would cause problems.

So it looks like most architectures in the UP configuration should have
this issue (unless they flush the TLB in switch_mm, even when switching
to the same mm):

The columns mean: (1) tlb_remove_tlb_entry is non-default, (2) __tlb_remove_tlb_entry
is non-trivial, (3) __tlb_remove_page is non-default, (4) __HAVE_ARCH_ENTER_LAZY_MMU_MODE
is defined.

            (1)   (2)   (3)      (4)
alpha
arc
arm         yes         yes
arm64       yes         yes
avr32
blackfin
c6x
cris
frv
h8300
hexagon
ia64        yes   yes   yes
m32r
m68k
metag
microblaze
mips
mn10300
openrisc
parisc
powerpc           yes            yes
s390        yes         yes (a)
score
sh          yes         yes (a)
sparc                            yes
tile
um          yes   yes   yes
unicore32
x86                              yes
xtensa

(a) __tlb_remove_page == free_page_and_swap_cache

> I think possible workarounds:
>
> 1. tlb_fast_mode() always returning 0.
> 2. add a tlb_flush_mmu(tlb) before cond_resched() in zap_pmd_range().
> 3. implement __tlb_remove_tlb_entry() on xtensa to always flush the
> tlb (which is probably costly).
> 4. drop the cond_resched() (not sure about preemptible kernels though).
>
> I would vote for 1 but let's see what the mm people say.

-- 
Thanks.
-- Max

* Re: TLB and PTE coherency during munmap
@ 2013-05-29 10:15       ` Catalin Marinas
  0 siblings, 0 replies; 70+ messages in thread
From: Catalin Marinas @ 2013-05-29 10:15 UTC (permalink / raw)
  To: Max Filippov
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

On Wed, May 29, 2013 at 05:15:28AM +0100, Max Filippov wrote:
> On Tue, May 28, 2013 at 6:35 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> > On 26 May 2013 03:42, Max Filippov <jcmvbkbc@gmail.com> wrote:
> >> Is it intentional that threads of a process that invoked munmap syscall
> >> can see TLB entries pointing to already freed pages, or it is a bug?
> >
> > If it happens, this would be a bug. It means that a process can access
> > a physical page that has been allocated to something else, possibly
> > kernel data.
> >
> >> I'm talking about zap_pmd_range and zap_pte_range:
> >>
> >>       zap_pmd_range
> >>         zap_pte_range
> >>           arch_enter_lazy_mmu_mode
> >>             ptep_get_and_clear_full
> >>             tlb_remove_tlb_entry
> >>             __tlb_remove_page
> >>           arch_leave_lazy_mmu_mode
> >>         cond_resched
> >>
> >> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
> >> and __tlb_remove_page there is a loop in the zap_pte_range that clears
> >> PTEs and frees corresponding pages, but doesn't flush TLB, and
> >> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
> >> of the same process gets scheduled then it is able to see TLB entries
> >> pointing to already freed physical pages.
> >
> > It looks to me like cond_resched() here introduces a possible bug but
> > it depends on the actual arch code, especially the
> > __tlb_remove_tlb_entry() function. On ARM we record the range in
> > tlb_remove_tlb_entry() and queue the pages to be removed in
> > __tlb_remove_page(). It pretty much acts like tlb_fast_mode() == 0
> > even for the UP case (which is also needed for hardware speculative
> > TLB loads). The tlb_finish_mmu() takes care of whatever pages are left
> > to be freed.
> >
> > With a dummy __tlb_remove_tlb_entry() and tlb_fast_mode() == 1,
> > cond_resched() in zap_pmd_range() would cause problems.
> 
> So, looks like most architectures in the UP configuration should have
> this issue (unless they flush TLB in the switch_mm, even when switching
> to the same mm):

switch_mm() wouldn't be called when switching to the same mm. You could do
it in switch_to() instead (or before returning to user space on the same
processor), but it's not efficient.

Do you happen to have a user-space test for this? Something like: one
thread does an mmap(), writes some poison value, then munmap()s. The other
thread keeps checking the poison value while trapping and ignoring any
SIGSEGV. If things work correctly, the second thread should either get
a SIGSEGV or read the poison value.
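
Not something from this thread, but a minimal sketch of such a test might
look like the following (hypothetical code; assumes a 4 KiB page size,
build with -pthread):

/*
 * Minimal sketch of the test described above.  One thread repeatedly
 * maps a fixed page, fills it with POISON and unmaps it; the other
 * keeps reading it.  On a correct kernel the reader only ever sees
 * POISON, zero-filled fresh pages, or takes SIGSEGV; anything else
 * suggests a read through a stale TLB entry.
 */
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define PAGE   4096
#define POISON 0xa5

static unsigned char *addr;
static sigjmp_buf jb;

static void segv(int sig)
{
	(void)sig;
	siglongjmp(jb, 1);		/* unmapped at read time: fine */
}

static void *mapper(void *arg)
{
	(void)arg;
	for (;;) {
		unsigned char *p = mmap(addr, PAGE, PROT_READ | PROT_WRITE,
					MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS,
					-1, 0);
		if (p == MAP_FAILED)
			exit(1);
		memset(p, POISON, PAGE);
		munmap(p, PAGE);
	}
	return NULL;
}

int main(void)
{
	pthread_t t;

	/* reserve an address once, then keep remapping it with MAP_FIXED */
	addr = mmap(NULL, PAGE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	signal(SIGSEGV, segv);
	pthread_create(&t, NULL, mapper, NULL);

	for (;;) {
		unsigned char v;

		if (sigsetjmp(jb, 1))
			continue;
		v = *(volatile unsigned char *)addr;
		if (v != POISON && v != 0)	/* 0: fresh anonymous page */
			printf("stale data: %02x\n", v);
	}
}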

-- 
Catalin

* Re: TLB and PTE coherency during munmap
  2013-05-28 14:35 ` Catalin Marinas
  2013-05-29  4:15     ` Max Filippov
@ 2013-05-29 11:53   ` Vineet Gupta
  2013-05-29 12:00     ` Vineet Gupta
  2 siblings, 0 replies; 70+ messages in thread
From: Vineet Gupta @ 2013-05-29 11:53 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Max Filippov, linux-arch, linux-mm, linux-xtensa, Chris Zankel,
	Marc Gauthier

On 05/28/2013 08:05 PM, Catalin Marinas wrote:
> Max,
> 
> On 26 May 2013 03:42, Max Filippov <jcmvbkbc@gmail.com> wrote:
>> Hello arch and mm people.
>>
>> Is it intentional that threads of a process that invoked munmap syscall
>> can see TLB entries pointing to already freed pages, or it is a bug?
> 
> If it happens, this would be a bug. It means that a process can access
> a physical page that has been allocated to something else, possibly
> kernel data.
> 
>> I'm talking about zap_pmd_range and zap_pte_range:
>>
>>       zap_pmd_range
>>         zap_pte_range
>>           arch_enter_lazy_mmu_mode
>>             ptep_get_and_clear_full
>>             tlb_remove_tlb_entry
>>             __tlb_remove_page
>>           arch_leave_lazy_mmu_mode
>>         cond_resched
>>
>> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>> and __tlb_remove_page there is a loop in the zap_pte_range that clears
>> PTEs and frees corresponding pages, but doesn't flush TLB, and
>> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
>> of the same process gets scheduled then it is able to see TLB entries
>> pointing to already freed physical pages.
> 
> It looks to me like cond_resched() here introduces a possible bug but
> it depends on the actual arch code, especially the
> __tlb_remove_tlb_entry() function. On ARM we record the range in
> tlb_remove_tlb_entry() and queue the pages to be removed in
> __tlb_remove_page(). It pretty much acts like tlb_fast_mode() == 0
> even for the UP case (which is also needed for hardware speculative
> TLB loads). The tlb_finish_mmu() takes care of whatever pages are left
> to be freed.
> 
> With a dummy __tlb_remove_tlb_entry() and tlb_fast_mode() == 1,
> cond_resched() in zap_pmd_range() would cause problems.
> 
> I think possible workarounds:
> 
> 1. tlb_fast_mode() always returning 0.

This might add needless page-free batching logic, so it's not very attractive.

> 2. add a tlb_flush_mmu(tlb) before cond_resched() in zap_pmd_range().

For !fullmm flushes it might be a no-op on some arches (at least on ARC) as we use
tlb_end_vma() to do the TLB range flush. And flushing the entire TLB would be
excessive anyway.

Actually zap_pte_range() already has logic to flush the TLB range (if batching
runs out of space). Can we re-use that infrastructure to make sure zap_pte_range()
does its share of TLB flushing before returning and going into cond_resched()?

However with that, we need to prevent tlb_end_vma()/tlb_finish_mmu() from
duplicating the range flush - which can be done by clearing tlb->need_flush.

Now, simplistically, this will cause even the fullmm flushes (a simple ASID increment
on ARC/ARM..) to become TLB walks flushing the individual entries, so we can do
this only for !fullmm, assuming that cond_resched() can potentially cause an
exiting task's thread to be scheduled in and reuse the entries.

Let me go off cook a patch to see if this might work.
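
Roughly, the shape of the idea (an untested sketch only, not the eventual patch):

		/* Untested sketch of the idea only: drain the gathered pages
		 * and the TLB before the reschedule point, but not for the
		 * exit (fullmm) teardown where nothing can run on this mm
		 * again. */
		if (!tlb->fullmm && tlb->need_flush)
			tlb_flush_mmu(tlb);	/* flushes and clears need_flush */
		cond_resched();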

-Vineet

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
  2013-05-28  7:10     ` Max Filippov
@ 2013-05-29 12:27       ` Peter Zijlstra
  -1 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-05-29 12:27 UTC (permalink / raw)
  To: Max Filippov
  Cc: KAMEZAWA Hiroyuki, linux-arch, linux-mm, Ralf Baechle,
	Chris Zankel, Marc Gauthier, linux-xtensa, Hugh Dickins

On Tue, May 28, 2013 at 11:10:25AM +0400, Max Filippov wrote:
> On Sun, May 26, 2013 at 6:50 AM, Max Filippov <jcmvbkbc@gmail.com> wrote:
> > Hello arch and mm people.
> >
> > Is it intentional that threads of a process that invoked munmap syscall
> > can see TLB entries pointing to already freed pages, or it is a bug?
> >
> > I'm talking about zap_pmd_range and zap_pte_range:
> >
> >       zap_pmd_range
> >         zap_pte_range
> >           arch_enter_lazy_mmu_mode
> >             ptep_get_and_clear_full
> >             tlb_remove_tlb_entry
> >             __tlb_remove_page
> >           arch_leave_lazy_mmu_mode
> >         cond_resched
> >
> > With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
> > and __tlb_remove_page there is a loop in the zap_pte_range that clears
> > PTEs and frees corresponding pages, but doesn't flush TLB, and
> > surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
> > of the same process gets scheduled then it is able to see TLB entries
> > pointing to already freed physical pages.
> >
> > I've noticed that with xtensa arch when I added a test before returning to
> > userspace checking that TLB contents agrees with page tables of the
> > current mm. This check reliably fires with the LTP test mtest05 that
> > maps, unmaps and accesses memory from multiple threads.
> >
> > Is there anything wrong in my description, maybe something specific to
> > my arch, or this issue really exists?
> 
> Hi,
> 
> I've made similar checking function for MIPS (because qemu is my only choice
> and it simulates MIPS TLB) and ran my tests on mips-malta machine in qemu.
> With MIPS I can also see this issue. I hope I did it right, the patch at the
> bottom is for the reference. The test I run and the diagnostic output are as
> follows:
> 
> To me it looks like the cond_resched in the zap_pmd_range is the root cause
> of this issue (let alone SMP case for now). It was introduced in the commit
> 
> commit 97a894136f29802da19a15541de3c019e1ca147e
> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date:   Tue May 24 17:12:04 2011 -0700
> 
>     mm: Remove i_mmap_lock lockbreak
> 
> Peter, Kamezawa, other reviewers of that commit, could you please comment?

Are you all running UP systems? I suppose the preemptible muck
invalidated the assumption that UP systems are 'easy'.

If you make tlb_fast_mode() return an unconditional false, does it all
work again?
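
I.e. something like this quick hack in include/asm-generic/tlb.h, purely for the
experiment (sketch, not a proposed fix):

	static inline int tlb_fast_mode(struct mmu_gather *tlb)
	{
		/* force page batching unconditionally, for testing only */
		return 0;
	}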

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-29 12:42         ` Vineet Gupta
  0 siblings, 0 replies; 70+ messages in thread
From: Vineet Gupta @ 2013-05-29 12:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Max Filippov, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

On 05/29/2013 05:57 PM, Peter Zijlstra wrote:
> On Tue, May 28, 2013 at 11:10:25AM +0400, Max Filippov wrote:
>> On Sun, May 26, 2013 at 6:50 AM, Max Filippov <jcmvbkbc@gmail.com> wrote:
>>> Hello arch and mm people.
>>>
>>> Is it intentional that threads of a process that invoked munmap syscall
>>> can see TLB entries pointing to already freed pages, or it is a bug?
>>>
>>> I'm talking about zap_pmd_range and zap_pte_range:
>>>
>>>       zap_pmd_range
>>>         zap_pte_range
>>>           arch_enter_lazy_mmu_mode
>>>             ptep_get_and_clear_full
>>>             tlb_remove_tlb_entry
>>>             __tlb_remove_page
>>>           arch_leave_lazy_mmu_mode
>>>         cond_resched
>>>
>>> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>>> and __tlb_remove_page there is a loop in the zap_pte_range that clears
>>> PTEs and frees corresponding pages, but doesn't flush TLB, and
>>> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
>>> of the same process gets scheduled then it is able to see TLB entries
>>> pointing to already freed physical pages.
>>>
>>> I've noticed that with xtensa arch when I added a test before returning to
>>> userspace checking that TLB contents agrees with page tables of the
>>> current mm. This check reliably fires with the LTP test mtest05 that
>>> maps, unmaps and accesses memory from multiple threads.
>>>
>>> Is there anything wrong in my description, maybe something specific to
>>> my arch, or this issue really exists?
>>
>> Hi,
>>
>> I've made similar checking function for MIPS (because qemu is my only choice
>> and it simulates MIPS TLB) and ran my tests on mips-malta machine in qemu.
>> With MIPS I can also see this issue. I hope I did it right, the patch at the
>> bottom is for the reference. The test I run and the diagnostic output are as
>> follows:
>>
>> To me it looks like the cond_resched in the zap_pmd_range is the root cause
>> of this issue (let alone SMP case for now). It was introduced in the commit
>>
>> commit 97a894136f29802da19a15541de3c019e1ca147e
>> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Date:   Tue May 24 17:12:04 2011 -0700
>>
>>     mm: Remove i_mmap_lock lockbreak
>>
>> Peter, Kamezawa, other reviewers of that commit, could you please comment?
> 
> Are you all running UP systems? I suppose the preemptible muck
> invalidated the assumption that UP systems are 'easy'.
> 
> If you make tlb_fast_mode() return an unconditional false, does it all
> work again?
> 

It seems tlb_fast_mode() only affects the page free batching and won't affect the
TLB flushes themselves unless, of course, the batching runs out of space.

FWIW, prior to your commit d16dfc550f5326 "mm: mmu_gather rework" there was a
tlb_finish_mmu() right before the need_resched() handling, which would have taken care
of the current situation. My proposal - please see my earlier email in this thread - is to
reuse the force_flush logic in zap_pte_range() to do this.

-Vineet

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-29 12:47           ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-05-29 12:47 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Max Filippov, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

On Wed, May 29, 2013 at 06:12:15PM +0530, Vineet Gupta wrote:
> It seems tlb_fast_mode() only affects the page free batching and won't affect the
> TLB flush themselves unless ofcourse the batching runs out of space.

It does, it will keep the pages around until after a flush. So you'll no
longer free pages before having done a TLB flush.
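
I.e. with batching, the order in the generic tlb_flush_mmu() is roughly (simplified
sketch of the mm/memory.c code, details omitted):

	tlb->need_flush = 0;
	tlb_flush(tlb);				/* invalidate the TLB first */
	for (batch = &tlb->local; batch; batch = batch->next) {
		/* pages are freed only after the flush above */
		free_pages_and_swap_cache(batch->pages, batch->nr);
		batch->nr = 0;
	}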

> FWIW, prior to your commit d16dfc550f5326 "mm: mmu_gather rework"
> tlb_finish_mmu() right before the need_resced() which would have handled the
> current situation. My proposal - please see my earlier email in thread is to reuse
> the force_flush logic in zap_pte_range() to do this.

I don't have earlier emails as my lkml folder is somewhat broken atm :/

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-29 17:51           ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-05-29 17:51 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Max Filippov, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

What about something like this?

---
 include/asm-generic/tlb.h | 11 ++++++++++-
 mm/memory.c               | 17 ++++++++++++++++-
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b1b1fa6..651b1cf 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -116,6 +116,7 @@ struct mmu_gather {
 
 static inline int tlb_fast_mode(struct mmu_gather *tlb)
 {
+#ifndef CONFIG_PREEMPT
 #ifdef CONFIG_SMP
 	return tlb->fast_mode;
 #else
@@ -124,7 +125,15 @@ static inline int tlb_fast_mode(struct mmu_gather *tlb)
 	 * and page free order so much..
 	 */
 	return 1;
-#endif
+#endif /* CONFIG_SMP */
+#else  /* CONFIG_PREEMPT */
+	/*
+	 * Since mmu_gather is preemptible, preemptible kernels are like SMP
+	 * kernels, we must batch to make sure we invalidate TLBs before we
+	 * free the pages.
+	 */
+	return 0;
+#endif /* CONFIG_PREEMPT */
 }
 
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
diff --git a/mm/memory.c b/mm/memory.c
index 6dc1882..e915af2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -384,6 +384,21 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
 
 #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
 
+static inline void cond_resched_tlb(struct mmu_gather *tlb)
+{
+#ifndef CONFIG_PREEMPT
+	/*
+	 * For full preempt kernels we must do regular batching like
+	 * SMP, see tlb_fast_mode(). For !PREEMPT we can 'cheat' and
+	 * do a flush before our voluntary 'yield'.
+	 */
+	if (need_resched()) {
+		tlb_flush_mmu(tlb);
+		cond_resched();
+	}
+#endif
+}
+
 /*
  * If a p?d_bad entry is found while walking page tables, report
  * the error, before resetting entry to p?d_none.  Usually (but
@@ -1264,7 +1279,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
 			goto next;
 		next = zap_pte_range(tlb, vma, pmd, addr, next, details);
 next:
-		cond_resched();
+		cond_resched_tlb(tlb);
 	} while (pmd++, addr = next, addr != end);
 
 	return addr;


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-29 22:04             ` Catalin Marinas
  0 siblings, 0 replies; 70+ messages in thread
From: Catalin Marinas @ 2013-05-29 22:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vineet Gupta, Max Filippov, KAMEZAWA Hiroyuki, linux-arch,
	linux-mm, Ralf Baechle, Chris Zankel, Marc Gauthier,
	linux-xtensa, Hugh Dickins

On 29 May 2013 18:51, Peter Zijlstra <peterz@infradead.org> wrote:
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -384,6 +384,21 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
>
>  #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
>
> +static inline void cond_resched_tlb(struct mmu_gather *tlb)
> +{
> +#ifndef CONFIG_PREEMPT
> +       /*
> +        * For full preempt kernels we must do regular batching like
> +        * SMP, see tlb_fast_mode(). For !PREEMPT we can 'cheat' and
> +        * do a flush before our voluntary 'yield'.
> +        */
> +       if (need_resched()) {
> +               tlb_flush_mmu(tlb);
> +               cond_resched();
> +       }
> +#endif
> +}

Does it matter that in the CONFIG_PREEMPT case you no longer call
cond_resched()? I guess we can just rely on full kernel preemption
to reschedule as needed.

--
Catalin

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
  2013-05-29 17:51           ` Peter Zijlstra
@ 2013-05-30  5:04             ` Vineet Gupta
  -1 siblings, 0 replies; 70+ messages in thread
From: Vineet Gupta @ 2013-05-30  5:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Max Filippov, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

On 05/29/2013 11:21 PM, Peter Zijlstra wrote:
> What about something like this?
>
> ---
>  include/asm-generic/tlb.h | 11 ++++++++++-
>  mm/memory.c               | 17 ++++++++++++++++-
>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index b1b1fa6..651b1cf 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -116,6 +116,7 @@ struct mmu_gather {
>  
>  static inline int tlb_fast_mode(struct mmu_gather *tlb)
>  {
> +#ifndef CONFIG_PREEMPT
>  #ifdef CONFIG_SMP
>  	return tlb->fast_mode;
>  #else
> @@ -124,7 +125,15 @@ static inline int tlb_fast_mode(struct mmu_gather *tlb)
>  	 * and page free order so much..
>  	 */
>  	return 1;
> -#endif
> +#endif /* CONFIG_SMP */
> +#else  /* CONFIG_PREEMPT */
> +	/*
> +	 * Since mmu_gather is preemptible, preemptible kernels are like SMP
> +	 * kernels, we must batch to make sure we invalidate TLBs before we
> +	 * free the pages.
> +	 */
> +	return 0;
> +#endif /* CONFIG_PREEMPT */
>  }

So this adds the page batching logic to small/simpler UP systems - but it's a
necessary evil :-(

>  void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
> diff --git a/mm/memory.c b/mm/memory.c
> index 6dc1882..e915af2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -384,6 +384,21 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
>  
>  #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
>  
> +static inline void cond_resched_tlb(struct mmu_gather *tlb)
> +{
> +#ifndef CONFIG_PREEMPT
> +	/*
> +	 * For full preempt kernels we must do regular batching like
> +	 * SMP, see tlb_fast_mode(). For !PREEMPT we can 'cheat' and
> +	 * do a flush before our voluntary 'yield'.
> +	 */
> +	if (need_resched()) {

This is really neat: w/o this check, a @fullmm flush (exit/execve) would have
suffered multiple full TLB flushes in the loop; now you do that only if
rescheduling is needed - meaning only in the case where we have the potential race
condition which Max was seeing. Cool!

> +		tlb_flush_mmu(tlb);
> +		cond_resched();
> +	}
> +#endif
> +}
> +
>  /*
>   * If a p?d_bad entry is found while walking page tables, report
>   * the error, before resetting entry to p?d_none.  Usually (but
> @@ -1264,7 +1279,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
>  			goto next;
>  		next = zap_pte_range(tlb, vma, pmd, addr, next, details);
>  next:
> -		cond_resched();
> +		cond_resched_tlb(tlb);
>  	} while (pmd++, addr = next, addr != end);
>  
>  	return addr;

BTW, since we are on the topic, it seems that we are missing a tlb_fast_mode() check in
one spot - unless it is tied to the RCU table-free stuff.

-------------->
From: Vineet Gupta <vgupta@synopsys.com>
Date: Thu, 30 May 2013 10:25:30 +0530
Subject: [PATCH] mm: tlb_fast_mode check missing in tlb_finish_mmu()

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
---
 mm/memory.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index d9d5fd9..569ffe1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -269,6 +269,9 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
     /* keep the page table cache within bounds */
     check_pgt_cache();
 
+    if (tlb_fast_mode(tlb))
+        return;
+
     for (batch = tlb->local.next; batch; batch = next) {
         next = batch->next;
         free_pages((unsigned long)batch, 0);
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-30  6:48               ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-05-30  6:48 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Vineet Gupta, Max Filippov, KAMEZAWA Hiroyuki, linux-arch,
	linux-mm, Ralf Baechle, Chris Zankel, Marc Gauthier,
	linux-xtensa, Hugh Dickins

On Wed, May 29, 2013 at 11:04:35PM +0100, Catalin Marinas wrote:
> On 29 May 2013 18:51, Peter Zijlstra <peterz@infradead.org> wrote:
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -384,6 +384,21 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
> >
> >  #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
> >
> > +static inline void cond_resched_tlb(struct mmu_gather *tlb)
> > +{
> > +#ifndef CONFIG_PREEMPT
> > +       /*
> > +        * For full preempt kernels we must do regular batching like
> > +        * SMP, see tlb_fast_mode(). For !PREEMPT we can 'cheat' and
> > +        * do a flush before our voluntary 'yield'.
> > +        */
> > +       if (need_resched()) {
> > +               tlb_flush_mmu(tlb);
> > +               cond_resched();
> > +       }
> > +#endif
> > +}
> 
> Does it matter that in the CONFIG_PREEMPT case, you no longer call
> cond_resched()? I guess we can just rely on the kernel full preemption
> to reschedule as needed.

Exactly, the preempt_enable() from the spin_unlock() in pte_unmap_unlock()
will most likely trigger a preemption immediately.

And since we do full batching for PREEMPT, doing extra flushes would be
detrimental to performance -- however unlikely it is that we'll still see
need_resched() there.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-30  6:56               ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-05-30  6:56 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Max Filippov, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

On Thu, May 30, 2013 at 10:34:53AM +0530, Vineet Gupta wrote:
> On 05/29/2013 11:21 PM, Peter Zijlstra wrote:
> > What about something like this?
> >
> > ---
> >  include/asm-generic/tlb.h | 11 ++++++++++-
> >  mm/memory.c               | 17 ++++++++++++++++-
> >  2 files changed, 26 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> > index b1b1fa6..651b1cf 100644
> > --- a/include/asm-generic/tlb.h
> > +++ b/include/asm-generic/tlb.h
> > @@ -116,6 +116,7 @@ struct mmu_gather {
> >  
> >  static inline int tlb_fast_mode(struct mmu_gather *tlb)
> >  {
> > +#ifndef CONFIG_PREEMPT
> >  #ifdef CONFIG_SMP
> >  	return tlb->fast_mode;
> >  #else
> > @@ -124,7 +125,15 @@ static inline int tlb_fast_mode(struct mmu_gather *tlb)
> >  	 * and page free order so much..
> >  	 */
> >  	return 1;
> > -#endif
> > +#endif /* CONFIG_SMP */
> > +#else  /* CONFIG_PREEMPT */
> > +	/*
> > +	 * Since mmu_gather is preemptible, preemptible kernels are like SMP
> > +	 * kernels, we must batch to make sure we invalidate TLBs before we
> > +	 * free the pages.
> > +	 */
> > +	return 0;
> > +#endif /* CONFIG_PREEMPT */
> >  }
> 
> So this adds the page batching logic to small/simpler UP systems - but it's
> necessary evil :-(

Well, only if you have CONFIG_PREEMPT=y, at which point UP isn't really
_that_ simple anymore.

> >  void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 6dc1882..e915af2 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -384,6 +384,21 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
> >  
> >  #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
> >  
> > +static inline void cond_resched_tlb(struct mmu_gather *tlb)
> > +{
> > +#ifndef CONFIG_PREEMPT
> > +	/*
> > +	 * For full preempt kernels we must do regular batching like
> > +	 * SMP, see tlb_fast_mode(). For !PREEMPT we can 'cheat' and
> > +	 * do a flush before our voluntary 'yield'.
> > +	 */
> > +	if (need_resched()) {
> 
> This is really neat: w/o this check, a @fullmm flush (exit/execve) would have
> suffered multiple full TLB flushes in the loop, now you do that only if a
> scheduling was needed - meaning only in the case when we have the potential race
> condition which Max was seeing. Cool !

Glad you like it ;-)

> > +		tlb_flush_mmu(tlb);
> > +		cond_resched();
> > +	}
> > +#endif
> > +}
> > +
> >  /*
> >   * If a p?d_bad entry is found while walking page tables, report
> >   * the error, before resetting entry to p?d_none.  Usually (but
> > @@ -1264,7 +1279,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
> >  			goto next;
> >  		next = zap_pte_range(tlb, vma, pmd, addr, next, details);
> >  next:
> > -		cond_resched();
> > +		cond_resched_tlb(tlb);
> >  	} while (pmd++, addr = next, addr != end);
> >  
> >  	return addr;
> 
> BTW, since we are on the topic, it seems that we are missing tlb_fast_mode() in
> one spot - unless it is tied to rcu table free stuff.
> 
> -------------->
> From: Vineet Gupta <vgupta@synopsys.com>
> Date: Thu, 30 May 2013 10:25:30 +0530
> Subject: [PATCH] mm: tlb_fast_mode check missing in tlb_finish_mmu()
> 
> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> ---
>  mm/memory.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index d9d5fd9..569ffe1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -269,6 +269,9 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long
> start, unsigned long e
>      /* keep the page table cache within bounds */
>      check_pgt_cache();
>  
> +    if (tlb_fast_mode(tlb))
> +        return;
> +
>      for (batch = tlb->local.next; batch; batch = next) {
>          next = batch->next;
>          free_pages((unsigned long)batch, 0);

Yes I think that is possible. It would shrink the code a little when
fast_mode was unconditionally 1 -- ie. simple UP ;-).



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-30  7:00                 ` Vineet Gupta
  0 siblings, 0 replies; 70+ messages in thread
From: Vineet Gupta @ 2013-05-30  7:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Max Filippov, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

On 05/30/2013 12:26 PM, Peter Zijlstra wrote:
> On Thu, May 30, 2013 at 10:34:53AM +0530, Vineet Gupta wrote:
>> On 05/29/2013 11:21 PM, Peter Zijlstra wrote:
>>
>> BTW, since we are on the topic, it seems that we are missing tlb_fast_mode() in
>> one spot - unless it is tied to rcu table free stuff.
>>
>> -------------->
>> From: Vineet Gupta <vgupta@synopsys.com>
>> Date: Thu, 30 May 2013 10:25:30 +0530
>> Subject: [PATCH] mm: tlb_fast_mode check missing in tlb_finish_mmu()
>>
>> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
>> ---
>>  mm/memory.c |    3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index d9d5fd9..569ffe1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -269,6 +269,9 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long
>> start, unsigned long e
>>      /* keep the page table cache within bounds */
>>      check_pgt_cache();
>>  
>> +    if (tlb_fast_mode(tlb))
>> +        return;
>> +
>>      for (batch = tlb->local.next; batch; batch = next) {
>>          next = batch->next;
>>          free_pages((unsigned long)batch, 0);
> Yes I think that is possible. It would shrink the code a little when
> fast_mode was unconditionally 1 -- ie. simple UP ;-).

Exactly ! Can you please revert with a Reviewed-by/Acked-by so I can formally send
it over to linux-mm list.

Thx,
-Vineet

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-30 11:03                   ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-05-30 11:03 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Max Filippov, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

On Thu, May 30, 2013 at 12:30:51PM +0530, Vineet Gupta wrote:
> On 05/30/2013 12:26 PM, Peter Zijlstra wrote:
> > On Thu, May 30, 2013 at 10:34:53AM +0530, Vineet Gupta wrote:
> >> On 05/29/2013 11:21 PM, Peter Zijlstra wrote:
> >>
> >> BTW, since we are on the topic, it seems that we are missing tlb_fast_mode() in
> >> one spot - unless it is tied to rcu table free stuff.
> >>
> >> -------------->
> >> From: Vineet Gupta <vgupta@synopsys.com>
> >> Date: Thu, 30 May 2013 10:25:30 +0530
> >> Subject: [PATCH] mm: tlb_fast_mode check missing in tlb_finish_mmu()
> >>
> >> Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
> >> ---
> >>  mm/memory.c |    3 +++
> >>  1 file changed, 3 insertions(+)
> >>
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index d9d5fd9..569ffe1 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -269,6 +269,9 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long
> >> start, unsigned long e
> >>      /* keep the page table cache within bounds */
> >>      check_pgt_cache();
> >>  
> >> +    if (tlb_fast_mode(tlb))
> >> +        return;
> >> +
> >>      for (batch = tlb->local.next; batch; batch = next) {
> >>          next = batch->next;
> >>          free_pages((unsigned long)batch, 0);
> > Yes I think that is possible. It would shrink the code a little when
> > fast_mode was unconditionally 1 -- ie. simple UP ;-).
> 
> Exactly ! Can you please revert with a Reviewed-by/Acked-by so I can formally send
> it over to linux-mm list.

s/revert/reply/?

Acked-by: Peter Zijlstra <peterz@infradead.org>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-31  1:26         ` Max Filippov
  0 siblings, 0 replies; 70+ messages in thread
From: Max Filippov @ 2013-05-31  1:26 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

On Wed, May 29, 2013 at 2:15 PM, Catalin Marinas
<catalin.marinas@arm.com> wrote:
> On Wed, May 29, 2013 at 05:15:28AM +0100, Max Filippov wrote:
>> On Tue, May 28, 2013 at 6:35 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
>> > On 26 May 2013 03:42, Max Filippov <jcmvbkbc@gmail.com> wrote:
>> >> Is it intentional that threads of a process that invoked munmap syscall
>> >> can see TLB entries pointing to already freed pages, or it is a bug?
>> >
>> > If it happens, this would be a bug. It means that a process can access
>> > a physical page that has been allocated to something else, possibly
>> > kernel data.
>> >
>> >> I'm talking about zap_pmd_range and zap_pte_range:
>> >>
>> >>       zap_pmd_range
>> >>         zap_pte_range
>> >>           arch_enter_lazy_mmu_mode
>> >>             ptep_get_and_clear_full
>> >>             tlb_remove_tlb_entry
>> >>             __tlb_remove_page
>> >>           arch_leave_lazy_mmu_mode
>> >>         cond_resched
>> >>
>> >> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>> >> and __tlb_remove_page there is a loop in the zap_pte_range that clears
>> >> PTEs and frees corresponding pages, but doesn't flush TLB, and
>> >> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
>> >> of the same process gets scheduled then it is able to see TLB entries
>> >> pointing to already freed physical pages.
>> >
>> > It looks to me like cond_resched() here introduces a possible bug but
>> > it depends on the actual arch code, especially the
>> > __tlb_remove_tlb_entry() function. On ARM we record the range in
>> > tlb_remove_tlb_entry() and queue the pages to be removed in
>> > __tlb_remove_page(). It pretty much acts like tlb_fast_mode() == 0
>> > even for the UP case (which is also needed for hardware speculative
>> > TLB loads). The tlb_finish_mmu() takes care of whatever pages are left
>> > to be freed.
>> >
>> > With a dummy __tlb_remove_tlb_entry() and tlb_fast_mode() == 1,
>> > cond_resched() in zap_pmd_range() would cause problems.
>>
>> So, looks like most architectures in the UP configuration should have
>> this issue (unless they flush TLB in the switch_mm, even when switching
>> to the same mm):
>
> switch_mm() wouldn't be called if switching to the same mm. You could do

Hmm... Strange, but as far as I can tell from the context_switch it would.

> it in switch_to() but it's not efficient (or before returning to user
> space on the same processor).
>
> Do you happen to have a user-space test for this? Something like one

I only had mtest05 from LTP that triggered TLB/PTE inconsistency, but
not anything that would really try to peek at the freed page. I can make
such test though.

> thread does an mmap(), writes some poison value, munmap(). The other
> thread keeps checking the poison value while trapping and ignoring any
> SIGSEGV. If it's working correctly, the second thread should either get
> a SIGSEGV or read the poison value.
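
Something along those lines should do. Below is a minimal sketch (hypothetical
and untested here; build with -pthread, assumes 4K pages). It uses
mremap(MREMAP_MAYMOVE | MREMAP_FIXED) to move an already-poisoned page to a
fixed address surrounded by PROT_NONE guard pages, so the checking thread can
only ever see the poison value or take a SIGSEGV; reading anything else would
mean a stale TLB entry let it reach a freed page.

#define _GNU_SOURCE
#include <pthread.h>
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE	4096		/* assumption: 4K pages */
#define POISON	0x5a

static char *target;		/* fixed address the checker always reads */
static sigjmp_buf segv_env;	/* only the checker thread is expected to fault */

static void segv_handler(int sig)
{
	(void)sig;
	siglongjmp(segv_env, 1);	/* expected while the page is unmapped */
}

static void *mapper(void *arg)
{
	(void)arg;
	for (;;) {
		/* Poison a page off to the side, then move it into place
		 * atomically, so 'target' is never readable without poison. */
		char *tmp = mmap(NULL, 2 * PAGE, PROT_READ | PROT_WRITE,
				 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (tmp == MAP_FAILED)
			exit(1);
		memset(tmp, POISON, PAGE);
		if (mremap(tmp, PAGE, PAGE, MREMAP_MAYMOVE | MREMAP_FIXED,
			   target) == MAP_FAILED)
			exit(1);
		munmap(tmp + PAGE, PAGE);	/* drop the leftover page */
		usleep(100);
		munmap(target, PAGE);		/* the munmap under test */
	}
	return NULL;
}

static void *checker(void *arg)
{
	struct sigaction sa;
	(void)arg;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = segv_handler;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, NULL);
	for (;;) {
		if (sigsetjmp(segv_env, 1))
			continue;		/* faulted: page was unmapped, fine */
		if (*(volatile unsigned char *)target != POISON) {
			fprintf(stderr, "read non-poison data: stale TLB entry?\n");
			abort();
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t m, c;
	/* PROT_NONE guards keep other mappings away from 'target'
	 * while it is unmapped. */
	char *res = mmap(NULL, 3 * PAGE, PROT_NONE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (res == MAP_FAILED)
		return 1;
	target = res + PAGE;

	pthread_create(&m, NULL, mapper, NULL);
	pthread_create(&c, NULL, checker, NULL);
	pthread_join(m, NULL);
	return 0;
}

Left running, it should either stay silent, take SIGSEGVs, or abort() the
moment the checker reads anything other than 0x5a.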

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-31  1:40         ` Max Filippov
  0 siblings, 0 replies; 70+ messages in thread
From: Max Filippov @ 2013-05-31  1:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KAMEZAWA Hiroyuki, linux-arch, linux-mm, Ralf Baechle,
	Chris Zankel, Marc Gauthier, linux-xtensa, Hugh Dickins

On Wed, May 29, 2013 at 4:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, May 28, 2013 at 11:10:25AM +0400, Max Filippov wrote:
>> On Sun, May 26, 2013 at 6:50 AM, Max Filippov <jcmvbkbc@gmail.com> wrote:
>> > Hello arch and mm people.
>> >
>> > Is it intentional that threads of a process that invoked munmap syscall
>> > can see TLB entries pointing to already freed pages, or it is a bug?
>> >
>> > I'm talking about zap_pmd_range and zap_pte_range:
>> >
>> >       zap_pmd_range
>> >         zap_pte_range
>> >           arch_enter_lazy_mmu_mode
>> >             ptep_get_and_clear_full
>> >             tlb_remove_tlb_entry
>> >             __tlb_remove_page
>> >           arch_leave_lazy_mmu_mode
>> >         cond_resched
>> >
>> > With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>> > and __tlb_remove_page there is a loop in the zap_pte_range that clears
>> > PTEs and frees corresponding pages, but doesn't flush TLB, and
>> > surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
>> > of the same process gets scheduled then it is able to see TLB entries
>> > pointing to already freed physical pages.
>> >
>> > I've noticed that with xtensa arch when I added a test before returning to
>> > userspace checking that TLB contents agrees with page tables of the
>> > current mm. This check reliably fires with the LTP test mtest05 that
>> > maps, unmaps and accesses memory from multiple threads.
>> >
>> > Is there anything wrong in my description, maybe something specific to
>> > my arch, or this issue really exists?
>>
>> Hi,
>>
>> I've made similar checking function for MIPS (because qemu is my only choice
>> and it simulates MIPS TLB) and ran my tests on mips-malta machine in qemu.
>> With MIPS I can also see this issue. I hope I did it right, the patch at the
>> bottom is for the reference. The test I run and the diagnostic output are as
>> follows:
>>
>> To me it looks like the cond_resched in the zap_pmd_range is the root cause
>> of this issue (let alone SMP case for now). It was introduced in the commit
>>
>> commit 97a894136f29802da19a15541de3c019e1ca147e
>> Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Date:   Tue May 24 17:12:04 2011 -0700
>>
>>     mm: Remove i_mmap_lock lockbreak
>>
>> Peter, Kamezawa, other reviewers of that commit, could you please comment?
>
> Are you all running UP systems? I suppose the preemptible muck

Yes, xtensa SMP support is in the middle of forward porting/cleanup process.

> invalidated the assumption that UP systems are 'easy'.
>
> If you make tlb_fast_mode() return an unconditional false, does it all
> work again?

Actually the only thing that was visibly broken is the test for TLB/PTE
coherency, but that incoherency appears to be intentional (and the test is
too simple for that). Unconditionally returning false from tlb_fast_mode()
doesn't fix that test, but AFAICS it fixes the underlying page freeing.

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
  2013-05-29 17:51           ` Peter Zijlstra
@ 2013-05-31  4:09             ` Max Filippov
  -1 siblings, 0 replies; 70+ messages in thread
From: Max Filippov @ 2013-05-31  4:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vineet Gupta, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

Hi Peter,

On Wed, May 29, 2013 at 9:51 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> What about something like this?

With that patch I still get mtest05 firing my TLB/PTE incoherency check
in the UP PREEMPT_VOLUNTARY configuration. This happens after
zap_pte_range completion in the end of unmap_region because of
rescheduling called in the following call chain:

unmap_region
  free_pgtables
    unlink_anon_vmas
      lock_anon_vma_root
        down_write
          might_sleep
            might_resched
              _cond_resched


>
> ---
>  include/asm-generic/tlb.h | 11 ++++++++++-
>  mm/memory.c               | 17 ++++++++++++++++-
>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index b1b1fa6..651b1cf 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -116,6 +116,7 @@ struct mmu_gather {
>
>  static inline int tlb_fast_mode(struct mmu_gather *tlb)
>  {
> +#ifndef CONFIG_PREEMPT
>  #ifdef CONFIG_SMP
>         return tlb->fast_mode;
>  #else
> @@ -124,7 +125,15 @@ static inline int tlb_fast_mode(struct mmu_gather *tlb)
>          * and page free order so much..
>          */
>         return 1;
> -#endif
> +#endif /* CONFIG_SMP */
> +#else  /* CONFIG_PREEMPT */
> +       /*
> +        * Since mmu_gather is preemptible, preemptible kernels are like SMP
> +        * kernels, we must batch to make sure we invalidate TLBs before we
> +        * free the pages.
> +        */
> +       return 0;
> +#endif /* CONFIG_PREEMPT */
>  }
>
>  void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
> diff --git a/mm/memory.c b/mm/memory.c
> index 6dc1882..e915af2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -384,6 +384,21 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
>
>  #endif /* CONFIG_HAVE_RCU_TABLE_FREE */
>
> +static inline void cond_resched_tlb(struct mmu_gather *tlb)
> +{
> +#ifndef CONFIG_PREEMPT
> +       /*
> +        * For full preempt kernels we must do regular batching like
> +        * SMP, see tlb_fast_mode(). For !PREEMPT we can 'cheat' and
> +        * do a flush before our voluntary 'yield'.
> +        */
> +       if (need_resched()) {
> +               tlb_flush_mmu(tlb);
> +               cond_resched();
> +       }
> +#endif
> +}
> +
>  /*
>   * If a p?d_bad entry is found while walking page tables, report
>   * the error, before resetting entry to p?d_none.  Usually (but
> @@ -1264,7 +1279,7 @@ static inline unsigned long zap_pmd_range(struct mmu_gather *tlb,
>                         goto next;
>                 next = zap_pte_range(tlb, vma, pmd, addr, next, details);
>  next:
> -               cond_resched();
> +               cond_resched_tlb(tlb);
>         } while (pmd++, addr = next, addr != end);
>
>         return addr;
>
>



-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
  2013-05-31  4:09             ` Max Filippov
@ 2013-05-31  7:55               ` Peter Zijlstra
  -1 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-05-31  7:55 UTC (permalink / raw)
  To: Max Filippov
  Cc: Vineet Gupta, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins

On Fri, May 31, 2013 at 08:09:17AM +0400, Max Filippov wrote:
> Hi Peter,
> 
> On Wed, May 29, 2013 at 9:51 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > What about something like this?
> 
> With that patch I still get mtest05 firing my TLB/PTE incoherency check
> in the UP PREEMPT_VOLUNTARY configuration. This happens after
> zap_pte_range completion in the end of unmap_region because of
> rescheduling called in the following call chain:
> 
> unmap_region
>   free_pgtables
>     unlink_anon_vmas
>       lock_anon_vma_root
>         down_write
>           might_sleep
>             might_resched
>               _cond_resched
> 

Hurm, yeah. Catching all regular blocking primitives and making it
maintainable is going to be a problem :/

I suppose the easiest thing is simply killing fast_mode for now.
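
As a rough sketch of what that would mean (hypothetical; the actual removal
may well look different), the generic tlb_fast_mode() in
include/asm-generic/tlb.h would simply stop special-casing UP:

static inline int tlb_fast_mode(struct mmu_gather *tlb)
{
	/*
	 * Always batch: queue the pages and flush the TLB in
	 * tlb_flush_mmu() before anything is actually freed.
	 */
	return 0;
}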

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-05-31  9:06           ` Catalin Marinas
  0 siblings, 0 replies; 70+ messages in thread
From: Catalin Marinas @ 2013-05-31  9:06 UTC (permalink / raw)
  To: Max Filippov
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

On Fri, May 31, 2013 at 02:26:27AM +0100, Max Filippov wrote:
> On Wed, May 29, 2013 at 2:15 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
> > On Wed, May 29, 2013 at 05:15:28AM +0100, Max Filippov wrote:
> >> On Tue, May 28, 2013 at 6:35 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> >> > On 26 May 2013 03:42, Max Filippov <jcmvbkbc@gmail.com> wrote:
> >> >> Is it intentional that threads of a process that invoked munmap syscall
> >> >> can see TLB entries pointing to already freed pages, or it is a bug?
> >> >
> >> > If it happens, this would be a bug. It means that a process can access
> >> > a physical page that has been allocated to something else, possibly
> >> > kernel data.
> >> >
> >> >> I'm talking about zap_pmd_range and zap_pte_range:
> >> >>
> >> >>       zap_pmd_range
> >> >>         zap_pte_range
> >> >>           arch_enter_lazy_mmu_mode
> >> >>             ptep_get_and_clear_full
> >> >>             tlb_remove_tlb_entry
> >> >>             __tlb_remove_page
> >> >>           arch_leave_lazy_mmu_mode
> >> >>         cond_resched
> >> >>
> >> >> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
> >> >> and __tlb_remove_page there is a loop in the zap_pte_range that clears
> >> >> PTEs and frees corresponding pages, but doesn't flush TLB, and
> >> >> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
> >> >> of the same process gets scheduled then it is able to see TLB entries
> >> >> pointing to already freed physical pages.
> >> >
> >> > It looks to me like cond_resched() here introduces a possible bug but
> >> > it depends on the actual arch code, especially the
> >> > __tlb_remove_tlb_entry() function. On ARM we record the range in
> >> > tlb_remove_tlb_entry() and queue the pages to be removed in
> >> > __tlb_remove_page(). It pretty much acts like tlb_fast_mode() == 0
> >> > even for the UP case (which is also needed for hardware speculative
> >> > TLB loads). The tlb_finish_mmu() takes care of whatever pages are left
> >> > to be freed.
> >> >
> >> > With a dummy __tlb_remove_tlb_entry() and tlb_fast_mode() == 1,
> >> > cond_resched() in zap_pmd_range() would cause problems.
> >>
> >> So, looks like most architectures in the UP configuration should have
> >> this issue (unless they flush TLB in the switch_mm, even when switching
> >> to the same mm):
> >
> > switch_mm() wouldn't be called if switching to the same mm. You could do
> 
> Hmm... Strange, but as far as I can tell from the context_switch it would.

You are right. On ARM we have a prev != next check but that's in the
switch_mm() implementation.
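
For reference, a simplified sketch of the relevant branch in the generic
context_switch() (paraphrased from memory of the 3.x kernel/sched/core.c with
details trimmed, so treat it as an approximation):

/*
 * switch_mm() is called whenever the next task has a user mm, even if it
 * is the same mm the CPU is already running; skipping the prev == next
 * case is left to the per-arch switch_mm().
 */
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
	       struct task_struct *next)
{
	struct mm_struct *mm = next->mm;
	struct mm_struct *oldmm = prev->active_mm;

	if (!mm) {			/* kernel thread: borrow oldmm */
		next->active_mm = oldmm;
		atomic_inc(&oldmm->mm_count);
		enter_lazy_tlb(oldmm, next);
	} else
		switch_mm(oldmm, mm, next);	/* even when mm == oldmm */

	/* ... switch_to(prev, next, prev); ... */
}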

-- 
Catalin

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-06-03  9:05               ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-06-03  9:05 UTC (permalink / raw)
  To: Max Filippov
  Cc: Vineet Gupta, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins, Thomas Gleixner, Linus Torvalds, Andrew Morton,
	Ingo Molnar

On Fri, May 31, 2013 at 08:09:17AM +0400, Max Filippov wrote:
> Hi Peter,
> 
> On Wed, May 29, 2013 at 9:51 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > What about something like this?
> 
> With that patch I still get mtest05 firing my TLB/PTE incoherency check
> in the UP PREEMPT_VOLUNTARY configuration. This happens after
> zap_pte_range completion in the end of unmap_region because of
> rescheduling called in the following call chain:

OK, so there are two options: completely kill off fast-mode, or something like
the below where we add magic to the scheduler :/

I'm aware people might object to something like the below -- but since it's a
possibility I thought we ought to at least mention it.

For those new to the thread; the problem is that since the introduction of
preemptible mmu_gather the traditional UP fast-mode is broken. Fast-mode is
where we free the pages first and flush TLBs later. This is not a problem if
there's no concurrency, but obviously if you can preempt there now is.

I think I prefer completely killing off fast-mode esp. since UP seems to go the
way of the Dodo and it does away with an exception in the mmu_gather code.

Anyway; opinions? Linus, Thomas, Ingo?

---
 include/asm-generic/tlb.h | 19 +++++++++++++++----
 include/linux/sched.h     |  1 +
 kernel/sched/core.c       |  9 +++++++++
 mm/memory.c               |  3 +++
 4 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b1b1fa6..8d84154 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -116,15 +116,26 @@ struct mmu_gather {
 
 static inline int tlb_fast_mode(struct mmu_gather *tlb)
 {
+#ifdef CONFIG_PREEMPT
+	/*
+	 * We don't want to add to the schedule fast path for preemptible
+	 * kernels; disable fast mode unconditionally.
+	 */
+	return 0;
+#endif
+
 #ifdef CONFIG_SMP
+	/*
+	 * We can only use fast mode if there's a single CPU online;
+	 * otherwise SMP might trip over stale TLB entries.
+	 */
 	return tlb->fast_mode;
-#else
+#endif
+
 	/*
-	 * For UP we don't need to worry about TLB flush
-	 * and page free order so much..
+	 * Non-preempt UP can do fast mode unconditionally.
 	 */
 	return 1;
-#endif
 }
 
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 178a8d9..3dc6930 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1416,6 +1416,7 @@ struct task_struct {
 	unsigned int	sequential_io;
 	unsigned int	sequential_io_avg;
 #endif
+	struct mmu_gather *tlb;
 };
 
 /* Future-safe accessor for struct task_struct's cpus_allowed. */
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 36f85be..6829b78 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2374,6 +2374,15 @@ static void __sched __schedule(void)
 	struct rq *rq;
 	int cpu;
 
+#ifndef CONFIG_PREEMPT
+	/*
+	 * We always force batched mmu_gather for preemptible kernels in order
+	 * to minimize scheduling delays. See tlb_fast_mode().
+	 */
+	if (current->tlb && tlb_fast_mode(current->tlb))
+		tlb_flush_mmu(current->tlb);
+#endif
+
 need_resched:
 	preempt_disable();
 	cpu = smp_processor_id();
diff --git a/mm/memory.c b/mm/memory.c
index d7d54a1..8925578 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -230,6 +230,7 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm)
 #ifdef CONFIG_HAVE_RCU_TABLE_FREE
 	tlb->batch = NULL;
 #endif
+	current->tlb = tlb;
 }
 
 void tlb_flush_mmu(struct mmu_gather *tlb)
@@ -274,6 +275,8 @@ void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long e
 		free_pages((unsigned long)batch, 0);
 	}
 	tlb->local.next = NULL;
+
+	current->tlb = NULL;
 }
 
 /* __tlb_remove_page

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
  2013-06-03  9:05               ` Peter Zijlstra
@ 2013-06-03  9:16                 ` Ingo Molnar
  -1 siblings, 0 replies; 70+ messages in thread
From: Ingo Molnar @ 2013-06-03  9:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Max Filippov, Vineet Gupta, KAMEZAWA Hiroyuki, linux-arch,
	linux-mm, Ralf Baechle, Chris Zankel, Marc Gauthier,
	linux-xtensa, Hugh Dickins, Thomas Gleixner, Linus Torvalds,
	Andrew Morton


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Fri, May 31, 2013 at 08:09:17AM +0400, Max Filippov wrote:
> > Hi Peter,
> > 
> > On Wed, May 29, 2013 at 9:51 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > What about something like this?
> > 
> > With that patch I still get mtest05 firing my TLB/PTE incoherency 
> > check in the UP PREEMPT_VOLUNTARY configuration. This happens after 
> > zap_pte_range completion in the end of unmap_region because of 
> > rescheduling called in the following call chain:
> 
> OK, so there are two options: completely kill off fast-mode, or something 
> like the below where we add magic to the scheduler :/
> 
> I'm aware people might object to something like the below -- but since 
> it's a possibility I thought we ought to at least mention it.
> 
> For those new to the thread; the problem is that since the introduction 
> of preemptible mmu_gather the traditional UP fast-mode is broken. 
> Fast-mode is where we free the pages first and flush TLBs later. This is 
> not a problem if there's no concurrency, but obviously if you can 
> preempt there now is.
> 
> I think I prefer completely killing off fast-mode esp. since UP seems to 
> go the way of the Dodo and it does away with an exception in the 
> mmu_gather code.
> 
> Anyway; opinions? Linus, Thomas, Ingo?

Since UP kernels have not been packaged up by major distros for years, and 
since the live-patching of SMP kernels (the SMP alternative-instructions 
patching machinery) does away with a big chunk of the SMP cost, I guess UP 
kernels are slowly becoming like TINY_RCU: interesting but not really a 
primary design goal?

( Another reason for reducing SMP vs. UP complexity in this area would be
  the fact that we had a few bad regressions lately - the TLB code is not
  getting simpler, and bugs are getting discovered and fixed slower. )

At least that's the x86 perspective. ARM might still see it differently?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-06-03  9:16           ` Max Filippov
  0 siblings, 0 replies; 70+ messages in thread
From: Max Filippov @ 2013-06-03  9:16 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: linux-arch, linux-mm, linux-xtensa, Chris Zankel, Marc Gauthier

On Fri, May 31, 2013 at 5:26 AM, Max Filippov <jcmvbkbc@gmail.com> wrote:
> On Wed, May 29, 2013 at 2:15 PM, Catalin Marinas
> <catalin.marinas@arm.com> wrote:
>> On Wed, May 29, 2013 at 05:15:28AM +0100, Max Filippov wrote:
>>> On Tue, May 28, 2013 at 6:35 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
>>> > On 26 May 2013 03:42, Max Filippov <jcmvbkbc@gmail.com> wrote:
>>> >> Is it intentional that threads of a process that invoked munmap syscall
>>> >> can see TLB entries pointing to already freed pages, or it is a bug?
>>> >
>>> > If it happens, this would be a bug. It means that a process can access
>>> > a physical page that has been allocated to something else, possibly
>>> > kernel data.
>>> >
>>> >> I'm talking about zap_pmd_range and zap_pte_range:
>>> >>
>>> >>       zap_pmd_range
>>> >>         zap_pte_range
>>> >>           arch_enter_lazy_mmu_mode
>>> >>             ptep_get_and_clear_full
>>> >>             tlb_remove_tlb_entry
>>> >>             __tlb_remove_page
>>> >>           arch_leave_lazy_mmu_mode
>>> >>         cond_resched
>>> >>
>>> >> With the default arch_{enter,leave}_lazy_mmu_mode, tlb_remove_tlb_entry
>>> >> and __tlb_remove_page there is a loop in the zap_pte_range that clears
>>> >> PTEs and frees corresponding pages, but doesn't flush TLB, and
>>> >> surrounding loop in the zap_pmd_range that calls cond_resched. If a thread
>>> >> of the same process gets scheduled then it is able to see TLB entries
>>> >> pointing to already freed physical pages.
>>> >
>>> > It looks to me like cond_resched() here introduces a possible bug but
>>> > it depends on the actual arch code, especially the
>>> > __tlb_remove_tlb_entry() function. On ARM we record the range in
>>> > tlb_remove_tlb_entry() and queue the pages to be removed in
>>> > __tlb_remove_page(). It pretty much acts like tlb_fast_mode() == 0
>>> > even for the UP case (which is also needed for hardware speculative
>>> > TLB loads). The tlb_finish_mmu() takes care of whatever pages are left
>>> > to be freed.
>>> >
>>> > With a dummy __tlb_remove_tlb_entry() and tlb_fast_mode() == 1,
>>> > cond_resched() in zap_pmd_range() would cause problems.
>>>
>>> So, looks like most architectures in the UP configuration should have
>>> this issue (unless they flush TLB in the switch_mm, even when switching
>>> to the same mm):
>>
>> switch_mm() wouldn't be called if switching to the same mm. You could do
>
> Hmm... Strange, but as far as I can tell from the context_switch it would.
>
>> it in switch_to() but it's not efficient (or before returning to user
>> space on the same processor).
>>
>> Do you happen to have a user-space test for this? Something like one
>
> I only had mtest05 from LTP that triggered TLB/PTE inconsistency, but
> not anything that would really try to peek at the freed page. I can make
> such test though.
>
>> thread does an mmap(), writes some poison value, munmap(). The other
>> thread keeps checking the poison value while trapping and ignoring any
>> SIGSEGV. If it's working correctly, the second thread should either get
>> a SIGSEGV or read the poison value.

I've made a number of such tests and had them running for a couple of days.
The checking thread has never read anything other than the poison value so far.

-- 
Thanks.
-- Max

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
  2013-06-03  9:16                 ` Ingo Molnar
@ 2013-06-03 10:01                   ` Catalin Marinas
  -1 siblings, 0 replies; 70+ messages in thread
From: Catalin Marinas @ 2013-06-03 10:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Max Filippov, Vineet Gupta, KAMEZAWA Hiroyuki,
	linux-arch, linux-mm, Ralf Baechle, Chris Zankel, Marc Gauthier,
	linux-xtensa, Hugh Dickins, Thomas Gleixner, Linus Torvalds,
	Andrew Morton

On 3 June 2013 10:16, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Fri, May 31, 2013 at 08:09:17AM +0400, Max Filippov wrote:
>> > Hi Peter,
>> >
>> > On Wed, May 29, 2013 at 9:51 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > > What about something like this?
>> >
>> > With that patch I still get mtest05 firing my TLB/PTE incoherency
>> > check in the UP PREEMPT_VOLUNTARY configuration. This happens after
>> > zap_pte_range completion in the end of unmap_region because of
>> > rescheduling called in the following call chain:
>>
>> OK, so there are two options: completely kill off fast-mode, or something
>> like the below where we add magic to the scheduler :/
>>
>> I'm aware people might object to something like the below -- but since
>> it's a possibility I thought we ought to at least mention it.
>>
>> For those new to the thread; the problem is that since the introduction
>> of preemptible mmu_gather the traditional UP fast-mode is broken.
>> Fast-mode is where we free the pages first and flush TLBs later. This is
>> not a problem if there's no concurrency, but obviously if you can
>> preempt there now is.
>>
>> I think I prefer completely killing off fast-mode esp. since UP seems to
>> go the way of the Dodo and it does away with an exception in the
>> mmu_gather code.
>>
>> Anyway; opinions? Linus, Thomas, Ingo?
>
> Since UP kernels have not been packaged up by major distros for years, and
> since the live-patching of SMP kernels (the SMP alternative-instructions
> patching machinery) does away with a big chunk of the SMP cost, I guess UP
> kernels are slowly becoming like TINY_RCU: interesting but not really a
> primary design goal?
>
> ( Another reason for reducing SMP vs. UP complexity in this area would be
>   the fact that we had a few bad regressions lately - the TLB code is not
>   getting simpler, and bugs are getting discovered and fixed slower. )
>
> At least that's the x86 perspective. ARM might still see it differently?

On ARM there is a lot of ongoing work on single zImage for multiple
SoCs and this implies SMP kernels. There is an SMP_ON_UP feature which
does run-time code patching to optimise the UP case in a few places.

Regarding tlb_fast_mode(), the ARM-specific implementation is always 0
on ARMv7 even if UP because of speculative TLB loads (the MMU could
pretty much act as a separate processor).

Catalin

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-06-03 10:04                     ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-06-03 10:04 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Ingo Molnar, Max Filippov, Vineet Gupta, KAMEZAWA Hiroyuki,
	linux-arch, linux-mm, Ralf Baechle, Chris Zankel, Marc Gauthier,
	linux-xtensa, Hugh Dickins, Thomas Gleixner, Linus Torvalds,
	Andrew Morton

On Mon, Jun 03, 2013 at 11:01:39AM +0100, Catalin Marinas wrote:
> On ARM there is a lot of ongoing work on single zImage for multiple
> SoCs and this implies SMP kernels. There is an SMP_ON_UP feature which
> does run-time code patching to optimise the UP case in a few places.
> 
> Regarding tlb_fast_mode(), the ARM-specific implementation is always 0
> on ARMv7 even if UP because of speculative TLB loads (the MMU could
> pretty much act as a separate processor).

Oh right.. I should really refresh the mmu_gather unification patches so
all archs are using the generic code.

/me ups on todo list.
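
Roughly, "using the generic code" means an arch's asm/tlb.h shrinks to a
few hooks plus the shared implementation. A deliberately minimal sketch
(not any particular arch's real header, and deliberately dumb about
flushing):

/* hooks the generic mmu_gather expects the arch to provide */
#define tlb_start_vma(tlb, vma)				do { } while (0)
#define tlb_end_vma(tlb, vma)				do { } while (0)
#define __tlb_remove_tlb_entry(tlb, ptep, address)	do { } while (0)

/* crude but safe: flush the whole mm every time tlb_flush_mmu() runs */
#define tlb_flush(tlb)		flush_tlb_mm((tlb)->mm)

#include <asm-generic/tlb.h>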

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
  2013-06-03 10:04                     ` Peter Zijlstra
@ 2013-06-03 10:09                       ` Catalin Marinas
  -1 siblings, 0 replies; 70+ messages in thread
From: Catalin Marinas @ 2013-06-03 10:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Max Filippov, Vineet Gupta, KAMEZAWA Hiroyuki,
	linux-arch, linux-mm, Ralf Baechle, Chris Zankel, Marc Gauthier,
	linux-xtensa, Hugh Dickins, Thomas Gleixner, Linus Torvalds,
	Andrew Morton

On Mon, Jun 03, 2013 at 11:04:44AM +0100, Peter Zijlstra wrote:
> On Mon, Jun 03, 2013 at 11:01:39AM +0100, Catalin Marinas wrote:
> > On ARM there is a lot of ongoing work on single zImage for multiple
> > SoCs and this implies SMP kernels. There is an SMP_ON_UP feature which
> > does run-time code patching to optimise the UP case in a few places.
> > 
> > Regarding tlb_fast_mode(), the ARM-specific implementation is always 0
> > on ARMv7 even if UP because of speculative TLB loads (the MMU could
> > pretty much act as a separate processor).
> 
> Oh right.. I should really refresh the mmu_gather unification patches so
> all archs are using the generic code.

That would be great! I'll help with testing/review (or if it needs
anything else) from an arm/arm64 perspective.

Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-06-04  9:52                 ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-06-04  9:52 UTC (permalink / raw)
  To: Max Filippov
  Cc: Vineet Gupta, KAMEZAWA Hiroyuki, linux-arch, linux-mm,
	Ralf Baechle, Chris Zankel, Marc Gauthier, linux-xtensa,
	Hugh Dickins, Thomas Gleixner, Linus Torvalds, Andrew Morton,
	Ingo Molnar, Tony Luck

On Mon, Jun 03, 2013 at 11:05:01AM +0200, Peter Zijlstra wrote:
> On Fri, May 31, 2013 at 08:09:17AM +0400, Max Filippov wrote:
> > Hi Peter,
> > 
> > On Wed, May 29, 2013 at 9:51 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > What about something like this?
> > 
> > With that patch I still get mtest05 firing my TLB/PTE incoherency check
> > in the UP PREEMPT_VOLUNTARY configuration. This happens after
> > zap_pte_range completion at the end of unmap_region because of
> > rescheduling called from the following call chain:
> 
> OK, so there are two options; completely kill off fast-mode or something like the
> below where we add magic to the scheduler :/
> 
> I'm aware people might object to something like the below -- but since it's a
> possibility I thought we ought to at least mention it.
> 
> For those new to the thread; the problem is that since the introduction of
> preemptible mmu_gather the traditional UP fast-mode is broken. Fast-mode is
> where we free the pages first and flush TLBs later. This is not a problem if
> there's no concurrency, but obviously if you can preempt there now is.
> 
> I think I prefer completely killing off fast-mode esp. since UP seems to go the
> way of the Dodo and it does away with an exception in the mmu_gather code.
> 
> Anyway; opinions? Linus, Thomas, Ingo?
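
For anyone who wants to generate the load locally: below is a minimal,
hand-written sketch of the kind of traffic mtest05 produces (it is not
the LTP source, and from userspace it cannot observe the stale-TLB
window by itself -- it merely drives concurrent map/touch/unmap activity
from several threads of one process, which is what made Max's in-kernel
TLB/PTE check fire). Build with: gcc -O2 -pthread.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

#define NTHREADS	4
#define MAP_BYTES	(256 * 4096)
#define ITERATIONS	10000

/* Each thread repeatedly maps, touches and unmaps its own anonymous
 * region; on a preemptible UP kernel the thread doing munmap() can be
 * scheduled away between clearing the PTEs and flushing the TLB. */
static void *worker(void *arg)
{
	long id = (long)arg;
	int i;

	for (i = 0; i < ITERATIONS; i++) {
		char *p = mmap(NULL, MAP_BYTES, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED) {
			perror("mmap");
			exit(1);
		}
		memset(p, (int)id, MAP_BYTES);	/* fault every page in */
		if (munmap(p, MAP_BYTES)) {
			perror("munmap");
			exit(1);
		}
	}
	return NULL;
}

int main(void)
{
	pthread_t t[NTHREADS];
	long i;

	for (i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, worker, (void *)i);
	for (i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	puts("done");
	return 0;
}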

And here's the patch that makes fast mode go *poof*..

---
 arch/arm/include/asm/tlb.h  | 27 ++++-----------------------
 arch/ia64/include/asm/tlb.h | 41 ++++++++---------------------------------
 include/asm-generic/tlb.h   | 17 +----------------
 mm/memory.c                 |  9 ---------
 4 files changed, 13 insertions(+), 81 deletions(-)

diff --git a/arch/arm/include/asm/tlb.h b/arch/arm/include/asm/tlb.h
index 99a1951..bdf2b84 100644
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -33,18 +33,6 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-/*
- * We need to delay page freeing for SMP as other CPUs can access pages
- * which have been removed but not yet had their TLB entries invalidated.
- * Also, as ARMv7 speculative prefetch can drag new entries into the TLB,
- * we need to apply this same delaying tactic to ensure correct operation.
- */
-#if defined(CONFIG_SMP) || defined(CONFIG_CPU_32v7)
-#define tlb_fast_mode(tlb)	0
-#else
-#define tlb_fast_mode(tlb)	1
-#endif
-
 #define MMU_GATHER_BUNDLE	8
 
 /*
@@ -112,12 +100,10 @@ static inline void __tlb_alloc_page(struct mmu_gather *tlb)
 static inline void tlb_flush_mmu(struct mmu_gather *tlb)
 {
 	tlb_flush(tlb);
-	if (!tlb_fast_mode(tlb)) {
-		free_pages_and_swap_cache(tlb->pages, tlb->nr);
-		tlb->nr = 0;
-		if (tlb->pages == tlb->local)
-			__tlb_alloc_page(tlb);
-	}
+	free_pages_and_swap_cache(tlb->pages, tlb->nr);
+	tlb->nr = 0;
+	if (tlb->pages == tlb->local)
+		__tlb_alloc_page(tlb);
 }
 
 static inline void
@@ -178,11 +164,6 @@ tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
 
 static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
 {
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return 1; /* avoid calling tlb_flush_mmu */
-	}
-
 	tlb->pages[tlb->nr++] = page;
 	VM_BUG_ON(tlb->nr > tlb->max);
 	return tlb->max - tlb->nr;
diff --git a/arch/ia64/include/asm/tlb.h b/arch/ia64/include/asm/tlb.h
index c3ffe3e..ef3a9de 100644
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -46,12 +46,6 @@
 #include <asm/tlbflush.h>
 #include <asm/machvec.h>
 
-#ifdef CONFIG_SMP
-# define tlb_fast_mode(tlb)	((tlb)->nr == ~0U)
-#else
-# define tlb_fast_mode(tlb)	(1)
-#endif
-
 /*
  * If we can't allocate a page to make a big batch of page pointers
  * to work on, then just handle a few from the on-stack structure.
@@ -60,7 +54,7 @@
 
 struct mmu_gather {
 	struct mm_struct	*mm;
-	unsigned int		nr;		/* == ~0U => fast mode */
+	unsigned int		nr;
 	unsigned int		max;
 	unsigned char		fullmm;		/* non-zero means full mm flush */
 	unsigned char		need_flush;	/* really unmapped some PTEs? */
@@ -103,6 +97,7 @@ extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
 static inline void
 ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
+	unsigned long i;
 	unsigned int nr;
 
 	if (!tlb->need_flush)
@@ -141,13 +136,11 @@ ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long e
 
 	/* lastly, release the freed pages */
 	nr = tlb->nr;
-	if (!tlb_fast_mode(tlb)) {
-		unsigned long i;
-		tlb->nr = 0;
-		tlb->start_addr = ~0UL;
-		for (i = 0; i < nr; ++i)
-			free_page_and_swap_cache(tlb->pages[i]);
-	}
+
+	tlb->nr = 0;
+	tlb->start_addr = ~0UL;
+	for (i = 0; i < nr; ++i)
+		free_page_and_swap_cache(tlb->pages[i]);
 }
 
 static inline void __tlb_alloc_page(struct mmu_gather *tlb)
@@ -167,20 +160,7 @@ tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned int full_m
 	tlb->mm = mm;
 	tlb->max = ARRAY_SIZE(tlb->local);
 	tlb->pages = tlb->local;
-	/*
-	 * Use fast mode if only 1 CPU is online.
-	 *
-	 * It would be tempting to turn on fast-mode for full_mm_flush as well.  But this
-	 * doesn't work because of speculative accesses and software prefetching: the page
-	 * table of "mm" may (and usually is) the currently active page table and even
-	 * though the kernel won't do any user-space accesses during the TLB shoot down, a
-	 * compiler might use speculation or lfetch.fault on what happens to be a valid
-	 * user-space address.  This in turn could trigger a TLB miss fault (or a VHPT
-	 * walk) and re-insert a TLB entry we just removed.  Slow mode avoids such
-	 * problems.  (We could make fast-mode work by switching the current task to a
-	 * different "mm" during the shootdown.) --davidm 08/02/2002
-	 */
-	tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
+	tlb->nr = 0;
 	tlb->fullmm = full_mm_flush;
 	tlb->start_addr = ~0UL;
 }
@@ -214,11 +194,6 @@ static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
 {
 	tlb->need_flush = 1;
 
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return 1; /* avoid calling tlb_flush_mmu */
-	}
-
 	if (!tlb->nr && tlb->pages == tlb->local)
 		__tlb_alloc_page(tlb);
 
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index b1b1fa6..13821c3 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -97,11 +97,9 @@ struct mmu_gather {
 	unsigned long		start;
 	unsigned long		end;
 	unsigned int		need_flush : 1,	/* Did free PTEs */
-				fast_mode  : 1; /* No batching   */
-
 	/* we are in the middle of an operation to clear
 	 * a full mm and can make some optimizations */
-	unsigned int		fullmm : 1,
+				fullmm : 1,
 	/* we have performed an operation which
 	 * requires a complete flush of the tlb */
 				need_flush_all : 1;
@@ -114,19 +112,6 @@ struct mmu_gather {
 
 #define HAVE_GENERIC_MMU_GATHER
 
-static inline int tlb_fast_mode(struct mmu_gather *tlb)
-{
-#ifdef CONFIG_SMP
-	return tlb->fast_mode;
-#else
-	/*
-	 * For UP we don't need to worry about TLB flush
-	 * and page free order so much..
-	 */
-	return 1;
-#endif
-}
-
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
 void tlb_flush_mmu(struct mmu_gather *tlb);
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start,
diff --git a/mm/memory.c b/mm/memory.c
index d7d54a1..95d0cce 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -220,7 +220,6 @@ void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm)
 	tlb->start	= -1UL;
 	tlb->end	= 0;
 	tlb->need_flush = 0;
-	tlb->fast_mode  = (num_possible_cpus() == 1);
 	tlb->local.next = NULL;
 	tlb->local.nr   = 0;
 	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
@@ -244,9 +243,6 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
 	tlb_table_flush(tlb);
 #endif
 
-	if (tlb_fast_mode(tlb))
-		return;
-
 	for (batch = &tlb->local; batch; batch = batch->next) {
 		free_pages_and_swap_cache(batch->pages, batch->nr);
 		batch->nr = 0;
@@ -288,11 +284,6 @@ int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
 
 	VM_BUG_ON(!tlb->need_flush);
 
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return 1; /* avoid calling tlb_flush_mmu() */
-	}
-
 	batch = tlb->active;
 	batch->pages[batch->nr++] = page;
 	if (batch->nr == batch->max) {

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-06-05  0:05                   ` Linus Torvalds
  0 siblings, 0 replies; 70+ messages in thread
From: Linus Torvalds @ 2013-06-05  0:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Max Filippov, Vineet Gupta, KAMEZAWA Hiroyuki, linux-arch,
	linux-mm, Ralf Baechle, Chris Zankel, Marc Gauthier,
	linux-xtensa, Hugh Dickins, Thomas Gleixner, Andrew Morton,
	Ingo Molnar, Tony Luck



On Tue, 4 Jun 2013, Peter Zijlstra wrote:
> 
> And here's the patch that makes fast mode go *poof*..

Let's just do this. Mind sending a patch with proper changelog and 
sign-off?

             Linus

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH] arch, mm: Remove tlb_fast_mode()
@ 2013-06-05 10:26                     ` Peter Zijlstra
  0 siblings, 0 replies; 70+ messages in thread
From: Peter Zijlstra @ 2013-06-05 10:26 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Max Filippov, Vineet Gupta, KAMEZAWA Hiroyuki, linux-arch,
	linux-mm, Ralf Baechle, Chris Zankel, Marc Gauthier,
	linux-xtensa, Hugh Dickins, Thomas Gleixner, Andrew Morton,
	Ingo Molnar, Tony Luck

On Wed, Jun 05, 2013 at 09:05:03AM +0900, Linus Torvalds wrote:
> 
> 
> On Tue, 4 Jun 2013, Peter Zijlstra wrote:
> > 
> > And here's the patch that makes fast mode go *poof*..
> 
> Let's just do this. Mind sending a patch with proper changelog and 
> sign-off?

Sure, as an extra I even compile tested it! ;-) (including arm and
ia64).

---
Subject: arch, mm: Remove tlb_fast_mode()
From: Peter Zijlstra <peterz@infradead.org>

Since the introduction of preemptible mmu_gather TLB fast mode has been
broken. TLB fast mode relies on there being absolutely no concurrency;
it frees pages first and invalidates TLBs later.

However now we can get concurrency and stuff goes *bang*.

This patch removes all tlb_fast_mode() code; it was found to be the
better option vs. trying to patch the hole by entangling tlb
invalidation with the scheduler.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 arch/arm/include/asm/tlb.h  |   27 ++++-----------------------
 arch/ia64/include/asm/tlb.h |   41 ++++++++---------------------------------
 include/asm-generic/tlb.h   |   17 +----------------
 mm/memory.c                 |    9 ---------
 4 files changed, 13 insertions(+), 81 deletions(-)

--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -33,18 +33,6 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-/*
- * We need to delay page freeing for SMP as other CPUs can access pages
- * which have been removed but not yet had their TLB entries invalidated.
- * Also, as ARMv7 speculative prefetch can drag new entries into the TLB,
- * we need to apply this same delaying tactic to ensure correct operation.
- */
-#if defined(CONFIG_SMP) || defined(CONFIG_CPU_32v7)
-#define tlb_fast_mode(tlb)	0
-#else
-#define tlb_fast_mode(tlb)	1
-#endif
-
 #define MMU_GATHER_BUNDLE	8
 
 /*
@@ -112,12 +100,10 @@ static inline void __tlb_alloc_page(stru
 static inline void tlb_flush_mmu(struct mmu_gather *tlb)
 {
 	tlb_flush(tlb);
-	if (!tlb_fast_mode(tlb)) {
-		free_pages_and_swap_cache(tlb->pages, tlb->nr);
-		tlb->nr = 0;
-		if (tlb->pages == tlb->local)
-			__tlb_alloc_page(tlb);
-	}
+	free_pages_and_swap_cache(tlb->pages, tlb->nr);
+	tlb->nr = 0;
+	if (tlb->pages == tlb->local)
+		__tlb_alloc_page(tlb);
 }
 
 static inline void
@@ -178,11 +164,6 @@ tlb_end_vma(struct mmu_gather *tlb, stru
 
 static inline int __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
 {
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return 1; /* avoid calling tlb_flush_mmu */
-	}
-
 	tlb->pages[tlb->nr++] = page;
 	VM_BUG_ON(tlb->nr > tlb->max);
 	return tlb->max - tlb->nr;
--- a/arch/ia64/include/asm/tlb.h
+++ b/arch/ia64/include/asm/tlb.h
@@ -46,12 +46,6 @@
 #include <asm/tlbflush.h>
 #include <asm/machvec.h>
 
-#ifdef CONFIG_SMP
-# define tlb_fast_mode(tlb)	((tlb)->nr == ~0U)
-#else
-# define tlb_fast_mode(tlb)	(1)
-#endif
-
 /*
  * If we can't allocate a page to make a big batch of page pointers
  * to work on, then just handle a few from the on-stack structure.
@@ -60,7 +54,7 @@
 
 struct mmu_gather {
 	struct mm_struct	*mm;
-	unsigned int		nr;		/* == ~0U => fast mode */
+	unsigned int		nr;
 	unsigned int		max;
 	unsigned char		fullmm;		/* non-zero means full mm flush */
 	unsigned char		need_flush;	/* really unmapped some PTEs? */
@@ -103,6 +97,7 @@ extern struct ia64_tr_entry *ia64_idtrs[
 static inline void
 ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
+	unsigned long i;
 	unsigned int nr;
 
 	if (!tlb->need_flush)
@@ -141,13 +136,11 @@ ia64_tlb_flush_mmu (struct mmu_gather *t
 
 	/* lastly, release the freed pages */
 	nr = tlb->nr;
-	if (!tlb_fast_mode(tlb)) {
-		unsigned long i;
-		tlb->nr = 0;
-		tlb->start_addr = ~0UL;
-		for (i = 0; i < nr; ++i)
-			free_page_and_swap_cache(tlb->pages[i]);
-	}
+
+	tlb->nr = 0;
+	tlb->start_addr = ~0UL;
+	for (i = 0; i < nr; ++i)
+		free_page_and_swap_cache(tlb->pages[i]);
 }
 
 static inline void __tlb_alloc_page(struct mmu_gather *tlb)
@@ -167,20 +160,7 @@ tlb_gather_mmu(struct mmu_gather *tlb, s
 	tlb->mm = mm;
 	tlb->max = ARRAY_SIZE(tlb->local);
 	tlb->pages = tlb->local;
-	/*
-	 * Use fast mode if only 1 CPU is online.
-	 *
-	 * It would be tempting to turn on fast-mode for full_mm_flush as well.  But this
-	 * doesn't work because of speculative accesses and software prefetching: the page
-	 * table of "mm" may (and usually is) the currently active page table and even
-	 * though the kernel won't do any user-space accesses during the TLB shoot down, a
-	 * compiler might use speculation or lfetch.fault on what happens to be a valid
-	 * user-space address.  This in turn could trigger a TLB miss fault (or a VHPT
-	 * walk) and re-insert a TLB entry we just removed.  Slow mode avoids such
-	 * problems.  (We could make fast-mode work by switching the current task to a
-	 * different "mm" during the shootdown.) --davidm 08/02/2002
-	 */
-	tlb->nr = (num_online_cpus() == 1) ? ~0U : 0;
+	tlb->nr = 0;
 	tlb->fullmm = full_mm_flush;
 	tlb->start_addr = ~0UL;
 }
@@ -214,11 +194,6 @@ static inline int __tlb_remove_page(stru
 {
 	tlb->need_flush = 1;
 
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return 1; /* avoid calling tlb_flush_mmu */
-	}
-
 	if (!tlb->nr && tlb->pages == tlb->local)
 		__tlb_alloc_page(tlb);
 
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -97,11 +97,9 @@ struct mmu_gather {
 	unsigned long		start;
 	unsigned long		end;
 	unsigned int		need_flush : 1,	/* Did free PTEs */
-				fast_mode  : 1; /* No batching   */
-
 	/* we are in the middle of an operation to clear
 	 * a full mm and can make some optimizations */
-	unsigned int		fullmm : 1,
+				fullmm : 1,
 	/* we have performed an operation which
 	 * requires a complete flush of the tlb */
 				need_flush_all : 1;
@@ -114,19 +112,6 @@ struct mmu_gather {
 
 #define HAVE_GENERIC_MMU_GATHER
 
-static inline int tlb_fast_mode(struct mmu_gather *tlb)
-{
-#ifdef CONFIG_SMP
-	return tlb->fast_mode;
-#else
-	/*
-	 * For UP we don't need to worry about TLB flush
-	 * and page free order so much..
-	 */
-	return 1;
-#endif
-}
-
 void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, bool fullmm);
 void tlb_flush_mmu(struct mmu_gather *tlb);
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start,
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -220,7 +220,6 @@ void tlb_gather_mmu(struct mmu_gather *t
 	tlb->start	= -1UL;
 	tlb->end	= 0;
 	tlb->need_flush = 0;
-	tlb->fast_mode  = (num_possible_cpus() == 1);
 	tlb->local.next = NULL;
 	tlb->local.nr   = 0;
 	tlb->local.max  = ARRAY_SIZE(tlb->__pages);
@@ -244,9 +243,6 @@ void tlb_flush_mmu(struct mmu_gather *tl
 	tlb_table_flush(tlb);
 #endif
 
-	if (tlb_fast_mode(tlb))
-		return;
-
 	for (batch = &tlb->local; batch; batch = batch->next) {
 		free_pages_and_swap_cache(batch->pages, batch->nr);
 		batch->nr = 0;
@@ -288,11 +284,6 @@ int __tlb_remove_page(struct mmu_gather
 
 	VM_BUG_ON(!tlb->need_flush);
 
-	if (tlb_fast_mode(tlb)) {
-		free_page_and_swap_cache(page);
-		return 1; /* avoid calling tlb_flush_mmu() */
-	}
-
 	batch = tlb->active;
 	batch->pages[batch->nr++] = page;
 	if (batch->nr == batch->max) {

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: TLB and PTE coherency during munmap
@ 2013-06-07  2:21 George Spelvin
  0 siblings, 0 replies; 70+ messages in thread
From: George Spelvin @ 2013-06-07  2:21 UTC (permalink / raw)
  To: peterz; +Cc: linux, linux-kernel

It's a bug in the original code, but as long as you're cleaning it up:

@@ -103,6 +97,7 @@ extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
 static inline void
 ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
+	unsigned long i;
 	unsigned int nr;
 
 	if (!tlb->need_flush)
@@ -141,13 +136,11 @@ ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long e
 
 	/* lastly, release the freed pages */
 	nr = tlb->nr;
-	if (!tlb_fast_mode(tlb)) {
-		unsigned long i;
-		tlb->nr = 0;
-		tlb->start_addr = ~0UL;
-		for (i = 0; i < nr; ++i)
-			free_page_and_swap_cache(tlb->pages[i]);
-	}
+
+	tlb->nr = 0;
+	tlb->start_addr = ~0UL;
+	for (i = 0; i < nr; ++i)
+		free_page_and_swap_cache(tlb->pages[i]);
 }


How about making that first hunk:

@@ -103,6 +97,6 @@ extern struct ia64_tr_entry *ia64_idtrs[NR_CPUS];
 static inline void
 ia64_tlb_flush_mmu (struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
-	unsigned int nr;
+	unsigned int nr, i;
 
 	if (!tlb->need_flush)

Because i does not need to be 64 bits; tlb->nr is only 32.

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2013-06-07  2:28 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-26  2:42 TLB and PTE coherency during munmap Max Filippov
2013-05-26  2:50 ` Max Filippov
2013-05-26  2:50   ` Max Filippov
2013-05-28  7:10   ` Max Filippov
2013-05-28  7:10     ` Max Filippov
2013-05-29 12:27     ` Peter Zijlstra
2013-05-29 12:27       ` Peter Zijlstra
2013-05-29 12:42       ` Vineet Gupta
2013-05-29 12:42         ` Vineet Gupta
2013-05-29 12:47         ` Peter Zijlstra
2013-05-29 12:47           ` Peter Zijlstra
2013-05-29 17:51         ` Peter Zijlstra
2013-05-29 17:51           ` Peter Zijlstra
2013-05-29 22:04           ` Catalin Marinas
2013-05-29 22:04             ` Catalin Marinas
2013-05-30  6:48             ` Peter Zijlstra
2013-05-30  6:48               ` Peter Zijlstra
2013-05-30  5:04           ` Vineet Gupta
2013-05-30  5:04             ` Vineet Gupta
2013-05-30  6:56             ` Peter Zijlstra
2013-05-30  6:56               ` Peter Zijlstra
2013-05-30  7:00               ` Vineet Gupta
2013-05-30  7:00                 ` Vineet Gupta
2013-05-30 11:03                 ` Peter Zijlstra
2013-05-30 11:03                   ` Peter Zijlstra
2013-05-31  4:09           ` Max Filippov
2013-05-31  4:09             ` Max Filippov
2013-05-31  7:55             ` Peter Zijlstra
2013-05-31  7:55               ` Peter Zijlstra
2013-06-03  9:05             ` Peter Zijlstra
2013-06-03  9:05               ` Peter Zijlstra
2013-06-03  9:16               ` Ingo Molnar
2013-06-03  9:16                 ` Ingo Molnar
2013-06-03 10:01                 ` Catalin Marinas
2013-06-03 10:01                   ` Catalin Marinas
2013-06-03 10:04                   ` Peter Zijlstra
2013-06-03 10:04                     ` Peter Zijlstra
2013-06-03 10:09                     ` Catalin Marinas
2013-06-03 10:09                       ` Catalin Marinas
2013-06-04  9:52               ` Peter Zijlstra
2013-06-04  9:52                 ` Peter Zijlstra
2013-06-05  0:05                 ` Linus Torvalds
2013-06-05  0:05                   ` Linus Torvalds
2013-06-05 10:26                   ` [PATCH] arch, mm: Remove tlb_fast_mode() Peter Zijlstra
2013-06-05 10:26                     ` Peter Zijlstra
2013-05-31  1:40       ` TLB and PTE coherency during munmap Max Filippov
2013-05-31  1:40         ` Max Filippov
2013-05-28 14:34   ` Konrad Rzeszutek Wilk
2013-05-28 14:34     ` Konrad Rzeszutek Wilk
2013-05-29  3:23     ` Max Filippov
2013-05-29  3:23       ` Max Filippov
2013-05-28 15:16   ` Michal Hocko
2013-05-28 15:16     ` Michal Hocko
2013-05-28 15:23     ` Catalin Marinas
2013-05-28 15:23       ` Catalin Marinas
2013-05-28 14:35 ` Catalin Marinas
2013-05-29  4:15   ` Max Filippov
2013-05-29  4:15     ` Max Filippov
2013-05-29 10:15     ` Catalin Marinas
2013-05-29 10:15       ` Catalin Marinas
2013-05-31  1:26       ` Max Filippov
2013-05-31  1:26         ` Max Filippov
2013-05-31  9:06         ` Catalin Marinas
2013-05-31  9:06           ` Catalin Marinas
2013-06-03  9:16         ` Max Filippov
2013-06-03  9:16           ` Max Filippov
2013-05-29 11:53   ` Vineet Gupta
2013-05-29 12:00   ` Vineet Gupta
2013-05-29 12:00     ` Vineet Gupta
2013-06-07  2:21 George Spelvin
