All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] Fixes for SW PAN
@ 2017-12-06 11:16 Will Deacon
  2017-12-06 11:16 ` [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm Will Deacon
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Will Deacon @ 2017-12-06 11:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

After lots of collective head scratching in response to Vinayak's mail
here:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html

It turns out that we have a problem with SW PAN and kernel threads, where
the saved ttbr0 value for a kernel thread can be stale and subsequently
inherited by other kernel threads over a fork.

These two patches attempt to fix that. We've not be able to reproduce
the exact failure reported above, but I added some assertions to the
uaccess routines to check for discrepancies between the active_mm pgd
and the saved ttbr0 value (ignoring the zero page) and these no longer
fire with these changes, but do fire without them if EFI runtime services
are enabled on my Seattle board.

Cheers,

Will

--->8

Will Deacon (2):
  arm64: SW PAN: Point saved ttbr0 at the zero page when switching to
    init_mm
  arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb

 arch/arm64/include/asm/efi.h         |  4 +---
 arch/arm64/include/asm/mmu_context.h | 46 ++++++++++++++++++------------------
 2 files changed, 24 insertions(+), 26 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm
  2017-12-06 11:16 [PATCH 0/2] Fixes for SW PAN Will Deacon
@ 2017-12-06 11:16 ` Will Deacon
  2017-12-06 12:09   ` Mark Rutland
  2017-12-06 12:15   ` Catalin Marinas
  2017-12-06 11:16 ` [PATCH 2/2] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb Will Deacon
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 17+ messages in thread
From: Will Deacon @ 2017-12-06 11:16 UTC (permalink / raw)
  To: linux-arm-kernel

update_saved_ttbr0 mandates that mm->pgd is not swapper, since swapper
contains kernel mappings and should never be installed into ttbr0. However,
this means that callers must avoid passing the init_mm to update_saved_ttbr0
which in turn can cause the saved ttbr0 value to be out-of-date in the context
of the idle thread. For example, EFI runtime services may leave the saved ttbr0
pointing at the EFI page table, and kernel threads may end up with stale
references to freed page tables.

This patch changes update_saved_ttbr0 so that the init_mm points the saved
ttbr0 value to the empty zero page, which always exists and never contains
valid translations. EFI and switch can then call into update_saved_ttbr0
unconditionally.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/efi.h         |  4 +---
 arch/arm64/include/asm/mmu_context.h | 22 +++++++++++++---------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
index 650344d01124..c4cd5081d78b 100644
--- a/arch/arm64/include/asm/efi.h
+++ b/arch/arm64/include/asm/efi.h
@@ -132,11 +132,9 @@ static inline void efi_set_pgd(struct mm_struct *mm)
 			 * Defer the switch to the current thread's TTBR0_EL1
 			 * until uaccess_enable(). Restore the current
 			 * thread's saved ttbr0 corresponding to its active_mm
-			 * (if different from init_mm).
 			 */
 			cpu_set_reserved_ttbr0();
-			if (current->active_mm != &init_mm)
-				update_saved_ttbr0(current, current->active_mm);
+			update_saved_ttbr0(current, current->active_mm);
 		}
 	}
 }
diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 3257895a9b5e..f7773f90546e 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -174,11 +174,17 @@ enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 static inline void update_saved_ttbr0(struct task_struct *tsk,
 				      struct mm_struct *mm)
 {
-	if (system_uses_ttbr0_pan()) {
-		BUG_ON(mm->pgd == swapper_pg_dir);
-		task_thread_info(tsk)->ttbr0 =
-			virt_to_phys(mm->pgd) | ASID(mm) << 48;
-	}
+	u64 ttbr;
+
+	if (!system_uses_ttbr0_pan())
+		return;
+
+	if (mm == &init_mm)
+		ttbr = __pa_symbol(empty_zero_page);
+	else
+		ttbr = virt_to_phys(mm->pgd) | ASID(mm) << 48;
+
+	task_thread_info(tsk)->ttbr0 = ttbr;
 }
 #else
 static inline void update_saved_ttbr0(struct task_struct *tsk,
@@ -214,11 +220,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
 	 * Update the saved TTBR0_EL1 of the scheduled-in task as the previous
 	 * value may have not been initialised yet (activate_mm caller) or the
 	 * ASID has changed since the last run (following the context switch
-	 * of another thread of the same process). Avoid setting the reserved
-	 * TTBR0_EL1 to swapper_pg_dir (init_mm; e.g. via idle_task_exit).
+	 * of another thread of the same process).
 	 */
-	if (next != &init_mm)
-		update_saved_ttbr0(tsk, next);
+	update_saved_ttbr0(tsk, next);
 }
 
 #define deactivate_mm(tsk,mm)	do { } while (0)
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 2/2] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb
  2017-12-06 11:16 [PATCH 0/2] Fixes for SW PAN Will Deacon
  2017-12-06 11:16 ` [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm Will Deacon
@ 2017-12-06 11:16 ` Will Deacon
  2017-12-06 12:10   ` Mark Rutland
  2017-12-06 12:17   ` Catalin Marinas
  2017-12-06 12:19 ` [PATCH 0/2] Fixes for SW PAN Mark Rutland
  2017-12-06 17:31 ` Vinayak Menon
  3 siblings, 2 replies; 17+ messages in thread
From: Will Deacon @ 2017-12-06 11:16 UTC (permalink / raw)
  To: linux-arm-kernel

enter_lazy_tlb is called when a kernel thread rides on the back of
another mm, due to a context switch or an explicit call to unuse_mm
where a call to switch_mm is elided.

In these cases, it's important to keep the saved ttbr value up to date
with the active mm, otherwise we can end up with a stale value which
points to a potentially freed page table.

This patch implements enter_lazy_tlb for arm64, so that the saved ttbr0
is kept up-to-date with the active mm for kernel threads.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index f7773f90546e..9d155fa9a507 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -156,20 +156,6 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
 
 #define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
 
-/*
- * This is called when "tsk" is about to enter lazy TLB mode.
- *
- * mm:  describes the currently active mm context
- * tsk: task which is entering lazy tlb
- * cpu: cpu number which is entering lazy tlb
- *
- * tsk->mm will be NULL
- */
-static inline void
-enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
-{
-}
-
 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
 static inline void update_saved_ttbr0(struct task_struct *tsk,
 				      struct mm_struct *mm)
@@ -193,6 +179,16 @@ static inline void update_saved_ttbr0(struct task_struct *tsk,
 }
 #endif
 
+static inline void
+enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
+{
+	/*
+	 * We don't actually care about the ttbr0 mapping, so point it at the
+	 * zero page.
+	 */
+	update_saved_ttbr0(tsk, &init_mm);
+}
+
 static inline void __switch_mm(struct mm_struct *next)
 {
 	unsigned int cpu = smp_processor_id();
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm
  2017-12-06 11:16 ` [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm Will Deacon
@ 2017-12-06 12:09   ` Mark Rutland
  2017-12-06 12:15   ` Catalin Marinas
  1 sibling, 0 replies; 17+ messages in thread
From: Mark Rutland @ 2017-12-06 12:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Wed, Dec 06, 2017 at 11:16:07AM +0000, Will Deacon wrote:
> update_saved_ttbr0 mandates that mm->pgd is not swapper, since swapper
> contains kernel mappings and should never be installed into ttbr0. However,
> this means that callers must avoid passing the init_mm to update_saved_ttbr0
> which in turn can cause the saved ttbr0 value to be out-of-date in the context
> of the idle thread. For example, EFI runtime services may leave the saved ttbr0
> pointing at the EFI page table, and kernel threads may end up with stale
> references to freed page tables.

I think we should s/the idle thread/a kernel thread/ here, since IIUC
this could happen in the context of any kernel thread, and there are
multiple idle threads.

> This patch changes update_saved_ttbr0 so that the init_mm points the saved
> ttbr0 value to the empty zero page, which always exists and never contains
> valid translations. EFI and switch can then call into update_saved_ttbr0
> unconditionally.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Vinayak Menon <vinmenon@codeaurora.org>
> Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

I guess this should have:

Fixes: 39bc88e5e38e9b21 ("arm64: Disable TTBR0_EL1 during normal kernel execution")

Otherwise, looks good to me.

Mark.

> ---
>  arch/arm64/include/asm/efi.h         |  4 +---
>  arch/arm64/include/asm/mmu_context.h | 22 +++++++++++++---------
>  2 files changed, 14 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/efi.h b/arch/arm64/include/asm/efi.h
> index 650344d01124..c4cd5081d78b 100644
> --- a/arch/arm64/include/asm/efi.h
> +++ b/arch/arm64/include/asm/efi.h
> @@ -132,11 +132,9 @@ static inline void efi_set_pgd(struct mm_struct *mm)
>  			 * Defer the switch to the current thread's TTBR0_EL1
>  			 * until uaccess_enable(). Restore the current
>  			 * thread's saved ttbr0 corresponding to its active_mm
> -			 * (if different from init_mm).
>  			 */
>  			cpu_set_reserved_ttbr0();
> -			if (current->active_mm != &init_mm)
> -				update_saved_ttbr0(current, current->active_mm);
> +			update_saved_ttbr0(current, current->active_mm);
>  		}
>  	}
>  }
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index 3257895a9b5e..f7773f90546e 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -174,11 +174,17 @@ enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
>  static inline void update_saved_ttbr0(struct task_struct *tsk,
>  				      struct mm_struct *mm)
>  {
> -	if (system_uses_ttbr0_pan()) {
> -		BUG_ON(mm->pgd == swapper_pg_dir);
> -		task_thread_info(tsk)->ttbr0 =
> -			virt_to_phys(mm->pgd) | ASID(mm) << 48;
> -	}
> +	u64 ttbr;
> +
> +	if (!system_uses_ttbr0_pan())
> +		return;
> +
> +	if (mm == &init_mm)
> +		ttbr = __pa_symbol(empty_zero_page);
> +	else
> +		ttbr = virt_to_phys(mm->pgd) | ASID(mm) << 48;
> +
> +	task_thread_info(tsk)->ttbr0 = ttbr;
>  }
>  #else
>  static inline void update_saved_ttbr0(struct task_struct *tsk,
> @@ -214,11 +220,9 @@ switch_mm(struct mm_struct *prev, struct mm_struct *next,
>  	 * Update the saved TTBR0_EL1 of the scheduled-in task as the previous
>  	 * value may have not been initialised yet (activate_mm caller) or the
>  	 * ASID has changed since the last run (following the context switch
> -	 * of another thread of the same process). Avoid setting the reserved
> -	 * TTBR0_EL1 to swapper_pg_dir (init_mm; e.g. via idle_task_exit).
> +	 * of another thread of the same process).
>  	 */
> -	if (next != &init_mm)
> -		update_saved_ttbr0(tsk, next);
> +	update_saved_ttbr0(tsk, next);
>  }
>  
>  #define deactivate_mm(tsk,mm)	do { } while (0)
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/2] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb
  2017-12-06 11:16 ` [PATCH 2/2] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb Will Deacon
@ 2017-12-06 12:10   ` Mark Rutland
  2017-12-06 12:17   ` Catalin Marinas
  1 sibling, 0 replies; 17+ messages in thread
From: Mark Rutland @ 2017-12-06 12:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 11:16:08AM +0000, Will Deacon wrote:
> enter_lazy_tlb is called when a kernel thread rides on the back of
> another mm, due to a context switch or an explicit call to unuse_mm
> where a call to switch_mm is elided.
> 
> In these cases, it's important to keep the saved ttbr value up to date
> with the active mm, otherwise we can end up with a stale value which
> points to a potentially freed page table.
> 
> This patch implements enter_lazy_tlb for arm64, so that the saved ttbr0
> is kept up-to-date with the active mm for kernel threads.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Vinayak Menon <vinmenon@codeaurora.org>
> Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

As with the prior patch, I think this needs:

Fixes: 39bc88e5e38e9b21 ("arm64: Disable TTBR0_EL1 during normal kernel execution")

Other than that, looks good to me.

Mark.

> ---
>  arch/arm64/include/asm/mmu_context.h | 24 ++++++++++--------------
>  1 file changed, 10 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
> index f7773f90546e..9d155fa9a507 100644
> --- a/arch/arm64/include/asm/mmu_context.h
> +++ b/arch/arm64/include/asm/mmu_context.h
> @@ -156,20 +156,6 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu);
>  
>  #define init_new_context(tsk,mm)	({ atomic64_set(&(mm)->context.id, 0); 0; })
>  
> -/*
> - * This is called when "tsk" is about to enter lazy TLB mode.
> - *
> - * mm:  describes the currently active mm context
> - * tsk: task which is entering lazy tlb
> - * cpu: cpu number which is entering lazy tlb
> - *
> - * tsk->mm will be NULL
> - */
> -static inline void
> -enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
> -{
> -}
> -
>  #ifdef CONFIG_ARM64_SW_TTBR0_PAN
>  static inline void update_saved_ttbr0(struct task_struct *tsk,
>  				      struct mm_struct *mm)
> @@ -193,6 +179,16 @@ static inline void update_saved_ttbr0(struct task_struct *tsk,
>  }
>  #endif
>  
> +static inline void
> +enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
> +{
> +	/*
> +	 * We don't actually care about the ttbr0 mapping, so point it at the
> +	 * zero page.
> +	 */
> +	update_saved_ttbr0(tsk, &init_mm);
> +}
> +
>  static inline void __switch_mm(struct mm_struct *next)
>  {
>  	unsigned int cpu = smp_processor_id();
> -- 
> 2.1.4
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm
  2017-12-06 11:16 ` [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm Will Deacon
  2017-12-06 12:09   ` Mark Rutland
@ 2017-12-06 12:15   ` Catalin Marinas
  1 sibling, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2017-12-06 12:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 11:16:07AM +0000, Will Deacon wrote:
> update_saved_ttbr0 mandates that mm->pgd is not swapper, since swapper
> contains kernel mappings and should never be installed into ttbr0. However,
> this means that callers must avoid passing the init_mm to update_saved_ttbr0
> which in turn can cause the saved ttbr0 value to be out-of-date in the context
> of the idle thread. For example, EFI runtime services may leave the saved ttbr0
> pointing at the EFI page table, and kernel threads may end up with stale
> references to freed page tables.
> 
> This patch changes update_saved_ttbr0 so that the init_mm points the saved
> ttbr0 value to the empty zero page, which always exists and never contains
> valid translations. EFI and switch can then call into update_saved_ttbr0
> unconditionally.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Vinayak Menon <vinmenon@codeaurora.org>
> Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 2/2] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb
  2017-12-06 11:16 ` [PATCH 2/2] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb Will Deacon
  2017-12-06 12:10   ` Mark Rutland
@ 2017-12-06 12:17   ` Catalin Marinas
  1 sibling, 0 replies; 17+ messages in thread
From: Catalin Marinas @ 2017-12-06 12:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 11:16:08AM +0000, Will Deacon wrote:
> enter_lazy_tlb is called when a kernel thread rides on the back of
> another mm, due to a context switch or an explicit call to unuse_mm
> where a call to switch_mm is elided.
> 
> In these cases, it's important to keep the saved ttbr value up to date
> with the active mm, otherwise we can end up with a stale value which
> points to a potentially freed page table.
> 
> This patch implements enter_lazy_tlb for arm64, so that the saved ttbr0
> is kept up-to-date with the active mm for kernel threads.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Vinayak Menon <vinmenon@codeaurora.org>
> Reported-by: Vinayak Menon <vinmenon@codeaurora.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 11:16 [PATCH 0/2] Fixes for SW PAN Will Deacon
  2017-12-06 11:16 ` [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm Will Deacon
  2017-12-06 11:16 ` [PATCH 2/2] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb Will Deacon
@ 2017-12-06 12:19 ` Mark Rutland
  2017-12-06 13:37   ` Will Deacon
  2017-12-06 17:31 ` Vinayak Menon
  3 siblings, 1 reply; 17+ messages in thread
From: Mark Rutland @ 2017-12-06 12:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 11:16:06AM +0000, Will Deacon wrote:
> Hi all,
> 
> After lots of collective head scratching in response to Vinayak's mail
> here:
> 
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
> 
> It turns out that we have a problem with SW PAN and kernel threads, where
> the saved ttbr0 value for a kernel thread can be stale and subsequently
> inherited by other kernel threads over a fork.
> 
> These two patches attempt to fix that. We've not be able to reproduce
> the exact failure reported above, but I added some assertions to the
> uaccess routines to check for discrepancies between the active_mm pgd
> and the saved ttbr0 value (ignoring the zero page) and these no longer
> fire with these changes, but do fire without them if EFI runtime services
> are enabled on my Seattle board.

Both patches look good to me, per my understanding of the problem, so
FWIW, for the series:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

However, I think we have another issue in this area. We have sequences
where we update the HW TTBR0 before the shadow, e.g. in efi_set_pgd(),
where we do:

	cpu_set_reserved_ttbr0();
	update_saved_ttbr0(current, current->active_mm);

... so if an exception comes in between after HW TTBR0 update but before
the shadow update, the stale shadow value will be restored upon the
exception return, leaving the HW TTBR0 stale.

That's easy enough to fix in the efi code, e.g.

	update_saved_ttbr0(current, current->active_mm);
	barrier();
	cpu_set_reserved_ttbr0();

... but I haven't yet figured out a nice way to sort out switch_mm(),
__switch_mm(), and check_and_switch_context().

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 12:19 ` [PATCH 0/2] Fixes for SW PAN Mark Rutland
@ 2017-12-06 13:37   ` Will Deacon
  0 siblings, 0 replies; 17+ messages in thread
From: Will Deacon @ 2017-12-06 13:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 12:19:41PM +0000, Mark Rutland wrote:
> On Wed, Dec 06, 2017 at 11:16:06AM +0000, Will Deacon wrote:
> > After lots of collective head scratching in response to Vinayak's mail
> > here:
> > 
> >   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
> > 
> > It turns out that we have a problem with SW PAN and kernel threads, where
> > the saved ttbr0 value for a kernel thread can be stale and subsequently
> > inherited by other kernel threads over a fork.
> > 
> > These two patches attempt to fix that. We've not be able to reproduce
> > the exact failure reported above, but I added some assertions to the
> > uaccess routines to check for discrepancies between the active_mm pgd
> > and the saved ttbr0 value (ignoring the zero page) and these no longer
> > fire with these changes, but do fire without them if EFI runtime services
> > are enabled on my Seattle board.
> 
> Both patches look good to me, per my understanding of the problem, so
> FWIW, for the series:
> 
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> 
> However, I think we have another issue in this area. We have sequences
> where we update the HW TTBR0 before the shadow, e.g. in efi_set_pgd(),
> where we do:
> 
> 	cpu_set_reserved_ttbr0();
> 	update_saved_ttbr0(current, current->active_mm);
> 
> ... so if an exception comes in between after HW TTBR0 update but before
> the shadow update, the stale shadow value will be restored upon the
> exception return, leaving the HW TTBR0 stale.

I don't see how that can happen. The entry code checks the ttbr0 ASID value
and if it's zero (as it will be above), it sets the PAN bit in the saved PSR
and we avoid doing any uaccess toggling (including on exception return).

> That's easy enough to fix in the efi code, e.g.
> 
> 	update_saved_ttbr0(current, current->active_mm);
> 	barrier();
> 	cpu_set_reserved_ttbr0();
> 
> ... but I haven't yet figured out a nice way to sort out switch_mm(),
> __switch_mm(), and check_and_switch_context().

I don't see the problem there either. Can you elaborate please?

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 11:16 [PATCH 0/2] Fixes for SW PAN Will Deacon
                   ` (2 preceding siblings ...)
  2017-12-06 12:19 ` [PATCH 0/2] Fixes for SW PAN Mark Rutland
@ 2017-12-06 17:31 ` Vinayak Menon
  2017-12-06 17:56   ` Will Deacon
  3 siblings, 1 reply; 17+ messages in thread
From: Vinayak Menon @ 2017-12-06 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/6/2017 4:46 PM, Will Deacon wrote:
> Hi all,
>
> After lots of collective head scratching in response to Vinayak's mail
> here:
>
>   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
>
> It turns out that we have a problem with SW PAN and kernel threads, where
> the saved ttbr0 value for a kernel thread can be stale and subsequently
> inherited by other kernel threads over a fork.
>
> These two patches attempt to fix that. We've not be able to reproduce
> the exact failure reported above, but I added some assertions to the
> uaccess routines to check for discrepancies between the active_mm pgd
> and the saved ttbr0 value (ignoring the zero page) and these no longer
> fire with these changes, but do fire without them if EFI runtime services
> are enabled on my Seattle board.

Thanks Will. So these 2 patches fix the case of kthreads having a stale saved ttbr0. The callstack I had shared
in the original issue description was not of a kthread (its user task with PF_KTHREAD not set. The tsk->mm was
set to NULL by exit_mm I think). So do you think this could be a different problem ?
I had a look at the dumps again and what I see is that, the PA part of the saved ttbr0
(from thread_info) is not the same as the pa(tsk->active_mm->pgd). The PA derived from saved ttbr0 actually
points to a page which is "now" owned by slab.
>
> Cheers,
>
> Will
>
> --->8
>
> Will Deacon (2):
>   arm64: SW PAN: Point saved ttbr0 at the zero page when switching to
>     init_mm
>   arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb
>
>  arch/arm64/include/asm/efi.h         |  4 +---
>  arch/arm64/include/asm/mmu_context.h | 46 ++++++++++++++++++------------------
>  2 files changed, 24 insertions(+), 26 deletions(-)
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 17:31 ` Vinayak Menon
@ 2017-12-06 17:56   ` Will Deacon
  2017-12-06 18:01     ` Catalin Marinas
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2017-12-06 17:56 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 11:01:46PM +0530, Vinayak Menon wrote:
> On 12/6/2017 4:46 PM, Will Deacon wrote:
> > Hi all,
> >
> > After lots of collective head scratching in response to Vinayak's mail
> > here:
> >
> >   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
> >
> > It turns out that we have a problem with SW PAN and kernel threads, where
> > the saved ttbr0 value for a kernel thread can be stale and subsequently
> > inherited by other kernel threads over a fork.
> >
> > These two patches attempt to fix that. We've not be able to reproduce
> > the exact failure reported above, but I added some assertions to the
> > uaccess routines to check for discrepancies between the active_mm pgd
> > and the saved ttbr0 value (ignoring the zero page) and these no longer
> > fire with these changes, but do fire without them if EFI runtime services
> > are enabled on my Seattle board.
> 
> Thanks Will. So these 2 patches fix the case of kthreads having a stale saved ttbr0. The callstack I had shared
> in the original issue description was not of a kthread (its user task with PF_KTHREAD not set. The tsk->mm was
> set to NULL by exit_mm I think). So do you think this could be a different problem ?
> I had a look at the dumps again and what I see is that, the PA part of the saved ttbr0
> (from thread_info) is not the same as the pa(tsk->active_mm->pgd). The PA derived from saved ttbr0 actually
> points to a page which is "now" owned by slab.

Having not been able to reproduce the failure you described, I can't give
you a good answer to this. How much information do you have about the task
that crashes?

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 17:56   ` Will Deacon
@ 2017-12-06 18:01     ` Catalin Marinas
  2017-12-06 18:07       ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2017-12-06 18:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 05:56:42PM +0000, Will Deacon wrote:
> On Wed, Dec 06, 2017 at 11:01:46PM +0530, Vinayak Menon wrote:
> > On 12/6/2017 4:46 PM, Will Deacon wrote:
> > > After lots of collective head scratching in response to Vinayak's mail
> > > here:
> > >
> > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
> > >
> > > It turns out that we have a problem with SW PAN and kernel threads, where
> > > the saved ttbr0 value for a kernel thread can be stale and subsequently
> > > inherited by other kernel threads over a fork.
> > >
> > > These two patches attempt to fix that. We've not be able to reproduce
> > > the exact failure reported above, but I added some assertions to the
> > > uaccess routines to check for discrepancies between the active_mm pgd
> > > and the saved ttbr0 value (ignoring the zero page) and these no longer
> > > fire with these changes, but do fire without them if EFI runtime services
> > > are enabled on my Seattle board.
> > 
> > Thanks Will. So these 2 patches fix the case of kthreads having a stale saved ttbr0. The callstack I had shared
> > in the original issue description was not of a kthread (its user task with PF_KTHREAD not set. The tsk->mm was
> > set to NULL by exit_mm I think). So do you think this could be a different problem ?
> > I had a look at the dumps again and what I see is that, the PA part of the saved ttbr0
> > (from thread_info) is not the same as the pa(tsk->active_mm->pgd). The PA derived from saved ttbr0 actually
> > points to a page which is "now" owned by slab.
> 
> Having not been able to reproduce the failure you described, I can't give
> you a good answer to this.

While these fixes make sense for a stable backport, I could go back to
per-cpu saved_ttbr0 as in the first version of this patchset:

http://lkml.kernel.org/r/1471015666-23125-4-git-send-email-catalin.marinas at arm.com

(changed in v2 for some marginally shorter asm code)

-- 
Catalin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 18:01     ` Catalin Marinas
@ 2017-12-06 18:07       ` Will Deacon
  2017-12-06 18:18         ` Catalin Marinas
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2017-12-06 18:07 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 06:01:35PM +0000, Catalin Marinas wrote:
> On Wed, Dec 06, 2017 at 05:56:42PM +0000, Will Deacon wrote:
> > On Wed, Dec 06, 2017 at 11:01:46PM +0530, Vinayak Menon wrote:
> > > On 12/6/2017 4:46 PM, Will Deacon wrote:
> > > > After lots of collective head scratching in response to Vinayak's mail
> > > > here:
> > > >
> > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
> > > >
> > > > It turns out that we have a problem with SW PAN and kernel threads, where
> > > > the saved ttbr0 value for a kernel thread can be stale and subsequently
> > > > inherited by other kernel threads over a fork.
> > > >
> > > > These two patches attempt to fix that. We've not be able to reproduce
> > > > the exact failure reported above, but I added some assertions to the
> > > > uaccess routines to check for discrepancies between the active_mm pgd
> > > > and the saved ttbr0 value (ignoring the zero page) and these no longer
> > > > fire with these changes, but do fire without them if EFI runtime services
> > > > are enabled on my Seattle board.
> > > 
> > > Thanks Will. So these 2 patches fix the case of kthreads having a stale saved ttbr0. The callstack I had shared
> > > in the original issue description was not of a kthread (its user task with PF_KTHREAD not set. The tsk->mm was
> > > set to NULL by exit_mm I think). So do you think this could be a different problem ?
> > > I had a look at the dumps again and what I see is that, the PA part of the saved ttbr0
> > > (from thread_info) is not the same as the pa(tsk->active_mm->pgd). The PA derived from saved ttbr0 actually
> > > points to a page which is "now" owned by slab.
> > 
> > Having not been able to reproduce the failure you described, I can't give
> > you a good answer to this.
> 
> While these fixes make sense for a stable backport, I could go back to
> per-cpu saved_ttbr0 as in the first version of this patchset:
> 
> http://lkml.kernel.org/r/1471015666-23125-4-git-send-email-catalin.marinas at arm.com
> 
> (changed in v2 for some marginally shorter asm code)

To be honest, if we're going to consider changes as fundamental as that, I'd
much prefer for us to use the active_mm->pgd directly, setting the zero page
if it's either NULL or init_mm. This would need some hacking for EFI...

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 18:07       ` Will Deacon
@ 2017-12-06 18:18         ` Catalin Marinas
  2017-12-06 18:26           ` Will Deacon
  0 siblings, 1 reply; 17+ messages in thread
From: Catalin Marinas @ 2017-12-06 18:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 06:07:07PM +0000, Will Deacon wrote:
> On Wed, Dec 06, 2017 at 06:01:35PM +0000, Catalin Marinas wrote:
> > On Wed, Dec 06, 2017 at 05:56:42PM +0000, Will Deacon wrote:
> > > On Wed, Dec 06, 2017 at 11:01:46PM +0530, Vinayak Menon wrote:
> > > > On 12/6/2017 4:46 PM, Will Deacon wrote:
> > > > > After lots of collective head scratching in response to Vinayak's mail
> > > > > here:
> > > > >
> > > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
> > > > >
> > > > > It turns out that we have a problem with SW PAN and kernel threads, where
> > > > > the saved ttbr0 value for a kernel thread can be stale and subsequently
> > > > > inherited by other kernel threads over a fork.
> > > > >
> > > > > These two patches attempt to fix that. We've not be able to reproduce
> > > > > the exact failure reported above, but I added some assertions to the
> > > > > uaccess routines to check for discrepancies between the active_mm pgd
> > > > > and the saved ttbr0 value (ignoring the zero page) and these no longer
> > > > > fire with these changes, but do fire without them if EFI runtime services
> > > > > are enabled on my Seattle board.
> > > > 
> > > > Thanks Will. So these 2 patches fix the case of kthreads having a stale saved ttbr0. The callstack I had shared
> > > > in the original issue description was not of a kthread (its user task with PF_KTHREAD not set. The tsk->mm was
> > > > set to NULL by exit_mm I think). So do you think this could be a different problem ?
> > > > I had a look at the dumps again and what I see is that, the PA part of the saved ttbr0
> > > > (from thread_info) is not the same as the pa(tsk->active_mm->pgd). The PA derived from saved ttbr0 actually
> > > > points to a page which is "now" owned by slab.
> > > 
> > > Having not been able to reproduce the failure you described, I can't give
> > > you a good answer to this.
> > 
> > While these fixes make sense for a stable backport, I could go back to
> > per-cpu saved_ttbr0 as in the first version of this patchset:
> > 
> > http://lkml.kernel.org/r/1471015666-23125-4-git-send-email-catalin.marinas at arm.com
> > 
> > (changed in v2 for some marginally shorter asm code)
> 
> To be honest, if we're going to consider changes as fundamental as that, I'd
> much prefer for us to use the active_mm->pgd directly, setting the zero page
> if it's either NULL or init_mm. This would need some hacking for EFI...

The problem is the __pa(active_mm->pgd) and doing it in assembly may be
pretty unreadable.

Yet another option is to move save_ttbr0 in mm_context_t.

-- 
Catalin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 18:18         ` Catalin Marinas
@ 2017-12-06 18:26           ` Will Deacon
  2017-12-07  8:55             ` Vinayak Menon
  0 siblings, 1 reply; 17+ messages in thread
From: Will Deacon @ 2017-12-06 18:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 06, 2017 at 06:18:01PM +0000, Catalin Marinas wrote:
> On Wed, Dec 06, 2017 at 06:07:07PM +0000, Will Deacon wrote:
> > On Wed, Dec 06, 2017 at 06:01:35PM +0000, Catalin Marinas wrote:
> > > On Wed, Dec 06, 2017 at 05:56:42PM +0000, Will Deacon wrote:
> > > > On Wed, Dec 06, 2017 at 11:01:46PM +0530, Vinayak Menon wrote:
> > > > > On 12/6/2017 4:46 PM, Will Deacon wrote:
> > > > > > After lots of collective head scratching in response to Vinayak's mail
> > > > > > here:
> > > > > >
> > > > > >   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
> > > > > >
> > > > > > It turns out that we have a problem with SW PAN and kernel threads, where
> > > > > > the saved ttbr0 value for a kernel thread can be stale and subsequently
> > > > > > inherited by other kernel threads over a fork.
> > > > > >
> > > > > > These two patches attempt to fix that. We've not be able to reproduce
> > > > > > the exact failure reported above, but I added some assertions to the
> > > > > > uaccess routines to check for discrepancies between the active_mm pgd
> > > > > > and the saved ttbr0 value (ignoring the zero page) and these no longer
> > > > > > fire with these changes, but do fire without them if EFI runtime services
> > > > > > are enabled on my Seattle board.
> > > > > 
> > > > > Thanks Will. So these 2 patches fix the case of kthreads having a stale saved ttbr0. The callstack I had shared
> > > > > in the original issue description was not of a kthread (its user task with PF_KTHREAD not set. The tsk->mm was
> > > > > set to NULL by exit_mm I think). So do you think this could be a different problem ?
> > > > > I had a look at the dumps again and what I see is that, the PA part of the saved ttbr0
> > > > > (from thread_info) is not the same as the pa(tsk->active_mm->pgd). The PA derived from saved ttbr0 actually
> > > > > points to a page which is "now" owned by slab.
> > > > 
> > > > Having not been able to reproduce the failure you described, I can't give
> > > > you a good answer to this.

Looking at the code (again), if we context switch in do_exit after exit_mm,
then the thread behaves an awful lot like a kernel thread: current->mm is
NULL and we're in lazy TLB mode. Furthermore, that context switch will drop
the last reference to the old mm and the pgd will finally be freed.

So I think my patches will solve your case too because we'll call
enter_lazy_tlb again when getting scheduled back in. If you have any way
to test them, that would be great.

> > > While these fixes make sense for a stable backport, I could go back to
> > > per-cpu saved_ttbr0 as in the first version of this patchset:
> > > 
> > > http://lkml.kernel.org/r/1471015666-23125-4-git-send-email-catalin.marinas at arm.com
> > > 
> > > (changed in v2 for some marginally shorter asm code)
> > 
> > To be honest, if we're going to consider changes as fundamental as that, I'd
> > much prefer for us to use the active_mm->pgd directly, setting the zero page
> > if it's either NULL or init_mm. This would need some hacking for EFI...
> 
> The problem is the __pa(active_mm->pgd) and doing it in assembly may be
> pretty unreadable.

I don't think it would be *that* bad if you can get hold of memstart_addr.

> Yet another option is to move save_ttbr0 in mm_context_t.

Perhaps, but I don't dislike the current code as much as I did now that,
we understand it better ;)

Will

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-06 18:26           ` Will Deacon
@ 2017-12-07  8:55             ` Vinayak Menon
  2017-12-12  3:30               ` Vinayak Menon
  0 siblings, 1 reply; 17+ messages in thread
From: Vinayak Menon @ 2017-12-07  8:55 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/6/2017 11:56 PM, Will Deacon wrote:
> On Wed, Dec 06, 2017 at 06:18:01PM +0000, Catalin Marinas wrote:
>> On Wed, Dec 06, 2017 at 06:07:07PM +0000, Will Deacon wrote:
>>> On Wed, Dec 06, 2017 at 06:01:35PM +0000, Catalin Marinas wrote:
>>>> On Wed, Dec 06, 2017 at 05:56:42PM +0000, Will Deacon wrote:
>>>>> On Wed, Dec 06, 2017 at 11:01:46PM +0530, Vinayak Menon wrote:
>>>>>> On 12/6/2017 4:46 PM, Will Deacon wrote:
>>>>>>> After lots of collective head scratching in response to Vinayak's mail
>>>>>>> here:
>>>>>>>
>>>>>>>   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
>>>>>>>
>>>>>>> It turns out that we have a problem with SW PAN and kernel threads, where
>>>>>>> the saved ttbr0 value for a kernel thread can be stale and subsequently
>>>>>>> inherited by other kernel threads over a fork.
>>>>>>>
>>>>>>> These two patches attempt to fix that. We've not be able to reproduce
>>>>>>> the exact failure reported above, but I added some assertions to the
>>>>>>> uaccess routines to check for discrepancies between the active_mm pgd
>>>>>>> and the saved ttbr0 value (ignoring the zero page) and these no longer
>>>>>>> fire with these changes, but do fire without them if EFI runtime services
>>>>>>> are enabled on my Seattle board.
>>>>>> Thanks Will. So these 2 patches fix the case of kthreads having a stale saved ttbr0. The callstack I had shared
>>>>>> in the original issue description was not of a kthread (its user task with PF_KTHREAD not set. The tsk->mm was
>>>>>> set to NULL by exit_mm I think). So do you think this could be a different problem ?
>>>>>> I had a look at the dumps again and what I see is that, the PA part of the saved ttbr0
>>>>>> (from thread_info) is not the same as the pa(tsk->active_mm->pgd). The PA derived from saved ttbr0 actually
>>>>>> points to a page which is "now" owned by slab.
>>>>> Having not been able to reproduce the failure you described, I can't give
>>>>> you a good answer to this.
> Looking at the code (again), if we context switch in do_exit after exit_mm,
> then the thread behaves an awful lot like a kernel thread: current->mm is
> NULL and we're in lazy TLB mode.
Yes, that could be the case.
I am going to try out these 2 patches and see if the issue gets resolved. It usually takes more
than a day to reproduce the problem. Will update you as soon as I get the results.

> Furthermore, that context switch will drop
> the last reference to the old mm and the pgd will finally be freed.
>
> So I think my patches will solve your case too because we'll call
> enter_lazy_tlb again when getting scheduled back in. If you have any way
> to test them, that would be great.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 0/2] Fixes for SW PAN
  2017-12-07  8:55             ` Vinayak Menon
@ 2017-12-12  3:30               ` Vinayak Menon
  0 siblings, 0 replies; 17+ messages in thread
From: Vinayak Menon @ 2017-12-12  3:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 12/7/2017 2:25 PM, Vinayak Menon wrote:
> On 12/6/2017 11:56 PM, Will Deacon wrote:
>> On Wed, Dec 06, 2017 at 06:18:01PM +0000, Catalin Marinas wrote:
>>> On Wed, Dec 06, 2017 at 06:07:07PM +0000, Will Deacon wrote:
>>>> On Wed, Dec 06, 2017 at 06:01:35PM +0000, Catalin Marinas wrote:
>>>>> On Wed, Dec 06, 2017 at 05:56:42PM +0000, Will Deacon wrote:
>>>>>> On Wed, Dec 06, 2017 at 11:01:46PM +0530, Vinayak Menon wrote:
>>>>>>> On 12/6/2017 4:46 PM, Will Deacon wrote:
>>>>>>>> After lots of collective head scratching in response to Vinayak's mail
>>>>>>>> here:
>>>>>>>>
>>>>>>>>   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-December/545641.html
>>>>>>>>
>>>>>>>> It turns out that we have a problem with SW PAN and kernel threads, where
>>>>>>>> the saved ttbr0 value for a kernel thread can be stale and subsequently
>>>>>>>> inherited by other kernel threads over a fork.
>>>>>>>>
>>>>>>>> These two patches attempt to fix that. We've not be able to reproduce
>>>>>>>> the exact failure reported above, but I added some assertions to the
>>>>>>>> uaccess routines to check for discrepancies between the active_mm pgd
>>>>>>>> and the saved ttbr0 value (ignoring the zero page) and these no longer
>>>>>>>> fire with these changes, but do fire without them if EFI runtime services
>>>>>>>> are enabled on my Seattle board.
>>>>>>> Thanks Will. So these 2 patches fix the case of kthreads having a stale saved ttbr0. The callstack I had shared
>>>>>>> in the original issue description was not of a kthread (its user task with PF_KTHREAD not set. The tsk->mm was
>>>>>>> set to NULL by exit_mm I think). So do you think this could be a different problem ?
>>>>>>> I had a look at the dumps again and what I see is that, the PA part of the saved ttbr0
>>>>>>> (from thread_info) is not the same as the pa(tsk->active_mm->pgd). The PA derived from saved ttbr0 actually
>>>>>>> points to a page which is "now" owned by slab.
>>>>>> Having not been able to reproduce the failure you described, I can't give
>>>>>> you a good answer to this.
>> Looking at the code (again), if we context switch in do_exit after exit_mm,
>> then the thread behaves an awful lot like a kernel thread: current->mm is
>> NULL and we're in lazy TLB mode.
> Yes, that could be the case.
> I am going to try out these 2 patches and see if the issue gets resolved. It usually takes more
> than a day to reproduce the problem. Will update you as soon as I get the results.

The patches seem to fix the issue. The original issue with non-kthreads is not seen even after 3 days of testing. Thanks Will.

>> Furthermore, that context switch will drop
>> the last reference to the old mm and the pgd will finally be freed.
>>
>> So I think my patches will solve your case too because we'll call
>> enter_lazy_tlb again when getting scheduled back in. If you have any way
>> to test them, that would be great.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-12-12  3:30 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-06 11:16 [PATCH 0/2] Fixes for SW PAN Will Deacon
2017-12-06 11:16 ` [PATCH 1/2] arm64: SW PAN: Point saved ttbr0 at the zero page when switching to init_mm Will Deacon
2017-12-06 12:09   ` Mark Rutland
2017-12-06 12:15   ` Catalin Marinas
2017-12-06 11:16 ` [PATCH 2/2] arm64: SW PAN: Update saved ttbr0 value on enter_lazy_tlb Will Deacon
2017-12-06 12:10   ` Mark Rutland
2017-12-06 12:17   ` Catalin Marinas
2017-12-06 12:19 ` [PATCH 0/2] Fixes for SW PAN Mark Rutland
2017-12-06 13:37   ` Will Deacon
2017-12-06 17:31 ` Vinayak Menon
2017-12-06 17:56   ` Will Deacon
2017-12-06 18:01     ` Catalin Marinas
2017-12-06 18:07       ` Will Deacon
2017-12-06 18:18         ` Catalin Marinas
2017-12-06 18:26           ` Will Deacon
2017-12-07  8:55             ` Vinayak Menon
2017-12-12  3:30               ` Vinayak Menon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.