From: Will Deacon <will@kernel.org>
To: linux-arm-kernel@lists.infradead.org
Cc: linux-arch@vger.kernel.org, Mark Rutland <mark.rutland@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Marc Zyngier <maz@kernel.org>, Will Deacon <will@kernel.org>
Subject: [PATCH 0/6] Fix TLB invalidation on arm64
Date: Tue, 27 Aug 2019 14:18:12 +0100	[thread overview]
Message-ID: <20190827131818.14724-1-will@kernel.org> (raw)

Hi all,

[+linux-arch since the end of this may be applicable to other architectures]

Commit 24fe1b0efad4fcdd ("arm64: Remove unnecessary ISBs from
set_{pte,pmd,pud}") removed ISB instructions immediately following updates to
the page table, on the grounds that they are not required by the
architecture and a DSB alone is sufficient to ensure that subsequent data
accesses use the new translation:

  DDI0487E_a, B2-128:

  | ... no instruction that appears in program order after the DSB instruction
  | can alter any state of the system or perform any part of its functionality
  | until the DSB completes other than:
  |
  | * Being fetched from memory and decoded
  | * Reading the general-purpose, SIMD and floating-point, Special-purpose, or
  |   System registers that are directly or indirectly read without causing
  |   side-effects.

However, the same document also states the following:

  DDI0487E_a, B2-125:

  | DMB and DSB instructions affect reads and writes to the memory system
  | generated by Load/Store instructions and data or unified cache maintenance
  | instructions being executed by the PE. Instruction fetches or accesses
  | caused by a hardware translation table access are not explicit accesses.

which appears to claim that the DSB alone is insufficient. Unfortunately,
some CPU designers have followed the second clause above, whereas in Linux
we've been relying on the first. This means that our mapping sequence:

	MOV	X0, <valid pte>
	STR	X0, [Xptep]	// Store new PTE to page table
	DSB	ISHST
	LDR	X1, [X2]	// Translates using the new PTE

can actually raise a translation fault on the load instruction because the
translation can be performed speculatively before the page table update and
then marked as "faulting" by the CPU. For user PTEs, this is ok because we
can handle the spurious fault, but for kernel PTEs and intermediate table
entries this results in a panic().
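
What the mapping path needs is a context synchronisation event (an ISB)
between the DSB and the first access through the new mapping, so that a
translation taken speculatively before the PTE became visible cannot be
used afterwards:

	MOV	X0, <valid pte>
	STR	X0, [Xptep]	// Store new PTE to page table
	DSB	ISHST
	ISB			// Don't translate subsequent accesses early
	LDR	X1, [X2]	// Translates using the new PTE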

We can fix this by reverting 24fe1b0efad4fcdd, but the fun doesn't stop
there. If we consider the unmap case, then a similar constraint applies to
ordering subsequent memory accesses after the completion of the TLB
invalidation, so we also need to add an ISB instruction to
__flush_tlb_kernel_pgtable(). For user addresses, the exception return
provides the necessary context synchronisation.
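
Concretely, with the extra ISB the kernel-pgtable invalidation path ends up
looking something like this (modulo the exact TLBI operation used by
__flush_tlb_kernel_pgtable()):

	DSB	ISHST			// Make the page-table update visible to the walker
	TLBI	VAAE1IS, Xaddr		// Invalidate walk cache entries for the kernel VA
	DSB	ISH			// Wait for the broadcast invalidation to complete
	ISB				// Don't translate subsequent accesses until it has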

This then raises an interesting question: if an ISB is required after a TLBI
instruction to prevent speculative translation of subsequent instructions,
how is this speculation prevented on concurrent CPUs that receive the
broadcast TLB invalidation message? Sending and completing a broadcast TLB
invalidation message does not imply execution of an ISB on the remote CPU;
however, it /does/ require that the remote CPU no longer make use of any
old translations, because otherwise we couldn't guarantee that an unmapped
page can no longer be modified. In this regard, receiving a TLB
invalidation is in some ways stronger than sending one (where you need the
ISB).

So far, so good, but the final piece of the puzzle isn't quite so rosy.

*** Other architecture maintainers -- start here! ***

In the case that one CPU maps a page and then sets a flag to tell another
CPU:

	CPU 0
	-----

	MOV	X0, <valid pte>
	STR	X0, [Xptep]	// Store new PTE to page table
	DSB	ISHST
	ISB
	MOV	X1, #1
	STR	X1, [Xflag]	// Set the flag

	CPU 1
	-----

loop:	LDAR	X0, [Xflag]	// Poll flag with Acquire semantics
	CBZ	X0, loop
	LDR	X1, [X2]	// Translates using the new PTE

then the final load on CPU 1 can raise a translation fault for the same
reasons as mentioned at the start of this description. In reality, code
such as:

	CPU 0				CPU 1
	-----				-----
	spin_lock(&lock);		spin_lock(&lock);
	*ptr = vmalloc(size);		if (*ptr)
	spin_unlock(&lock);			foo = **ptr;
					spin_unlock(&lock);

will not trigger the fault, because the address dependency on CPU 1
prevents the speculative translation. However, more exotic
code where the virtual address is known ahead of time, such as:

	CPU 0				CPU 1
	-----				-----
	spin_lock(&lock);		spin_lock(&lock);
	set_fixmap(0, paddr, prot);	if (mapped)
	mapped = true;				foo = *fix_to_virt(0);
	spin_unlock(&lock);		spin_unlock(&lock);

could fault. This can be avoided by any of:

	* Introducing broadcast TLB maintenance on the map path
	* Adding a DSB;ISB sequence after checking a flag which indicates
	  that a virtual address is now mapped (sketched after this list)
	* Handling the spurious fault
	* Handling the spurious fault
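
On the polling side, the second option above amounts to something like:

loop:	LDAR	X0, [Xflag]	// Poll flag with Acquire semantics
	CBZ	X0, loop
	DSB	ISH		// Ensure the flag read has completed
	ISB			// ...and discard any translation taken before it
	LDR	X1, [X2]	// Translates using the new PTE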

Given that we have never observed a problem in the concurrent case under
Linux and future revisions of the architecture are being tightened so that
translation table walks are effectively ordered in the same way as explicit
memory accesses, we no longer treat spurious kernel faults as fatal if the
page table indicates that the access was valid.
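
For the record, the spurious fault check boils down to retrying the
translation of the faulting address and inspecting PAR_EL1 (hence the
PAR_EL1 patches below), roughly as follows (register and label names are
illustrative only):

	AT	S1E1R, Xaddr		// Retry the stage 1 translation that just faulted
	ISB				// Ensure PAR_EL1 has been updated by the AT
	MRS	X0, PAR_EL1
	TBZ	X0, #0, spurious	// PAR_EL1.F clear: it translates now, so the fault was spurious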

Anyway, this patch series attempts to implement some of this and I plan
to queue it for 5.4.

Will

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>

--->8

Will Deacon (6):
  Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}"
  arm64: tlb: Ensure we execute an ISB following walk cache invalidation
  arm64: mm: Add ISB instruction to set_pgd()
  arm64: sysreg: Add some field definitions for PAR_EL1
  arm64: mm: Ignore spurious translation faults taken from the kernel
  arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F

 arch/arm64/include/asm/pgtable.h  | 13 ++++++++++---
 arch/arm64/include/asm/sysreg.h   |  3 +++
 arch/arm64/include/asm/tlbflush.h |  1 +
 arch/arm64/kvm/hyp/switch.c       |  2 +-
 arch/arm64/mm/fault.c             | 33 +++++++++++++++++++++++++++++++++
 5 files changed, 48 insertions(+), 4 deletions(-)

-- 
2.11.0

Thread overview: 36+ messages

2019-08-27 13:18 [PATCH 0/6] Fix TLB invalidation on arm64 Will Deacon [this message]
2019-08-27 13:18 ` [PATCH 1/6] Revert "arm64: Remove unnecessary ISBs from set_{pte,pmd,pud}" Will Deacon
2019-08-28  2:29   ` Sasha Levin
2019-08-27 13:18 ` [PATCH 2/6] arm64: tlb: Ensure we execute an ISB following walk cache invalidation Will Deacon
2019-08-27 13:18 ` [PATCH 3/6] arm64: mm: Add ISB instruction to set_pgd() Will Deacon
2019-08-27 13:18 ` [PATCH 4/6] arm64: sysreg: Add some field definitions for PAR_EL1 Will Deacon
2019-08-27 13:18 ` [PATCH 5/6] arm64: mm: Ignore spurious translation faults taken from the kernel Will Deacon
2019-08-27 13:18 ` [PATCH 6/6] arm64: kvm: Replace hardcoded '1' with SYS_PAR_EL1_F Will Deacon
2019-08-27 16:19 ` [PATCH 0/6] Fix TLB invalidation on arm64 Mark Rutland
2019-08-28  0:35 ` Nicholas Piggin
2019-08-28 16:12   ` Will Deacon
2019-08-29 14:08     ` Nicholas Piggin
2019-08-30 12:40       ` Will Deacon
