* [RFC PATCH 00/20] arm64: mm: rework page table creation
@ 2015-12-09 12:44 Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 01/20] arm64: mm: remove pointless PAGE_MASKing Mark Rutland
                   ` (19 more replies)
  0 siblings, 20 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Hi all,

This is a first attempt at reworking the arm64 page table creation, which is
necessary to:

(a) Avoid issues with potentially-conflicting TTBR1 TLB entries (as raised in
    Jeremy's thread [1]). This can happen when splitting/merging sections or
    contiguous ranges, and per a pessimistic reading of the ARM ARM may happen
    for changes to other fields in translation table entries.
    
(b) Allow for more complex page table creation early on, with tables created
    with fine-grained permissions as early as possible. In the cases where we
    currently use fine-grained permissions (e.g. DEBUG_RODATA and marking .init
    as non-executable), this is required for the same reasons as (a), as we
    must ensure that changes to page tables do not split/merge sections or
    contiguous regions for memory in active use.

(c) Avoid (rare/theoretical) edge cases where we need to allocate memory before
    a sufficient proportion of the early linear map is in place.

This series:

* Introduces the necessary infrastructure to safely swap TTBR1_EL1 (i.e.
  without risking conflicting TLB entries being allocated).

* Adds helpers to walk page tables by physical address, independent of the
  linear mapping, and modifies __create_mapping and friends to rely on a
  new set of FIX_{PGD,PUD,PMD,PTE} fixmap slots to map tables as required
  for modification.

* Removes the early memblock limit, now that create_mapping does not rely on the
  early linear map. This solves (c), and allows for (b).

* Generates an entirely new set of kernel page tables with fine-grained (i.e.
  page-level) permission boundaries, which can then be safely installed. These
  are created with sufficient granularity such that later changes (currently
  only fixup_init) will not split/merge sections or contiguous regions, and can
  follow a break-before-make approach without affecting the rest of the page
  tables.
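
As a rough sketch of how these pieces are intended to fit together at
boot (helper names other than cpu_replace_ttbr1 and the FIX_* walkers
are illustrative, not taken from the series):

	/*
	 * Illustrative only: build a complete set of fine-grained kernel
	 * mappings in a freshly allocated pgd, then atomically switch
	 * TTBR1_EL1 over to it.
	 */
	static void __init create_new_kernel_tables(void)
	{
		phys_addr_t new_pgd_phys = early_pgtable_alloc();  /* assumed allocator */
		pgd_t *new_pgd = pgd_fixmap(new_pgd_phys);         /* map via FIX_PGD */

		map_kernel(new_pgd);   /* text/rodata/init/data, final permissions */
		map_memory(new_pgd);   /* linear map */

		pgd_fixmap_unmap();

		/* Swap TTBR1_EL1 via the idmap, avoiding conflicting TLB entries. */
		cpu_replace_ttbr1(__va(new_pgd_phys));
	}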

There is still work to do:

* Implement the necessary page table copying and/or creation for KASAN.

* BUG() when splitting sections or creating overlapping entries in
  create_mapping, as these both indicate serious bugs in kernel page table
  creation.
  
  This will require rework to the EFI runtime services pagetable creation, as
  for >4K page kernels EFI memory descriptors may share pages (and currently
  such overlap is assumed to be benign).

* Solve mapping the kernel text as ROX and rodata as RO, as updating execute
  permissions may risk TLB conflicts.

  Ideally we'd map these separately as ROX and RO immediately, but the
  alternatives patching code relies on being able to use the kernel mapping to
  update the text. We cannot rely on any text which itself may be patched, and
  updates may straddle page boundaries, so this is non-trivial.

* Clean up usage of swapper_pg_dir so we can switch to the new tables without
  having to reuse the existing pgd. This will allow us to free the original
  pgd.

Any and all feedback is welcome.

The series is based on v4.4-rc4, and can be found in my git repo [2] on
kernel.org. This version is tagged as arm64-pagetable-rework-20151209, while
the latest version should be in the unstable branch arm64/pagetable-rework.

Thanks,
Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-November/386178.html
[2] git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git

Mark Rutland (20):
  arm64: mm: remove pointless PAGE_MASKing
  arm64: Remove redundant padding from linker script
  arm64: mm: fold alternatives into .init
  arm64: mm: assume PAGE SIZE for page table allocator
  asm-generic: make __set_fixmap_offset a static inline
  arm64: mm: place empty_zero_page in bss
  arm64: unify idmap removal
  arm64: unmap idmap earlier
  arm64: add function to install the idmap
  arm64: mm: add code to safely replace TTBR1_EL1
  arm64: mm: move pte_* macros
  arm64: mm: add functions to walk page tables by PA
  arm64: mm: avoid redundant __pa(__va(x))
  arm64: mm: add __{pud,pgd}_populate
  arm64: mm: add functions to walk tables in fixmap
  arm64: mm: use fixmap when creating page tables
  arm64: mm: allocate pagetables anywhere
  arm64: mm: allow passing a pgdir to alloc_init_*
  arm64: ensure _stext and _etext are page-aligned
  arm64: mm: create new fine-grained mappings at boot

 arch/arm64/include/asm/alternative.h |   1 -
 arch/arm64/include/asm/fixmap.h      |  10 ++
 arch/arm64/include/asm/mmu_context.h |  63 +++++++-
 arch/arm64/include/asm/pgalloc.h     |  26 ++-
 arch/arm64/include/asm/pgtable.h     |  87 +++++++----
 arch/arm64/kernel/alternative.c      |   6 -
 arch/arm64/kernel/setup.c            |   7 +
 arch/arm64/kernel/smp.c              |   4 +-
 arch/arm64/kernel/suspend.c          |  20 +--
 arch/arm64/kernel/vmlinux.lds.S      |  12 +-
 arch/arm64/mm/init.c                 |   1 -
 arch/arm64/mm/mmu.c                  | 295 +++++++++++++++++------------------
 arch/arm64/mm/proc.S                 |  27 ++++
 include/asm-generic/fixmap.h         |  14 +-
 14 files changed, 344 insertions(+), 229 deletions(-)

-- 
1.9.1

* [RFC PATCH 01/20] arm64: mm: remove pointless PAGE_MASKing
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 02/20] arm64: Remove redundant padding from linker script Mark Rutland
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

As pgd_offset{,_k} shift the input address by PGDIR_SHIFT, the sub-page
bits will always be shifted out. There is no need to apply PAGE_MASK
before this.
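
To illustrate why the masking is redundant, a standalone sketch (the
constants below assume a 4K-page, 39-bit VA configuration and are for
demonstration only):

	#include <assert.h>

	#define PAGE_SHIFT   12
	#define PAGE_MASK    (~((1UL << PAGE_SHIFT) - 1))
	#define PGDIR_SHIFT  30
	#define PTRS_PER_PGD 512

	/* Same shape as the kernel's pgd_index(): all bits below
	 * PGDIR_SHIFT are discarded, and PGDIR_SHIFT >= PAGE_SHIFT. */
	static unsigned long pgd_index(unsigned long addr)
	{
		return (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
	}

	int main(void)
	{
		unsigned long virt = 0xffffffc000123456UL;

		/* The sub-page bits are shifted out either way. */
		assert(pgd_index(virt) == pgd_index(virt & PAGE_MASK));
		return 0;
	}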

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 873e363..7dfb4a9 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -280,7 +280,7 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
 			&phys, virt);
 		return;
 	}
-	__create_mapping(&init_mm, pgd_offset_k(virt & PAGE_MASK), phys, virt,
+	__create_mapping(&init_mm, pgd_offset_k(virt), phys, virt,
 			 size, prot, early_alloc);
 }
 
@@ -301,7 +301,7 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
 		return;
 	}
 
-	return __create_mapping(&init_mm, pgd_offset_k(virt & PAGE_MASK),
+	return __create_mapping(&init_mm, pgd_offset_k(virt),
 				phys, virt, size, prot, late_alloc);
 }
 
-- 
1.9.1

* [RFC PATCH 02/20] arm64: Remove redundant padding from linker script
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 01/20] arm64: mm: remove pointless PAGE_MASKing Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 03/20] arm64: mm: fold alternatives into .init Mark Rutland
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Currently we place an ALIGN_DEBUG_RO between text and data for the .text
and .init sections, and depending on configuration each of these may
result in up to SECTION_SIZE bytes worth of padding (for
DEBUG_ALIGN_RODATA).

We make no distinction between the text and data in each of these
sections at any point when creating the initial page tables in head.S.
We also make no distinction when modifying the tables; __map_memblock,
fixup_executable, mark_rodata_ro, and fixup_init only work at section
granularity. Thus this padding is unnecessary.

For the split between init text and data we impose a minimum alignment of
16 bytes, but this is also unnecessary. The init data is output
immediately after the padding, before any symbols are defined, so the
alignment is not required to keep a symbol for a linker section array
correctly associated with the data. Any objects within the section will
be given at least their usual alignment regardless.

This patch removes the redundant padding.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/vmlinux.lds.S | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 1ee2c39..a64340d 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -112,7 +112,6 @@ SECTIONS
 		*(.got)			/* Global offset table		*/
 	}
 
-	ALIGN_DEBUG_RO
 	RO_DATA(PAGE_SIZE)
 	EXCEPTION_TABLE(8)
 	NOTES
@@ -127,7 +126,6 @@ SECTIONS
 		ARM_EXIT_KEEP(EXIT_TEXT)
 	}
 
-	ALIGN_DEBUG_RO_MIN(16)
 	.init.data : {
 		INIT_DATA
 		INIT_SETUP(16)
-- 
1.9.1

* [RFC PATCH 03/20] arm64: mm: fold alternatives into .init
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 01/20] arm64: mm: remove pointless PAGE_MASKing Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 02/20] arm64: Remove redundant padding from linker script Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 04/20] arm64: mm: assume PAGE SIZE for page table allocator Mark Rutland
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Currently we treat the alternatives separately from other data that's
only used during initialisation, using separate .altinstructions and
.altinstr_replacement linker sections. These are freed for general
allocation separately from .init*. This is problematic as:

* We do not remove execute permissions, as we do for .init, leaving the
  memory executable.

* We pad between them, making the kernel Image binary up to PAGE_SIZE
  bytes larger than necessary.

This patch moves the two sections into the contiguous region used for
.init*. This saves some memory, ensures that we remove execute
permissions, and allows us to remove some code made redundant by this
reorganisation.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/alternative.h | 1 -
 arch/arm64/kernel/alternative.c      | 6 ------
 arch/arm64/kernel/vmlinux.lds.S      | 5 ++---
 arch/arm64/mm/init.c                 | 1 -
 4 files changed, 2 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index d56ec07..e4962f0 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -19,7 +19,6 @@ struct alt_instr {
 
 void __init apply_alternatives_all(void);
 void apply_alternatives(void *start, size_t length);
-void free_alternatives_memory(void);
 
 #define ALTINSTR_ENTRY(feature)						      \
 	" .word 661b - .\n"				/* label           */ \
diff --git a/arch/arm64/kernel/alternative.c b/arch/arm64/kernel/alternative.c
index ab9db0e..d2ee1b2 100644
--- a/arch/arm64/kernel/alternative.c
+++ b/arch/arm64/kernel/alternative.c
@@ -158,9 +158,3 @@ void apply_alternatives(void *start, size_t length)
 
 	__apply_alternatives(&region);
 }
-
-void free_alternatives_memory(void)
-{
-	free_reserved_area(__alt_instructions, __alt_instructions_end,
-			   0, "alternatives");
-}
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index a64340d..f943a84 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -140,9 +140,6 @@ SECTIONS
 
 	PERCPU_SECTION(64)
 
-	. = ALIGN(PAGE_SIZE);
-	__init_end = .;
-
 	. = ALIGN(4);
 	.altinstructions : {
 		__alt_instructions = .;
@@ -154,6 +151,8 @@ SECTIONS
 	}
 
 	. = ALIGN(PAGE_SIZE);
+	__init_end = .;
+
 	_data = .;
 	_sdata = .;
 	RW_DATA_SECTION(64, PAGE_SIZE, THREAD_SIZE)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 17bf39a..9b979e0 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -360,7 +360,6 @@ void free_initmem(void)
 {
 	fixup_init();
 	free_initmem_default(0);
-	free_alternatives_memory();
 }
 
 #ifdef CONFIG_BLK_DEV_INITRD
-- 
1.9.1

* [RFC PATCH 04/20] arm64: mm: assume PAGE SIZE for page table allocator
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (2 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 03/20] arm64: mm: fold alternatives into .init Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-10 14:08   ` Will Deacon
  2015-12-09 12:44 ` [RFC PATCH 05/20] asm-generic: make __set_fixmap_offset a static inline Mark Rutland
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

We pass a size parameter to early_alloc and late_alloc, but these are
only ever used to allocate single pages; late_alloc in particular always
allocates a single page regardless of the size it is passed.

Remove the redundant size parameter.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/mmu.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7dfb4a9..304ff23 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -62,15 +62,15 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static void __init *early_alloc(unsigned long sz)
+static void __init *early_alloc(void)
 {
 	phys_addr_t phys;
 	void *ptr;
 
-	phys = memblock_alloc(sz, sz);
+	phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 	BUG_ON(!phys);
 	ptr = __va(phys);
-	memset(ptr, 0, sz);
+	memset(ptr, 0, PAGE_SIZE);
 	return ptr;
 }
 
@@ -95,12 +95,12 @@ static void split_pmd(pmd_t *pmd, pte_t *pte)
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, unsigned long pfn,
 				  pgprot_t prot,
-				  void *(*alloc)(unsigned long size))
+				  void *(*alloc)(void))
 {
 	pte_t *pte;
 
 	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
-		pte = alloc(PTRS_PER_PTE * sizeof(pte_t));
+		pte = alloc();
 		if (pmd_sect(*pmd))
 			split_pmd(pmd, pte);
 		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
@@ -130,7 +130,7 @@ static void split_pud(pud_t *old_pud, pmd_t *pmd)
 static void alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 				  unsigned long addr, unsigned long end,
 				  phys_addr_t phys, pgprot_t prot,
-				  void *(*alloc)(unsigned long size))
+				  void *(*alloc)(void))
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -139,7 +139,7 @@ static void alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 	 * Check for initial section mappings in the pgd/pud and remove them.
 	 */
 	if (pud_none(*pud) || pud_sect(*pud)) {
-		pmd = alloc(PTRS_PER_PMD * sizeof(pmd_t));
+		pmd = alloc();
 		if (pud_sect(*pud)) {
 			/*
 			 * need to have the 1G of mappings continue to be
@@ -195,13 +195,13 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 				  unsigned long addr, unsigned long end,
 				  phys_addr_t phys, pgprot_t prot,
-				  void *(*alloc)(unsigned long size))
+				  void *(*alloc)(void))
 {
 	pud_t *pud;
 	unsigned long next;
 
 	if (pgd_none(*pgd)) {
-		pud = alloc(PTRS_PER_PUD * sizeof(pud_t));
+		pud = alloc();
 		pgd_populate(mm, pgd, pud);
 	}
 	BUG_ON(pgd_bad(*pgd));
@@ -247,7 +247,7 @@ static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 static void  __create_mapping(struct mm_struct *mm, pgd_t *pgd,
 				    phys_addr_t phys, unsigned long virt,
 				    phys_addr_t size, pgprot_t prot,
-				    void *(*alloc)(unsigned long size))
+				    void *(*alloc)(void))
 {
 	unsigned long addr, length, end, next;
 
@@ -262,12 +262,9 @@ static void  __create_mapping(struct mm_struct *mm, pgd_t *pgd,
 	} while (pgd++, addr = next, addr != end);
 }
 
-static void *late_alloc(unsigned long size)
+static void *late_alloc(void)
 {
-	void *ptr;
-
-	BUG_ON(size > PAGE_SIZE);
-	ptr = (void *)__get_free_page(PGALLOC_GFP);
+	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
 	BUG_ON(!ptr);
 	return ptr;
 }
-- 
1.9.1

* [RFC PATCH 05/20] asm-generic: make __set_fixmap_offset a static inline
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (3 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 04/20] arm64: mm: assume PAGE SIZE for page table allocator Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss Mark Rutland
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Currently __set_fixmap_offset is a macro function which has a local
variable called 'addr'. If a caller passes a 'phys' parameter which is
derived from a variable also called 'addr', the local variable will
shadow this, and the compiler will complain about the use of an
uninitialized variable.

It is likely that fixmap users may use the name 'addr' for variables
that may be directly passed to __set_fixmap_offset, or that may be
indirectly generated via other macros. Rather than placing the burden on
callers to avoid the name 'addr', this patch changes __set_fixmap_offset
into a static inline function, avoiding namespace collisions.
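
A minimal standalone illustration of the shadowing problem, using a
simplified stand-in for the macro (not the real fixmap code):

	#include <stdio.h>

	/* Like the old __set_fixmap_offset, this declares its own 'addr'. */
	#define set_offset(base, phys)                          \
	({                                                      \
		unsigned long addr;                             \
		addr = (base) + ((phys) & 0xfffUL);             \
		addr;                                           \
	})

	int main(void)
	{
		unsigned long addr = 0x40001234UL;      /* caller's variable */

		/*
		 * '(phys)' expands to 'addr', which now names the macro's own
		 * uninitialized local rather than the caller's variable; the
		 * compiler warns about the uninitialized use and the result
		 * is garbage. A static inline function cannot shadow the
		 * caller's names like this.
		 */
		printf("%#lx\n", set_offset(0xffff7dfffe000000UL, addr));
		return 0;
	}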

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 include/asm-generic/fixmap.h | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/asm-generic/fixmap.h b/include/asm-generic/fixmap.h
index 1cbb833..f9c27b6 100644
--- a/include/asm-generic/fixmap.h
+++ b/include/asm-generic/fixmap.h
@@ -70,13 +70,13 @@ static inline unsigned long virt_to_fix(const unsigned long vaddr)
 #endif
 
 /* Return a pointer with offset calculated */
-#define __set_fixmap_offset(idx, phys, flags)		      \
-({							      \
-	unsigned long addr;				      \
-	__set_fixmap(idx, phys, flags);			      \
-	addr = fix_to_virt(idx) + ((phys) & (PAGE_SIZE - 1)); \
-	addr;						      \
-})
+static inline unsigned long __set_fixmap_offset(enum fixed_addresses idx,
+						phys_addr_t phys,
+						pgprot_t flags)
+{
+	__set_fixmap(idx, phys, flags);
+	return fix_to_virt(idx) + (phys & (PAGE_SIZE - 1));
+}
 
 #define set_fixmap_offset(idx, phys) \
 	__set_fixmap_offset(idx, phys, FIXMAP_PAGE_NORMAL)
-- 
1.9.1

* [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (4 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 05/20] asm-generic: make __set_fixmap_offset a static inline Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-10 14:11   ` Will Deacon
  2015-12-09 12:44 ` [RFC PATCH 07/20] arm64: unify idmap removal Mark Rutland
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Currently the zero page is set up in paging_init, and thus we cannot use
the zero page earlier. We use the zero page as a reserved TTBR value
from which no TLB entries may be allocated (e.g. when uninstalling the
idmap). To enable such usage earlier (as may be required for invasive
changes to the kernel page tables), and to minimise the time that the
idmap is active, we need to be able to use the zero page before
paging_init.

This patch follows the example set by x86, by allocating the zero page
at compile time, in .bss. This means that the zero page itself is
available immediately upon entry to start_kernel (as we zero .bss before
this), and also means that the zero page takes up no space in the raw
Image binary. The associated struct page is allocated in bootmem_init,
and remains unavailable until this time.

Outside of arch code, the only users of empty_zero_page assume that the
empty_zero_page symbol refers to the zeroed memory itself, and that
ZERO_PAGE(x) must be used to acquire the associated struct page,
following the example of x86. This patch also brings arm64 in line with
these assumptions.
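
For illustration (a sketch, not code from this patch), the two distinct
uses now look like:

	/* Generic mm code wants the struct page for the shared zero page: */
	struct page *zp = ZERO_PAGE(address);   /* virt_to_page(empty_zero_page) */

	/* arm64 code wants a reserved, all-zero table to point TTBR0 at: */
	unsigned long ttbr = virt_to_phys(empty_zero_page);

Both are usable as soon as .bss has been zeroed, without waiting for
paging_init.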

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 2 +-
 arch/arm64/include/asm/pgtable.h     | 4 ++--
 arch/arm64/mm/mmu.c                  | 9 +--------
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 2416578..600eacb 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -48,7 +48,7 @@ static inline void contextidr_thread_switch(struct task_struct *next)
  */
 static inline void cpu_set_reserved_ttbr0(void)
 {
-	unsigned long ttbr = page_to_phys(empty_zero_page);
+	unsigned long ttbr = virt_to_phys(empty_zero_page);
 
 	asm(
 	"	msr	ttbr0_el1, %0			// set TTBR0\n"
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 7e074f9..00f5a4b8 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -121,8 +121,8 @@ extern void __pgd_error(const char *file, int line, unsigned long val);
  * ZERO_PAGE is a global shared page that is always zero: used
  * for zero-mapped memory areas etc..
  */
-extern struct page *empty_zero_page;
-#define ZERO_PAGE(vaddr)	(empty_zero_page)
+extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
+#define ZERO_PAGE(vaddr)	virt_to_page(empty_zero_page)
 
 #define pte_ERROR(pte)		__pte_error(__FILE__, __LINE__, pte_val(pte))
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 304ff23..7559c22 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -48,7 +48,7 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
  * Empty_zero_page is a special page that is used for zero-initialized data
  * and COW.
  */
-struct page *empty_zero_page;
+unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
 EXPORT_SYMBOL(empty_zero_page);
 
 pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
@@ -441,18 +441,11 @@ void fixup_init(void)
  */
 void __init paging_init(void)
 {
-	void *zero_page;
-
 	map_mem();
 	fixup_executable();
 
-	/* allocate the zero page. */
-	zero_page = early_alloc(PAGE_SIZE);
-
 	bootmem_init();
 
-	empty_zero_page = virt_to_page(zero_page);
-
 	/*
 	 * TTBR0 is only used for the identity mapping at this stage. Make it
 	 * point to zero page to avoid speculatively fetching new entries.
-- 
1.9.1

* [RFC PATCH 07/20] arm64: unify idmap removal
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (5 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 08/20] arm64: unmap idmap earlier Mark Rutland
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

We currently open-code the removal of the idmap and restoration of the
current task's MMU state in a few places.

Before introducing yet more copies of this sequence, unify these to call
a new helper, cpu_uninstall_idmap.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 25 +++++++++++++++++++++++++
 arch/arm64/kernel/setup.c            |  1 +
 arch/arm64/kernel/smp.c              |  4 +---
 arch/arm64/kernel/suspend.c          | 20 ++++----------------
 arch/arm64/mm/mmu.c                  |  4 +---
 5 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 600eacb..b1b2514 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -27,6 +27,7 @@
 #include <asm-generic/mm_hooks.h>
 #include <asm/cputype.h>
 #include <asm/pgtable.h>
+#include <asm/tlbflush.h>
 
 #ifdef CONFIG_PID_IN_CONTEXTIDR
 static inline void contextidr_thread_switch(struct task_struct *next)
@@ -90,6 +91,30 @@ static inline void cpu_set_default_tcr_t0sz(void)
 }
 
 /*
+ * Remove the idmap from TTBR0_EL1 and install the pgd of the active mm.
+ *
+ * The idmap lives in the same VA range as userspace, but uses global entries
+ * and may use a different TCR_EL1.T0SZ. To avoid issues resulting from
+ * speculative TLB fetches, we must temporarily install the reserved page
+ * tables while we invalidate the TLBs and set up the correct TCR_EL1.T0SZ.
+ *
+ * If current is a not a user task, the mm covers the TTBR1_EL1 page tables,
+ * which should not be installed in TTBR0_EL1. In this case we can leave the
+ * reserved page tables in place.
+ */
+static inline void cpu_uninstall_idmap(void)
+{
+	struct mm_struct *mm = current->active_mm;
+
+	cpu_set_reserved_ttbr0();
+	local_flush_tlb_all();
+	cpu_set_default_tcr_t0sz();
+
+	if (mm != &init_mm)
+		cpu_switch_mm(mm->pgd, mm);
+}
+
+/*
  * It would be nice to return ASIDs back to the allocator, but unfortunately
  * that introduces a race with a generation rollover where we could erroneously
  * free an ASID allocated in a future generation. We could workaround this by
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 8119479..f6621ba 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -62,6 +62,7 @@
 #include <asm/memblock.h>
 #include <asm/efi.h>
 #include <asm/xen/hypervisor.h>
+#include <asm/mmu_context.h>
 
 phys_addr_t __fdt_pointer __initdata;
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index b1adc51..68e7f79 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -149,9 +149,7 @@ asmlinkage void secondary_start_kernel(void)
 	 * TTBR0 is only used for the identity mapping at this stage. Make it
 	 * point to zero page to avoid speculatively fetching new entries.
 	 */
-	cpu_set_reserved_ttbr0();
-	local_flush_tlb_all();
-	cpu_set_default_tcr_t0sz();
+	cpu_uninstall_idmap();
 
 	preempt_disable();
 	trace_hardirqs_off();
diff --git a/arch/arm64/kernel/suspend.c b/arch/arm64/kernel/suspend.c
index 1095aa4..6605539 100644
--- a/arch/arm64/kernel/suspend.c
+++ b/arch/arm64/kernel/suspend.c
@@ -60,7 +60,6 @@ void __init cpu_suspend_set_dbg_restorer(void (*hw_bp_restore)(void *))
  */
 int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 {
-	struct mm_struct *mm = current->active_mm;
 	int ret;
 	unsigned long flags;
 
@@ -87,22 +86,11 @@ int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
 	ret = __cpu_suspend_enter(arg, fn);
 	if (ret == 0) {
 		/*
-		 * We are resuming from reset with TTBR0_EL1 set to the
-		 * idmap to enable the MMU; set the TTBR0 to the reserved
-		 * page tables to prevent speculative TLB allocations, flush
-		 * the local tlb and set the default tcr_el1.t0sz so that
-		 * the TTBR0 address space set-up is properly restored.
-		 * If the current active_mm != &init_mm we entered cpu_suspend
-		 * with mappings in TTBR0 that must be restored, so we switch
-		 * them back to complete the address space configuration
-		 * restoration before returning.
+		 * We are resuming from reset with the idmap active in TTBR0_EL1.
+		 * We must uninstall the idmap and restore the expected MMU
+		 * state before we can possibly return to userspace.
 		 */
-		cpu_set_reserved_ttbr0();
-		local_flush_tlb_all();
-		cpu_set_default_tcr_t0sz();
-
-		if (mm != &init_mm)
-			cpu_switch_mm(mm->pgd, mm);
+		cpu_uninstall_idmap();
 
 		/*
 		 * Restore per-cpu offset before any kernel
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7559c22..98a98ac 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -450,9 +450,7 @@ void __init paging_init(void)
 	 * TTBR0 is only used for the identity mapping at this stage. Make it
 	 * point to zero page to avoid speculatively fetching new entries.
 	 */
-	cpu_set_reserved_ttbr0();
-	local_flush_tlb_all();
-	cpu_set_default_tcr_t0sz();
+	cpu_uninstall_idmap();
 }
 
 /*
-- 
1.9.1

* [RFC PATCH 08/20] arm64: unmap idmap earlier
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (6 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 07/20] arm64: unify idmap removal Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 09/20] arm64: add function to install the idmap Mark Rutland
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

During boot we leave the idmap in place until paging_init, as we
previously had to wait for the zero page to become allocated and
accessible.

Now that we have a statically-allocated zero page, we can uninstall the
idmap much earlier in the boot process, making it far easier to spot
accidental use of physical addresses. This also brings the cold boot
path in line with the secondary boot path.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/setup.c | 6 ++++++
 arch/arm64/mm/mmu.c       | 6 ------
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index f6621ba..cfed56f 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -314,6 +314,12 @@ void __init setup_arch(char **cmdline_p)
 	 */
 	local_async_enable();
 
+	/*
+	 * TTBR0 is only used for the identity mapping at this stage. Make it
+	 * point to zero page to avoid speculatively fetching new entries.
+	 */
+	cpu_uninstall_idmap();
+
 	efi_init();
 	arm64_memblock_init();
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 98a98ac..0c16c25 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -445,12 +445,6 @@ void __init paging_init(void)
 	fixup_executable();
 
 	bootmem_init();
-
-	/*
-	 * TTBR0 is only used for the identity mapping at this stage. Make it
-	 * point to zero page to avoid speculatively fetching new entries.
-	 */
-	cpu_uninstall_idmap();
 }
 
 /*
-- 
1.9.1

* [RFC PATCH 09/20] arm64: add function to install the idmap
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (7 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 08/20] arm64: unmap idmap earlier Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 10/20] arm64: mm: add code to safely replace TTBR1_EL1 Mark Rutland
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

In some cases (e.g. when making invasive changes to the kernel page
tables) we will need to execute code from the idmap.

Add a new helper which may be used to install the idmap, complementing
the existing cpu_uninstall_idmap.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index b1b2514..944f273 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -74,7 +74,7 @@ static inline bool __cpu_uses_extended_idmap(void)
 /*
  * Set TCR.T0SZ to its default value (based on VA_BITS)
  */
-static inline void cpu_set_default_tcr_t0sz(void)
+static inline void __cpu_set_tcr_t0sz(unsigned long t0sz)
 {
 	unsigned long tcr;
 
@@ -87,9 +87,12 @@ static inline void cpu_set_default_tcr_t0sz(void)
 	"	msr	tcr_el1, %0	;"
 	"	isb"
 	: "=&r" (tcr)
-	: "r"(TCR_T0SZ(VA_BITS)), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
+	: "r"(t0sz), "I"(TCR_T0SZ_OFFSET), "I"(TCR_TxSZ_WIDTH));
 }
 
+#define cpu_set_default_tcr_t0sz()	__cpu_set_tcr_t0sz(TCR_T0SZ(VA_BITS))
+#define cpu_set_idmap_tcr_t0sz()	__cpu_set_tcr_t0sz(idmap_t0sz)
+
 /*
  * Remove the idmap from TTBR0_EL1 and install the pgd of the active mm.
  *
@@ -114,6 +117,15 @@ static inline void cpu_uninstall_idmap(void)
 		cpu_switch_mm(mm->pgd, mm);
 }
 
+static inline void cpu_install_idmap(void)
+{
+	cpu_set_reserved_ttbr0();
+	local_flush_tlb_all();
+	cpu_set_idmap_tcr_t0sz();
+
+	cpu_switch_mm(idmap_pg_dir, &init_mm);
+}
+
 /*
  * It would be nice to return ASIDs back to the allocator, but unfortunately
  * that introduces a race with a generation rollover where we could erroneously
-- 
1.9.1

* [RFC PATCH 10/20] arm64: mm: add code to safely replace TTBR1_EL1
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (8 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 09/20] arm64: add function to install the idmap Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 11/20] arm64: mm: move pte_* macros Mark Rutland
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

If page tables are modified without suitable TLB maintenance, the ARM
architecture permits multiple TLB entries to be allocated for the same
VA. When this occurs, it is permitted that TLB conflict aborts are
raised in response to synchronous data/instruction accesses, and/or an
amalgamation of the TLB entries may be used as a result of a TLB lookup.

The presence of conflicting TLB entries may result in a variety of
behaviours detrimental to the system (e.g. erroneous physical addresses
may be used by I-cache fetches and/or page table walks). Some of these
cases may result in unexpected changes of hardware state, and/or result
in the (asynchronous) delivery of SError.

To avoid these issues, we must avoid situations where conflicting
entries may be allocated into TLBs. For user and module mappings we can
follow a strict break-before-make approach, but this cannot work for
modifications to the swapper page tables that cover the kernel text and
data.

Instead, this patch adds code which is intended to be executed from the
idmap, which can safely unmap the swapper page tables as it only
requires the idmap to be active. This enables us to uninstall the active
TTBR1_EL1 entry, invalidate TLBs, then install a new TTBR1_EL1 entry
without potentially unmapping code or data required for the sequence.
This avoids the risk of conflict, but requires that updates are staged
in a copy of the swapper page tables prior to being installed.
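
A minimal sketch of how a caller is expected to use this (the staging
helper is hypothetical; only cpu_replace_ttbr1 comes from this patch):

	/* Stage all updates in a copy of the live kernel tables... */
	pgd_t *new_pgd = copy_and_modify_swapper_tables();     /* hypothetical */

	/* ...then swap TTBR1_EL1 over to the copy while running from the
	 * idmap, so no conflicting TTBR1 TLB entries can be allocated. */
	cpu_replace_ttbr1(new_pgd);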

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/mmu_context.h | 20 ++++++++++++++++++++
 arch/arm64/mm/proc.S                 | 27 +++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/arch/arm64/include/asm/mmu_context.h b/arch/arm64/include/asm/mmu_context.h
index 944f273..280ce2e 100644
--- a/arch/arm64/include/asm/mmu_context.h
+++ b/arch/arm64/include/asm/mmu_context.h
@@ -127,6 +127,26 @@ static inline void cpu_install_idmap(void)
 }
 
 /*
+ * Atomically replaces the active TTBR1_EL1 PGD with a new VA-compatible PGD,
+ * avoiding the possibility of conflicting TLB entries being allocated.
+ */
+static inline void cpu_replace_ttbr1(pgd_t *pgd)
+{
+	typedef void (ttbr_replace_func)(phys_addr_t, phys_addr_t);
+	extern ttbr_replace_func idmap_cpu_replace_ttbr1;
+	ttbr_replace_func *replace_phys;
+
+	phys_addr_t pgd_phys = virt_to_phys(pgd);
+	phys_addr_t reserved_phys = virt_to_phys(empty_zero_page);
+
+	replace_phys = (void*)virt_to_phys(idmap_cpu_replace_ttbr1);
+
+	cpu_install_idmap();
+	replace_phys(pgd_phys, reserved_phys);
+	cpu_uninstall_idmap();
+}
+
+/*
  * It would be nice to return ASIDs back to the allocator, but unfortunately
  * that introduces a race with a generation rollover where we could erroneously
  * free an ASID allocated in a future generation. We could workaround this by
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index cacecc4..d97c461 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -139,6 +139,33 @@ ENTRY(cpu_do_switch_mm)
 	ret
 ENDPROC(cpu_do_switch_mm)
 
+	.pushsection ".idmap.text", "ax"
+/*
+ * void idmap_cpu_replace_ttbr1(phys_addr_t new_pgd, phys_addr_t reserved_pgd)
+ *
+ * This is the low-level counterpart to cpu_replace_ttbr1, and should not be
+ * called by anything else. It can only be executed from a TTBR0 mapping.
+ */
+ENTRY(idmap_cpu_replace_ttbr1)
+	mrs	x2, daif
+	msr	daifset, #0xf
+
+	msr	ttbr1_el1, x1
+	isb
+
+	tlbi	vmalle1
+	dsb	nsh
+	isb
+
+	msr	ttbr1_el1, x0
+	isb
+
+	msr	daif, x2
+
+	ret
+ENDPROC(idmap_cpu_replace_ttbr1)
+	.popsection
+
 	.section ".text.init", #alloc, #execinstr
 
 /*
-- 
1.9.1

* [RFC PATCH 11/20] arm64: mm: move pte_* macros
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (9 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 10/20] arm64: mm: add code to safely replace TTBR1_EL1 Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 12/20] arm64: mm: add functions to walk page tables by PA Mark Rutland
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

For pmd, pud, and pgd levels of table, functions including p?d_index and
p?d_offset are defined after the p?d_page_vaddr function for the
immediately higher level of table.

The pte functions however are defined much earlier, even though several
rely on the later definition of pmd_page_vaddr. While this isn't
currently a problem as these are macros, it prevents the logical
grouping of later C functions (which cannot rely on prototypes for
functions not yet defined).

Move these definitions after pmd_page_vaddr, for consistency with the
placement of these functions for other levels of table.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 00f5a4b8..738cc37 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -134,16 +134,6 @@ extern unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)];
 #define pte_clear(mm,addr,ptep)	set_pte(ptep, __pte(0))
 #define pte_page(pte)		(pfn_to_page(pte_pfn(pte)))
 
-/* Find an entry in the third-level page table. */
-#define pte_index(addr)		(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
-
-#define pte_offset_kernel(dir,addr)	(pmd_page_vaddr(*(dir)) + pte_index(addr))
-
-#define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
-#define pte_offset_map_nested(dir,addr)	pte_offset_kernel((dir), (addr))
-#define pte_unmap(pte)			do { } while (0)
-#define pte_unmap_nested(pte)		do { } while (0)
-
 /*
  * The following only work if pte_present(). Undefined behaviour otherwise.
  */
@@ -425,6 +415,16 @@ static inline pte_t *pmd_page_vaddr(pmd_t pmd)
 	return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
 }
 
+/* Find an entry in the third-level page table. */
+#define pte_index(addr)		(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
+
+#define pte_offset_kernel(dir,addr)	(pmd_page_vaddr(*(dir)) + pte_index(addr))
+
+#define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
+#define pte_offset_map_nested(dir,addr)	pte_offset_kernel((dir), (addr))
+#define pte_unmap(pte)			do { } while (0)
+#define pte_unmap_nested(pte)		do { } while (0)
+
 #define pmd_page(pmd)		pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
 
 /*
-- 
1.9.1

* [RFC PATCH 12/20] arm64: mm: add functions to walk page tables by PA
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (10 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 11/20] arm64: mm: move pte_* macros Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 13/20] arm64: mm: avoid redundant __pa(__va(x)) Mark Rutland
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

To allow us to walk tables allocated into the fixmap, we need to acquire
the physical address of a page, rather than the virtual address in the
linear map.

This patch adds new p??_page_paddr and p??_offset_phys functions to
acquire the physical address of a next-level table, and changes
p??_offset* into macros which simply convert this to a linear map VA.
This renders the p??_page_vaddr functions unused, and hence they are removed.

At the pgd level, a new pgd_offset_raw function is added to find the
relevant PGD entry given the base of a PGD and a virtual address.
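
For example, a sketch of the intended usage, assuming pgd_base points at
a pgd other than init_mm's (e.g. one staged in newly allocated memory):

	pgd_t *pgd = pgd_offset_raw(pgd_base, addr);
	phys_addr_t pud_phys = pud_offset_phys(pgd, addr);

	/* pud_phys can now be mapped however the caller likes (e.g. via a
	 * fixmap slot) rather than assuming a linear map VA, as
	 * pud_offset() does. */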

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 39 +++++++++++++++++++++++----------------
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 738cc37..9c679cf 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -410,15 +410,16 @@ static inline void pmd_clear(pmd_t *pmdp)
 	set_pmd(pmdp, __pmd(0));
 }
 
-static inline pte_t *pmd_page_vaddr(pmd_t pmd)
+static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 {
-	return __va(pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK);
+	return pmd_val(pmd) & PHYS_MASK & (s32)PAGE_MASK;
 }
 
 /* Find an entry in the third-level page table. */
 #define pte_index(addr)		(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
 
-#define pte_offset_kernel(dir,addr)	(pmd_page_vaddr(*(dir)) + pte_index(addr))
+#define pte_offset_phys(dir,addr)	(pmd_page_paddr(*(dir)) + pte_index(addr) * sizeof(pte_t))
+#define pte_offset_kernel(dir,addr)	((pte_t *)__va(pte_offset_phys((dir), (addr))))
 
 #define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
 #define pte_offset_map_nested(dir,addr)	pte_offset_kernel((dir), (addr))
@@ -453,21 +454,23 @@ static inline void pud_clear(pud_t *pudp)
 	set_pud(pudp, __pud(0));
 }
 
-static inline pmd_t *pud_page_vaddr(pud_t pud)
+static inline phys_addr_t pud_page_paddr(pud_t pud)
 {
-	return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
+	return pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK;
 }
 
 /* Find an entry in the second-level page table. */
 #define pmd_index(addr)		(((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
 
-static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
-{
-	return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(addr);
-}
+#define pmd_offset_phys(dir, addr)	(pud_page_paddr(*(dir)) + pmd_index(addr) * sizeof(pmd_t))
+#define pmd_offset(dir, addr)		((pmd_t *)__va(pmd_offset_phys((dir), (addr))))
 
 #define pud_page(pud)		pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK))
 
+#else
+
+#define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
+
 #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -489,21 +492,23 @@ static inline void pgd_clear(pgd_t *pgdp)
 	set_pgd(pgdp, __pgd(0));
 }
 
-static inline pud_t *pgd_page_vaddr(pgd_t pgd)
+static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 {
-	return __va(pgd_val(pgd) & PHYS_MASK & (s32)PAGE_MASK);
+	return pgd_val(pgd) & PHYS_MASK & (s32)PAGE_MASK;
 }
 
 /* Find an entry in the frst-level page table. */
 #define pud_index(addr)		(((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
 
-static inline pud_t *pud_offset(pgd_t *pgd, unsigned long addr)
-{
-	return (pud_t *)pgd_page_vaddr(*pgd) + pud_index(addr);
-}
+#define pud_offset_phys(dir, addr)	(pgd_page_paddr(*(dir)) + pud_index(addr) * sizeof(pud_t))
+#define pud_offset(dir, addr)		((pud_t *)__va(pud_offset_phys((dir), (addr))))
 
 #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
 
+#else
+
+#define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
+
 #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
 
 #define pgd_ERROR(pgd)		__pgd_error(__FILE__, __LINE__, pgd_val(pgd))
@@ -511,7 +516,9 @@ static inline pud_t *pud_offset(pgd_t *pgd, unsigned long addr)
 /* to find an entry in a page-table-directory */
 #define pgd_index(addr)		(((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
 
-#define pgd_offset(mm, addr)	((mm)->pgd+pgd_index(addr))
+#define pgd_offset_raw(pgd, addr)	((pgd) + pgd_index(addr))
+
+#define pgd_offset(mm, addr)	(pgd_offset_raw((mm)->pgd, (addr)))
 
 /* to find an entry in a kernel page-table-directory */
 #define pgd_offset_k(addr)	pgd_offset(&init_mm, addr)
-- 
1.9.1

* [RFC PATCH 13/20] arm64: mm: avoid redundant __pa(__va(x))
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (11 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 12/20] arm64: mm: add functions to walk page tables by PA Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 14/20] arm64: mm: add __{pud,pgd}_populate Mark Rutland
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

When we "upgrade" to a section mapping, we free any table we made
redundant by giving it back to memblock. To get the PA, we currently read
the physical address from the table entry and convert it to a linear map
VA, then subsequently convert this back to a PA.

This works currently, but will not work if the tables are not accessed
via linear map VAs (e.g. if we use fixmap slots).

This patch uses {pmd,pud}_page_paddr to acquire the PA. This avoids the
__pa(__va()) round trip, saving some work and avoiding reliance on the
linear mapping.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/mmu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 0c16c25..5ed1623 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -167,7 +167,7 @@ static void alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 			if (!pmd_none(old_pmd)) {
 				flush_tlb_all();
 				if (pmd_table(old_pmd)) {
-					phys_addr_t table = __pa(pte_offset_map(&old_pmd, 0));
+					phys_addr_t table = pmd_page_paddr(old_pmd);
 					if (!WARN_ON_ONCE(slab_is_available()))
 						memblock_free(table, PAGE_SIZE);
 				}
@@ -228,7 +228,7 @@ static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 			if (!pud_none(old_pud)) {
 				flush_tlb_all();
 				if (pud_table(old_pud)) {
-					phys_addr_t table = __pa(pmd_offset(&old_pud, 0));
+					phys_addr_t table = pud_page_paddr(old_pud);
 					if (!WARN_ON_ONCE(slab_is_available()))
 						memblock_free(table, PAGE_SIZE);
 				}
-- 
1.9.1

* [RFC PATCH 14/20] arm64: mm: add __{pud,pgd}_populate
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (12 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 13/20] arm64: mm: avoid redundant __pa(__va(x)) Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 15/20] arm64: mm: add functions to walk tables in fixmap Mark Rutland
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

We currently have __pmd_populate for creating a pmd table entry given
the physical address of a pte, but don't have equivalents for the pud or
pgd levels of table.

To enable us to manipulate tables which are mapped outside of the linear
mapping (where we have a PA, but not a linear map VA), it is useful to
have these functions.

This patch adds __{pud,pgd}_populate. As these should not be called when
the kernel uses folded {pmd,pud}s, in these cases they expand to
BUILD_BUG(). So long as the appropriate checks are made on the {pud,pgd}
entry prior to attempting population, these should be optimized out at
compile time.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgalloc.h | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/pgalloc.h b/arch/arm64/include/asm/pgalloc.h
index c150539..ff98585 100644
--- a/arch/arm64/include/asm/pgalloc.h
+++ b/arch/arm64/include/asm/pgalloc.h
@@ -42,11 +42,20 @@ static inline void pmd_free(struct mm_struct *mm, pmd_t *pmd)
 	free_page((unsigned long)pmd);
 }
 
-static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
+static inline void __pud_populate(pud_t *pud, phys_addr_t pmd, pudval_t prot)
 {
-	set_pud(pud, __pud(__pa(pmd) | PMD_TYPE_TABLE));
+	set_pud(pud, __pud(pmd | prot));
 }
 
+static inline void pud_populate(struct mm_struct *mm, pud_t *pud, pmd_t *pmd)
+{
+	__pud_populate(pud, __pa(pmd), PMD_TYPE_TABLE);
+}
+#else
+static inline void __pud_populate(pud_t *pud, phys_addr_t pmd, pudval_t prot)
+{
+	BUILD_BUG();
+}
 #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -62,11 +71,20 @@ static inline void pud_free(struct mm_struct *mm, pud_t *pud)
 	free_page((unsigned long)pud);
 }
 
-static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
+static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t pud, pgdval_t prot)
 {
-	set_pgd(pgd, __pgd(__pa(pud) | PUD_TYPE_TABLE));
+	set_pgd(pgdp, __pgd(pud | prot));
 }
 
+static inline void pgd_populate(struct mm_struct *mm, pgd_t *pgd, pud_t *pud)
+{
+	__pgd_populate(pgd, __pa(pud), PUD_TYPE_TABLE);
+}
+#else
+static inline void __pgd_populate(pgd_t *pgdp, phys_addr_t pud, pgdval_t prot)
+{
+	BUILD_BUG();
+}
 #endif	/* CONFIG_PGTABLE_LEVELS > 3 */
 
 extern pgd_t *pgd_alloc(struct mm_struct *mm);
-- 
1.9.1

* [RFC PATCH 15/20] arm64: mm: add functions to walk tables in fixmap
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (13 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 14/20] arm64: mm: add __{pud,pgd}_populate Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 16/20] arm64: mm: use fixmap when creating page tables Mark Rutland
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

As a preparatory step to allow us to allocate early page tables from
unmapped memory using memblock_alloc, add new p??_fixmap* functions that
can be used to walk page tables outside of the linear mapping by using
fixmap slots.
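
A short sketch of the intended usage (illustrative; assuming pud, addr,
end and next are set up by the caller, as in the __create_mapping
callees):

	/*
	 * Map the pmd table referenced by *pud through FIX_PMD, walk it,
	 * then tear the mapping down. The table itself need not be covered
	 * by the linear mapping.
	 */
	pmd_t *pmd = pmd_fixmap_offset(pud, addr);

	do {
		next = pmd_addr_end(addr, end);
		/* ... initialise or inspect *pmd here ... */
	} while (pmd++, addr = next, addr != end);

	pmd_fixmap_unmap();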

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/fixmap.h  | 10 ++++++++++
 arch/arm64/include/asm/pgtable.h | 26 ++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index 3097045..1a617d4 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -62,6 +62,16 @@ enum fixed_addresses {
 
 	FIX_BTMAP_END = __end_of_permanent_fixed_addresses,
 	FIX_BTMAP_BEGIN = FIX_BTMAP_END + TOTAL_FIX_BTMAPS - 1,
+
+	/*
+	 * Used for kernel page table creation, so unmapped memory may be used
+	 * for tables.
+	 */
+	FIX_PTE,
+	FIX_PMD,
+	FIX_PUD,
+	FIX_PGD,
+
 	__end_of_fixed_addresses
 };
 
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 9c679cf..8a00a1cb 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -57,6 +57,7 @@
 
 #ifndef __ASSEMBLY__
 
+#include <asm/fixmap.h>
 #include <linux/mmdebug.h>
 
 extern void __pte_error(const char *file, int line, unsigned long val);
@@ -426,6 +427,10 @@ static inline phys_addr_t pmd_page_paddr(pmd_t pmd)
 #define pte_unmap(pte)			do { } while (0)
 #define pte_unmap_nested(pte)		do { } while (0)
 
+#define pte_fixmap(addr)		((pte_t *)set_fixmap_offset(FIX_PTE, addr))
+#define pte_fixmap_offset(pmd, addr)	pte_fixmap(pte_offset_phys(pmd, addr))
+#define pte_fixmap_unmap()		clear_fixmap(FIX_PTE)
+
 #define pmd_page(pmd)		pfn_to_page(__phys_to_pfn(pmd_val(pmd) & PHYS_MASK))
 
 /*
@@ -465,12 +470,21 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
 #define pmd_offset_phys(dir, addr)	(pud_page_paddr(*(dir)) + pmd_index(addr) * sizeof(pmd_t))
 #define pmd_offset(dir, addr)		((pmd_t *)__va(pmd_offset_phys((dir), (addr))))
 
+#define pmd_fixmap(addr)		((pmd_t *)set_fixmap_offset(FIX_PMD, addr))
+#define pmd_fixmap_offset(pud, addr)	pmd_fixmap(pmd_offset_phys(pud, addr))
+#define pmd_fixmap_unmap()		clear_fixmap(FIX_PMD)
+
 #define pud_page(pud)		pfn_to_page(__phys_to_pfn(pud_val(pud) & PHYS_MASK))
 
 #else
 
 #define pud_page_paddr(pud)	({ BUILD_BUG(); 0; })
 
+/* Match pmd_offset folding in <asm/generic/pgtable-nopmd.h> */
+#define pmd_fixmap(addr)		NULL
+#define pmd_fixmap_offset(pudp, addr)	((pmd_t *)pudp)
+#define pmd_fixmap_unmap()
+
 #endif	/* CONFIG_PGTABLE_LEVELS > 2 */
 
 #if CONFIG_PGTABLE_LEVELS > 3
@@ -503,12 +517,21 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 #define pud_offset_phys(dir, addr)	(pgd_page_paddr(*(dir)) + pud_index(addr) * sizeof(pud_t))
 #define pud_offset(dir, addr)		((pud_t *)__va(pud_offset_phys((dir), (addr))))
 
+#define pud_fixmap(addr)		((pud_t *)set_fixmap_offset(FIX_PUD, addr))
+#define pud_fixmap_offset(pgd, addr)	pud_fixmap(pmd_offset_phys(pgd, addr))
+#define pud_fixmap_unmap()		clear_fixmap(FIX_PUD)
+
 #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
 
 #else
 
 #define pgd_page_paddr(pgd)	({ BUILD_BUG(); 0;})
 
+/* Match pud_offset folding in <asm/generic/pgtable-nopud.h> */
+#define pud_fixmap(addr)		NULL
+#define pud_fixmap_offset(pgdp, addr)	((pud_t *)pgdp)
+#define pud_fixmap_unmap()
+
 #endif  /* CONFIG_PGTABLE_LEVELS > 3 */
 
 #define pgd_ERROR(pgd)		__pgd_error(__FILE__, __LINE__, pgd_val(pgd))
@@ -523,6 +546,9 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 /* to find an entry in a kernel page-table-directory */
 #define pgd_offset_k(addr)	pgd_offset(&init_mm, addr)
 
+#define pgd_fixmap(addr)		((pgd_t *)set_fixmap_offset(FIX_PGD, addr))
+#define pgd_fixmap_unmap()		clear_fixmap(FIX_PGD)
+
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
 	const pteval_t mask = PTE_USER | PTE_PXN | PTE_UXN | PTE_RDONLY |
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 16/20] arm64: mm: use fixmap when creating page tables
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (14 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 15/20] arm64: mm: add functions to walk tables in fixmap Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 17/20] arm64: mm: allocate pagetables anywhere Mark Rutland
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

As a preparatory step to allow us to allocate early page tables from
unmapped memory using memblock_alloc, modify the __create_mapping
callees to map and unmap the tables they modify using fixmap entries.

All but the top-level pgd initialisation is performed via the fixmap.
Subsequent patches will inject the pgd physical address, and migrate to
using the FIX_PGD slot.
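
The callees all end up following the same pattern, roughly (the real
code is in the diff below; 'pmd' here is the entry being populated):

	phys_addr_t pte_phys = alloc();		/* allocator now returns a phys addr */
	pte_t *pte = pte_fixmap(pte_phys);	/* temporarily map the new table */

	/* ... initialise the table through 'pte' ... */

	__pmd_populate(pmd, pte_phys, PMD_TYPE_TABLE);
	pte_fixmap_unmap();			/* and unmap it again */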

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h |  2 +-
 arch/arm64/mm/mmu.c              | 54 ++++++++++++++++++++++++++--------------
 2 files changed, 37 insertions(+), 19 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 8a00a1cb..0664468 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -518,7 +518,7 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
 #define pud_offset(dir, addr)		((pud_t *)__va(pud_offset_phys((dir), (addr))))
 
 #define pud_fixmap(addr)		((pud_t *)set_fixmap_offset(FIX_PUD, addr))
-#define pud_fixmap_offset(pgd, addr)	pud_fixmap(pmd_offset_phys(pgd, addr))
+#define pud_fixmap_offset(pgd, addr)	pud_fixmap(pud_offset_phys(pgd, addr))
 #define pud_fixmap_unmap()		clear_fixmap(FIX_PUD)
 
 #define pgd_page(pgd)		pfn_to_page(__phys_to_pfn(pgd_val(pgd) & PHYS_MASK))
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 5ed1623..1a516ac 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -62,16 +62,24 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 }
 EXPORT_SYMBOL(phys_mem_access_prot);
 
-static void __init *early_alloc(void)
+static phys_addr_t __init early_alloc(void)
 {
 	phys_addr_t phys;
 	void *ptr;
 
 	phys = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
 	BUG_ON(!phys);
-	ptr = __va(phys);
+
+	/*
+	 * The FIX_{PGD,PUD,PMD} slots may be in active use, but the FIX_PTE
+	 * slot will be free, so we can (ab)use the FIX_PTE slot to initialise
+	 * any level of table.
+	 */
+	ptr = pte_fixmap(phys);
 	memset(ptr, 0, PAGE_SIZE);
-	return ptr;
+	pte_fixmap_unmap();
+
+	return phys;
 }
 
 /*
@@ -95,24 +103,28 @@ static void split_pmd(pmd_t *pmd, pte_t *pte)
 static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 				  unsigned long end, unsigned long pfn,
 				  pgprot_t prot,
-				  void *(*alloc)(void))
+				  phys_addr_t (*alloc)(void))
 {
 	pte_t *pte;
 
 	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
-		pte = alloc();
+		phys_addr_t pte_phys = alloc();
+		pte = pte_fixmap(pte_phys);
 		if (pmd_sect(*pmd))
 			split_pmd(pmd, pte);
-		__pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
+		__pmd_populate(pmd, pte_phys, PMD_TYPE_TABLE);
 		flush_tlb_all();
+		pte_fixmap_unmap();
 	}
 	BUG_ON(pmd_bad(*pmd));
 
-	pte = pte_offset_kernel(pmd, addr);
+	pte = pte_fixmap_offset(pmd, addr);
 	do {
 		set_pte(pte, pfn_pte(pfn, prot));
 		pfn++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
+
+	pte_fixmap_unmap();
 }
 
 static void split_pud(pud_t *old_pud, pmd_t *pmd)
@@ -130,7 +142,7 @@ static void split_pud(pud_t *old_pud, pmd_t *pmd)
 static void alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 				  unsigned long addr, unsigned long end,
 				  phys_addr_t phys, pgprot_t prot,
-				  void *(*alloc)(void))
+				  phys_addr_t (*alloc)(void))
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -139,7 +151,8 @@ static void alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 	 * Check for initial section mappings in the pgd/pud and remove them.
 	 */
 	if (pud_none(*pud) || pud_sect(*pud)) {
-		pmd = alloc();
+		phys_addr_t pmd_phys = alloc();
+		pmd = pmd_fixmap(pmd_phys);
 		if (pud_sect(*pud)) {
 			/*
 			 * need to have the 1G of mappings continue to be
@@ -147,12 +160,13 @@ static void alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 			 */
 			split_pud(pud, pmd);
 		}
-		pud_populate(mm, pud, pmd);
+		__pud_populate(pud, pmd_phys, PUD_TYPE_TABLE);
 		flush_tlb_all();
+		pmd_fixmap_unmap();
 	}
 	BUG_ON(pud_bad(*pud));
 
-	pmd = pmd_offset(pud, addr);
+	pmd = pmd_fixmap_offset(pud, addr);
 	do {
 		next = pmd_addr_end(addr, end);
 		/* try section mapping first */
@@ -178,6 +192,8 @@ static void alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
 		}
 		phys += next - addr;
 	} while (pmd++, addr = next, addr != end);
+
+	pmd_fixmap_unmap();
 }
 
 static inline bool use_1G_block(unsigned long addr, unsigned long next,
@@ -195,18 +211,18 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 				  unsigned long addr, unsigned long end,
 				  phys_addr_t phys, pgprot_t prot,
-				  void *(*alloc)(void))
+				  phys_addr_t (*alloc)(void))
 {
 	pud_t *pud;
 	unsigned long next;
 
 	if (pgd_none(*pgd)) {
-		pud = alloc();
-		pgd_populate(mm, pgd, pud);
+		phys_addr_t pud_phys = alloc();
+		__pgd_populate(pgd, pud_phys, PUD_TYPE_TABLE);
 	}
 	BUG_ON(pgd_bad(*pgd));
 
-	pud = pud_offset(pgd, addr);
+	pud = pud_fixmap_offset(pgd, addr);
 	do {
 		next = pud_addr_end(addr, end);
 
@@ -238,6 +254,8 @@ static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
+
+	pud_fixmap_unmap();
 }
 
 /*
@@ -247,7 +265,7 @@ static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 static void  __create_mapping(struct mm_struct *mm, pgd_t *pgd,
 				    phys_addr_t phys, unsigned long virt,
 				    phys_addr_t size, pgprot_t prot,
-				    void *(*alloc)(void))
+				    phys_addr_t (*alloc)(void))
 {
 	unsigned long addr, length, end, next;
 
@@ -262,11 +280,11 @@ static void  __create_mapping(struct mm_struct *mm, pgd_t *pgd,
 	} while (pgd++, addr = next, addr != end);
 }
 
-static void *late_alloc(void)
+static phys_addr_t late_alloc(void)
 {
 	void *ptr = (void *)__get_free_page(PGALLOC_GFP);
 	BUG_ON(!ptr);
-	return ptr;
+	return __pa(ptr);
 }
 
 static void __init create_mapping(phys_addr_t phys, unsigned long virt,
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 17/20] arm64: mm: allocate pagetables anywhere
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (15 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 16/20] arm64: mm: use fixmap when creating page tables Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 18/20] arm64: mm: allow passing a pgdir to alloc_init_* Mark Rutland
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Now that create_mapping uses fixmap slots to modify pte, pmd, and pud
entries, we can access page tables anywhere in physical memory,
regardless of the extent of the linear mapping.

Given that, we no longer need to limit memblock allocations during page
table creation, and can leave the limit as its default
MEMBLOCK_ALLOC_ANYWHERE.

We never add memory which will fall outside of the linear map range
given phys_offset and MAX_MEMBLOCK_ADDR are configured appropriately, so
any tables we create will fall in the linear map of the final tables.
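
For illustration only (not taken from the patch): as the fixmap-based
walkers take physical addresses, a table allocated from anywhere in
physical memory is usable, e.g.:

	phys_addr_t table = memblock_alloc(PAGE_SIZE, PAGE_SIZE);
	pte_t *ptep = pte_fixmap(table);	/* reachable even if outside the early linear map */

	memset(ptep, 0, PAGE_SIZE);
	pte_fixmap_unmap();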

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/mmu.c | 35 -----------------------------------
 1 file changed, 35 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 1a516ac..27dd475 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -365,20 +365,6 @@ static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
 static void __init map_mem(void)
 {
 	struct memblock_region *reg;
-	phys_addr_t limit;
-
-	/*
-	 * Temporarily limit the memblock range. We need to do this as
-	 * create_mapping requires puds, pmds and ptes to be allocated from
-	 * memory addressable from the initial direct kernel mapping.
-	 *
-	 * The initial direct kernel mapping, located at swapper_pg_dir, gives
-	 * us PUD_SIZE (with SECTION maps) or PMD_SIZE (without SECTION maps,
-	 * memory starting from PHYS_OFFSET (which must be aligned to 2MB as
-	 * per Documentation/arm64/booting.txt).
-	 */
-	limit = PHYS_OFFSET + SWAPPER_INIT_MAP_SIZE;
-	memblock_set_current_limit(limit);
 
 	/* map all the memory banks */
 	for_each_memblock(memory, reg) {
@@ -388,29 +374,8 @@ static void __init map_mem(void)
 		if (start >= end)
 			break;
 
-		if (ARM64_SWAPPER_USES_SECTION_MAPS) {
-			/*
-			 * For the first memory bank align the start address and
-			 * current memblock limit to prevent create_mapping() from
-			 * allocating pte page tables from unmapped memory. With
-			 * the section maps, if the first block doesn't end on section
-			 * size boundary, create_mapping() will try to allocate a pte
-			 * page, which may be returned from an unmapped area.
-			 * When section maps are not used, the pte page table for the
-			 * current limit is already present in swapper_pg_dir.
-			 */
-			if (start < limit)
-				start = ALIGN(start, SECTION_SIZE);
-			if (end < limit) {
-				limit = end & SECTION_MASK;
-				memblock_set_current_limit(limit);
-			}
-		}
 		__map_memblock(start, end);
 	}
-
-	/* Limit no longer required. */
-	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
 }
 
 static void __init fixup_executable(void)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 18/20] arm64: mm: allow passing a pgdir to alloc_init_*
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (16 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 17/20] arm64: mm: allocate pagetables anywhere Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 19/20] arm64: ensure _stext and _etext are page-aligned Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 20/20] arm64: mm: create new fine-grained mappings at boot Mark Rutland
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

To allow us to initialise pgdirs which are fixmapped, allow explicitly
passing a pgdir rather than an mm. A new __create_pgd_mapping function
is added for this, with existing __create_mapping callers migrated to
this.

The mm argument was previously only used at the top level. Now that it
is redundant at all levels, it is removed. To indicate its new found
similarity to alloc_init_{pud,pmd,pte}, __create_mapping is renamed to
init_pgd.
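
As a rough usage sketch (illustrative; phys, virt and size stand for
whichever region is being mapped), this allows a pgd which is itself
only reachable via the fixmap to be populated:

	phys_addr_t pgd_phys = early_alloc();
	pgd_t *pgd = pgd_fixmap(pgd_phys);

	__create_pgd_mapping(pgd, phys, virt, size, PAGE_KERNEL, early_alloc);
	pgd_fixmap_unmap();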

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/mmu.c | 42 ++++++++++++++++++++++--------------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 27dd475..c20a1ce 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -139,10 +139,9 @@ static void split_pud(pud_t *old_pud, pmd_t *pmd)
 	} while (pmd++, i++, i < PTRS_PER_PMD);
 }
 
-static void alloc_init_pmd(struct mm_struct *mm, pud_t *pud,
-				  unsigned long addr, unsigned long end,
-				  phys_addr_t phys, pgprot_t prot,
-				  phys_addr_t (*alloc)(void))
+static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
+			   phys_addr_t phys, pgprot_t prot,
+			   phys_addr_t (*alloc)(void))
 {
 	pmd_t *pmd;
 	unsigned long next;
@@ -208,10 +207,9 @@ static inline bool use_1G_block(unsigned long addr, unsigned long next,
 	return true;
 }
 
-static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
-				  unsigned long addr, unsigned long end,
-				  phys_addr_t phys, pgprot_t prot,
-				  phys_addr_t (*alloc)(void))
+static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
+			   phys_addr_t phys, pgprot_t prot,
+			   phys_addr_t (*alloc)(void))
 {
 	pud_t *pud;
 	unsigned long next;
@@ -250,7 +248,7 @@ static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
 				}
 			}
 		} else {
-			alloc_init_pmd(mm, pud, addr, next, phys, prot, alloc);
+			alloc_init_pmd(pud, addr, next, phys, prot, alloc);
 		}
 		phys += next - addr;
 	} while (pud++, addr = next, addr != end);
@@ -262,10 +260,9 @@ static void alloc_init_pud(struct mm_struct *mm, pgd_t *pgd,
  * Create the page directory entries and any necessary page tables for the
  * mapping specified by 'md'.
  */
-static void  __create_mapping(struct mm_struct *mm, pgd_t *pgd,
-				    phys_addr_t phys, unsigned long virt,
-				    phys_addr_t size, pgprot_t prot,
-				    phys_addr_t (*alloc)(void))
+static void init_pgd(pgd_t *pgd, phys_addr_t phys, unsigned long virt,
+		     phys_addr_t size, pgprot_t prot,
+		     phys_addr_t (*alloc)(void))
 {
 	unsigned long addr, length, end, next;
 
@@ -275,7 +272,7 @@ static void  __create_mapping(struct mm_struct *mm, pgd_t *pgd,
 	end = addr + length;
 	do {
 		next = pgd_addr_end(addr, end);
-		alloc_init_pud(mm, pgd, addr, next, phys, prot, alloc);
+		alloc_init_pud(pgd, addr, next, phys, prot, alloc);
 		phys += next - addr;
 	} while (pgd++, addr = next, addr != end);
 }
@@ -287,6 +284,14 @@ static phys_addr_t late_alloc(void)
 	return __pa(ptr);
 }
 
+static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
+				 unsigned long virt, phys_addr_t size,
+				 pgprot_t prot,
+				 phys_addr_t (*alloc)(void))
+{
+	init_pgd(pgd_offset_raw(pgdir, virt), phys, virt, size, prot, alloc);
+}
+
 static void __init create_mapping(phys_addr_t phys, unsigned long virt,
 				  phys_addr_t size, pgprot_t prot)
 {
@@ -295,16 +300,14 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
 			&phys, virt);
 		return;
 	}
-	__create_mapping(&init_mm, pgd_offset_k(virt), phys, virt,
-			 size, prot, early_alloc);
+	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, early_alloc);
 }
 
 void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 			       unsigned long virt, phys_addr_t size,
 			       pgprot_t prot)
 {
-	__create_mapping(mm, pgd_offset(mm, virt), phys, virt, size, prot,
-				late_alloc);
+	__create_pgd_mapping(mm->pgd, phys, virt, size, prot, late_alloc);
 }
 
 static void create_mapping_late(phys_addr_t phys, unsigned long virt,
@@ -316,8 +319,7 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
 		return;
 	}
 
-	return __create_mapping(&init_mm, pgd_offset_k(virt),
-				phys, virt, size, prot, late_alloc);
+	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, late_alloc);
 }
 
 #ifdef CONFIG_DEBUG_RODATA
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 19/20] arm64: ensure _stext and _etext are page-aligned
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (17 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 18/20] arm64: mm: allow passing a pgdir to alloc_init_* Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  2015-12-09 12:44 ` [RFC PATCH 20/20] arm64: mm: create new fine-grained mappings at boot Mark Rutland
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

Currently we have separate ALIGN_DEBUG_RO{,_MIN} directives to align
_etext and __init_begin. While we ensure that __init_begin is
page-aligned, we do not provide the same guarantee for _etext. This is
not problematic currently as the alignment of __init_begin is sufficient
to prevent issues when we modify permissions.

Subsequent patches will assume page alignment of segments of the kernel
we wish to map with different permissions. To ensure this, move _etext
after the ALIGN_DEBUG_RO_MIN for the init section. This renders the
prior ALIGN_DEBUG_RO irrelevant, and hence it is removed. Likewise,
upgrade to ALIGN_DEBUG_RO_MIN(PAGE_SIZE) for _stext.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/vmlinux.lds.S | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index f943a84..7de6c39 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -94,7 +94,7 @@ SECTIONS
 		_text = .;
 		HEAD_TEXT
 	}
-	ALIGN_DEBUG_RO
+	ALIGN_DEBUG_RO_MIN(PAGE_SIZE)
 	.text : {			/* Real text segment		*/
 		_stext = .;		/* Text and read-only data	*/
 			__exception_text_start = .;
@@ -115,10 +115,9 @@ SECTIONS
 	RO_DATA(PAGE_SIZE)
 	EXCEPTION_TABLE(8)
 	NOTES
-	ALIGN_DEBUG_RO
-	_etext = .;			/* End of text and rodata section */
 
 	ALIGN_DEBUG_RO_MIN(PAGE_SIZE)
+	_etext = .;			/* End of text and rodata section */
 	__init_begin = .;
 
 	INIT_TEXT_SECTION(8)
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 20/20] arm64: mm: create new fine-grained mappings at boot
  2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
                   ` (18 preceding siblings ...)
  2015-12-09 12:44 ` [RFC PATCH 19/20] arm64: ensure _stext and _etext are page-aligned Mark Rutland
@ 2015-12-09 12:44 ` Mark Rutland
  19 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-09 12:44 UTC (permalink / raw)
  To: linux-arm-kernel

At boot we may change the granularity of the tables mapping the kernel
(by splitting or making sections). This may happen when we create the
linear mapping (in __map_memblock), or at any point we try to apply
fine-grained permissions to the kernel (e.g. fixup_executable,
mark_rodata_ro, fixup_init).

Changing the active page tables in this manner may result in multiple
entries for the same address being allocated into TLBs, risking problems
such as TLB conflict aborts or issues derived from the amalgamation of
TLB entries. Generally, a break-before-make (BBM) approach is necessary
to avoid conflicts, but we cannot do this for the kernel tables as it
risks unmapping text or data being used to do so.

Instead, we can create a new set of tables from scratch in the safety of
the existing mappings, and subsequently migrate over to these using the
new cpu_replace_ttbr1 helper, which avoids the two sets of tables being
active simultaneously.

To avoid issues when we later modify permissions of the page tables
(e.g. in fixup_init), we must create the page tables at a granularity
such that later modification does not result in splitting of tables.

This patch applies this strategy, creating a new set of fine-grained
page tables from scratch, and safely migrating to them.
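
In outline, the migration boils down to the following (see paging_init
in the diff for the full version, including the swapper_pg_dir copying):

	phys_addr_t pgd_phys = early_alloc();
	pgd_t *pgd = pgd_fixmap(pgd_phys);

	map_kernel(pgd);			/* fine-grained kernel mappings */
	map_mem(pgd);				/* linear map, skipping the kernel image */

	cpu_replace_ttbr1(__va(pgd_phys));	/* switch without conflicting TLB entries */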

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jeremy Linton <jeremy.linton@arm.com>
Cc: Laura Abbott <labbott@fedoraproject.org>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/mm/mmu.c | 146 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 84 insertions(+), 62 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index c20a1ce..aa4f381 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -322,49 +322,41 @@ static void create_mapping_late(phys_addr_t phys, unsigned long virt,
 	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot, late_alloc);
 }
 
-#ifdef CONFIG_DEBUG_RODATA
-static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
+static void __init __map_memblock(pgd_t *pgd, phys_addr_t start, phys_addr_t end)
 {
+
+	unsigned long kernel_start = __pa(_stext);
+	unsigned long kernel_end = __pa(_end);
+
 	/*
-	 * Set up the executable regions using the existing section mappings
-	 * for now. This will get more fine grained later once all memory
-	 * is mapped
+	 * The kernel itself is mapped at page granularity. Map all other
+	 * memory, making sure we don't overwrite the existing kernel mappings.
 	 */
-	unsigned long kernel_x_start = round_down(__pa(_stext), SWAPPER_BLOCK_SIZE);
-	unsigned long kernel_x_end = round_up(__pa(__init_end), SWAPPER_BLOCK_SIZE);
-
-	if (end < kernel_x_start) {
-		create_mapping(start, __phys_to_virt(start),
-			end - start, PAGE_KERNEL);
-	} else if (start >= kernel_x_end) {
-		create_mapping(start, __phys_to_virt(start),
-			end - start, PAGE_KERNEL);
-	} else {
-		if (start < kernel_x_start)
-			create_mapping(start, __phys_to_virt(start),
-				kernel_x_start - start,
-				PAGE_KERNEL);
-		create_mapping(kernel_x_start,
-				__phys_to_virt(kernel_x_start),
-				kernel_x_end - kernel_x_start,
-				PAGE_KERNEL_EXEC);
-		if (kernel_x_end < end)
-			create_mapping(kernel_x_end,
-				__phys_to_virt(kernel_x_end),
-				end - kernel_x_end,
-				PAGE_KERNEL);
+
+	/* No overlap with the kernel. */
+	if (end < kernel_start || start >= kernel_end) {
+		__create_pgd_mapping(pgd, start, __phys_to_virt(start),
+				     end - start, PAGE_KERNEL, early_alloc);
+		return;
 	}
 
+	/*
+	 * This block overlaps the kernel mapping. Map the portion(s) which
+	 * don't overlap.
+	 */
+	if (start < kernel_start)
+		__create_pgd_mapping(pgd, start,
+				     __phys_to_virt(start),
+				     kernel_start - start, PAGE_KERNEL,
+				     early_alloc);
+	if (kernel_end < end)
+		__create_pgd_mapping(pgd, kernel_end,
+				     __phys_to_virt(kernel_end),
+				     end - kernel_end, PAGE_KERNEL,
+				     early_alloc);
 }
-#else
-static void __init __map_memblock(phys_addr_t start, phys_addr_t end)
-{
-	create_mapping(start, __phys_to_virt(start), end - start,
-			PAGE_KERNEL_EXEC);
-}
-#endif
 
-static void __init map_mem(void)
+static void __init map_mem(pgd_t *pgd)
 {
 	struct memblock_region *reg;
 
@@ -376,33 +368,10 @@ static void __init map_mem(void)
 		if (start >= end)
 			break;
 
-		__map_memblock(start, end);
+		__map_memblock(pgd, start, end);
 	}
 }
 
-static void __init fixup_executable(void)
-{
-#ifdef CONFIG_DEBUG_RODATA
-	/* now that we are actually fully mapped, make the start/end more fine grained */
-	if (!IS_ALIGNED((unsigned long)_stext, SWAPPER_BLOCK_SIZE)) {
-		unsigned long aligned_start = round_down(__pa(_stext),
-							 SWAPPER_BLOCK_SIZE);
-
-		create_mapping(aligned_start, __phys_to_virt(aligned_start),
-				__pa(_stext) - aligned_start,
-				PAGE_KERNEL);
-	}
-
-	if (!IS_ALIGNED((unsigned long)__init_end, SWAPPER_BLOCK_SIZE)) {
-		unsigned long aligned_end = round_up(__pa(__init_end),
-							  SWAPPER_BLOCK_SIZE);
-		create_mapping(__pa(__init_end), (unsigned long)__init_end,
-				aligned_end - __pa(__init_end),
-				PAGE_KERNEL);
-	}
-#endif
-}
-
 #ifdef CONFIG_DEBUG_RODATA
 void mark_rodata_ro(void)
 {
@@ -420,14 +389,67 @@ void fixup_init(void)
 			PAGE_KERNEL);
 }
 
+static void __init map_kernel_chunk(pgd_t *pgd, void *va_start, void *va_end, pgprot_t prot)
+{
+	phys_addr_t pa_start = __pa(va_start);
+	unsigned long size = va_end - va_start;
+
+	BUG_ON(!PAGE_ALIGNED(pa_start));
+	BUG_ON(!PAGE_ALIGNED(size));
+
+	__create_pgd_mapping(pgd, pa_start, (unsigned long)va_start, size, prot, early_alloc);
+}
+
+/*
+ * Create fine-grained mappings for the kernel.
+ */
+static void __init map_kernel(pgd_t *pgd)
+{
+
+	map_kernel_chunk(pgd, _stext, _etext, PAGE_KERNEL_EXEC);
+	map_kernel_chunk(pgd, __init_begin, __init_end, PAGE_KERNEL_EXEC);
+	map_kernel_chunk(pgd, _data, _end, PAGE_KERNEL);
+
+	/*
+	 * The fixmap falls in a separate pgd to the kernel, and doesn't live
+	 * in the carveout for the swapper_pg_dir. We can simply re-use the
+	 * existing dir for the fixmap.
+	 */
+	set_pgd(pgd_offset_raw(pgd, FIXADDR_START), *pgd_offset_k(FIXADDR_START));
+
+	/* TODO: either copy or initialise KASAN here */
+}
+
+
 /*
  * paging_init() sets up the page tables, initialises the zone memory
  * maps and sets up the zero page.
  */
 void __init paging_init(void)
 {
-	map_mem();
-	fixup_executable();
+	phys_addr_t pgd_phys = early_alloc();
+	pgd_t *pgd = pgd_fixmap(pgd_phys);
+
+	map_kernel(pgd);
+	map_mem(pgd);
+
+	/*
+	 * HACK: ensure that we use the original swapper_pg_dir pgd so that:
+	 * - secondaries get the right stack in secondary_entry
+	 * - cpu_switch_mm can validate the pgd handed to it
+	 */
+	cpu_replace_ttbr1(__va(pgd_phys));
+	memcpy(swapper_pg_dir, pgd, PAGE_SIZE);
+	cpu_replace_ttbr1(swapper_pg_dir);
+
+	/*
+	 * TODO: this leaves the swapper_pgdir pud & pmd unused but not free.
+	 * It would be better if we could avoid the hack above and free the
+	 * entire swapper_pg_dir region in one go (e.g. by placing it in
+	 * .init).
+	 */
+	pgd_fixmap_unmap();
+	memblock_free(pgd_phys, PAGE_SIZE);
 
 	bootmem_init();
 }
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC PATCH 04/20] arm64: mm: assume PAGE SIZE for page table allocator
  2015-12-09 12:44 ` [RFC PATCH 04/20] arm64: mm: assume PAGE SIZE for page table allocator Mark Rutland
@ 2015-12-10 14:08   ` Will Deacon
  2015-12-10 14:23     ` Mark Rutland
  0 siblings, 1 reply; 28+ messages in thread
From: Will Deacon @ 2015-12-10 14:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 09, 2015 at 12:44:39PM +0000, Mark Rutland wrote:
> We pass a size parameter to early_alloc and late_alloc, but these are
> only ever used to allocate single pages. In late_alloc we always
> allocate a single page.
> 
> Remove the redundant size parameter.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jeremy Linton <jeremy.linton@arm.com>
> Cc: Laura Abbott <labbott@fedoraproject.org>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/mm/mmu.c | 27 ++++++++++++---------------
>  1 file changed, 12 insertions(+), 15 deletions(-)

Looks sensible to me. Cosmetic nit, but could we rename these to
early_page_alloc/late_page_alloc instead, please?

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss
  2015-12-09 12:44 ` [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss Mark Rutland
@ 2015-12-10 14:11   ` Will Deacon
  2015-12-10 15:29     ` Mark Rutland
  0 siblings, 1 reply; 28+ messages in thread
From: Will Deacon @ 2015-12-10 14:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Dec 09, 2015 at 12:44:41PM +0000, Mark Rutland wrote:
> Currently the zero page is set up in paging_init, and thus we cannot use
> the zero page earlier. We use the zero page as a reserved TTBR value
> from which no TLB entries may be allocated (e.g. when uninstalling the
> idmap). To enable such usage earlier (as may be required for invasive
> changes to the kernel page tables), and to minimise the time that the
> idmap is active, we need to be able to use the zero page before
> paging_init.
> 
> This patch follows the example set by x86, by allocating the zero page
> at compile time, in .bss. This means that the zero page itself is
> available immediately upon entry to start_kernel (as we zero .bss before
> this), and also means that the zero page takes up no space in the raw
> Image binary. The associated struct page is allocated in bootmem_init,
> and remains unavailable until this time.
> 
> Outside of arch code, the only users of empty_zero_page assume that the
> empty_zero_page symbol refers to the zeroed memory itself, and that
> ZERO_PAGE(x) must be used to acquire the associated struct page,
> following the example of x86. This patch also brings arm64 inline with
> these assumptions.
> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Jeremy Linton <jeremy.linton@arm.com>
> Cc: Laura Abbott <labbott@fedoraproject.org>
> Cc: Will Deacon <will.deacon@arm.com>
> ---
>  arch/arm64/include/asm/mmu_context.h | 2 +-
>  arch/arm64/include/asm/pgtable.h     | 4 ++--
>  arch/arm64/mm/mmu.c                  | 9 +--------
>  3 files changed, 4 insertions(+), 11 deletions(-)

[...]

> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 304ff23..7559c22 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -48,7 +48,7 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>   * Empty_zero_page is a special page that is used for zero-initialized data
>   * and COW.
>   */
> -struct page *empty_zero_page;
> +unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>  EXPORT_SYMBOL(empty_zero_page);

I've been looking at this, and it was making me feel uneasy because it's
full of junk before the bss is zeroed. Working that through, it's no
worse than what we currently have but I then realised that (a) we don't
have a dsb after zeroing the zero page (which we need to make sure the
zeroes are visible to the page table walker) and (b) the zero page is
never explicitly cleaned to the PoC.

There may be cases where the zero-page is used to back read-only,
non-cacheable mappings (something to do with KVM?), so I'd sleep better
if we made sure that it was clean.
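
Something along the following lines is what I have in mind (just a
sketch, and assuming __flush_dcache_area is the right tool for the
clean to the PoC):

	/* once empty_zero_page has been zeroed along with the rest of .bss */
	dsb(ishst);					/* zeroes visible to the walker */
	__flush_dcache_area(empty_zero_page, PAGE_SIZE);	/* and clean to the PoC */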

Thoughts?

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 04/20] arm64: mm: assume PAGE SIZE for page table allocator
  2015-12-10 14:08   ` Will Deacon
@ 2015-12-10 14:23     ` Mark Rutland
  0 siblings, 0 replies; 28+ messages in thread
From: Mark Rutland @ 2015-12-10 14:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 10, 2015 at 02:08:40PM +0000, Will Deacon wrote:
> On Wed, Dec 09, 2015 at 12:44:39PM +0000, Mark Rutland wrote:
> > We pass a size parameter to early_alloc and late_alloc, but these are
> > only ever used to allocate single pages. In late_alloc we always
> > allocate a single page.
> > 
> > Remove the redundant size parameter.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Jeremy Linton <jeremy.linton@arm.com>
> > Cc: Laura Abbott <labbott@fedoraproject.org>
> > Cc: Will Deacon <will.deacon@arm.com>
> > ---
> >  arch/arm64/mm/mmu.c | 27 ++++++++++++---------------
> >  1 file changed, 12 insertions(+), 15 deletions(-)
> 
> Looks sensible to me. Cosmetic nit, but we could we rename these to
> early_page_alloc/late_page_alloc instead, please?

Sure. I'll also s/alloc/page_alloc/ for the function pointer. Luckily
this doesn't clash with the usual alloc_page function.

I'll also fix up the zero page init, as I forgot to remove the PAGE_SIZE
parameter in that early_alloc() call there.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss
  2015-12-10 14:11   ` Will Deacon
@ 2015-12-10 15:29     ` Mark Rutland
  2015-12-10 15:40       ` Marc Zyngier
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2015-12-10 15:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 10, 2015 at 02:11:08PM +0000, Will Deacon wrote:
> On Wed, Dec 09, 2015 at 12:44:41PM +0000, Mark Rutland wrote:
> > Currently the zero page is set up in paging_init, and thus we cannot use
> > the zero page earlier. We use the zero page as a reserved TTBR value
> > from which no TLB entries may be allocated (e.g. when uninstalling the
> > idmap). To enable such usage earlier (as may be required for invasive
> > changes to the kernel page tables), and to minimise the time that the
> > idmap is active, we need to be able to use the zero page before
> > paging_init.
> > 
> > This patch follows the example set by x86, by allocating the zero page
> > at compile time, in .bss. This means that the zero page itself is
> > available immediately upon entry to start_kernel (as we zero .bss before
> > this), and also means that the zero page takes up no space in the raw
> > Image binary. The associated struct page is allocated in bootmem_init,
> > and remains unavailable until this time.
> > 
> > Outside of arch code, the only users of empty_zero_page assume that the
> > empty_zero_page symbol refers to the zeroed memory itself, and that
> > ZERO_PAGE(x) must be used to acquire the associated struct page,
> > following the example of x86. This patch also brings arm64 inline with
> > these assumptions.
> > 
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Jeremy Linton <jeremy.linton@arm.com>
> > Cc: Laura Abbott <labbott@fedoraproject.org>
> > Cc: Will Deacon <will.deacon@arm.com>
> > ---
> >  arch/arm64/include/asm/mmu_context.h | 2 +-
> >  arch/arm64/include/asm/pgtable.h     | 4 ++--
> >  arch/arm64/mm/mmu.c                  | 9 +--------
> >  3 files changed, 4 insertions(+), 11 deletions(-)
> 
> [...]
> 
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index 304ff23..7559c22 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -48,7 +48,7 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
> >   * Empty_zero_page is a special page that is used for zero-initialized data
> >   * and COW.
> >   */
> > -struct page *empty_zero_page;
> > +unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
> >  EXPORT_SYMBOL(empty_zero_page);
> 
> I've been looking at this, and it was making me feel uneasy because it's
> full of junk before the bss is zeroed. Working that through, it's no
> worse than what we currently have but I then realised that (a) we don't
> have a dsb after zeroing the zero page (which we need to make sure the
> zeroes are visible to the page table walker and (b) the zero page is
> never explicitly cleaned to the PoC.

Ouch; that's scary.

> There may be cases where the zero-page is used to back read-only,
> non-cacheable mappings (something to do with KVM?), so I'd sleep better
> if we made sure that it was clean.

From a grep around for uses of ZERO_PAGE, in most places the zero page
is simply used as an empty buffer for I/O. In these cases it's either
accessed coherently or the usual machinery for non-coherent DMA kicks
in.

I don't believe that we usually give userspace the ability to create
non-cacheable mappings, and I couldn't spot any paths it could do so via
some driver-specific IOCTL applied to the zero page.

Looking around, kvm_clear_guest_page seemed problematic, but isn't used
on arm64. I can imagine the zero page being mapped into guests in other
situations when mirroring the userspace mapping. 

Marc, Christoffer, I thought we cleaned pages to the PoC before mapping
them into a guest? Is that right? Or do we have potential issues there?

> Thoughts?

I suspect that other than the missing barrier, we're fine for the
time being.

We should figure out what other architectures do. If drivers cannot
assume that the zero page is accessible by non-cacheable accesses I'm
not sure whether we should clean it (though I agree this is the simplest
thing to do).

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss
  2015-12-10 15:29     ` Mark Rutland
@ 2015-12-10 15:40       ` Marc Zyngier
  2015-12-10 15:51         ` Mark Rutland
  0 siblings, 1 reply; 28+ messages in thread
From: Marc Zyngier @ 2015-12-10 15:40 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/12/15 15:29, Mark Rutland wrote:
> On Thu, Dec 10, 2015 at 02:11:08PM +0000, Will Deacon wrote:
>> On Wed, Dec 09, 2015 at 12:44:41PM +0000, Mark Rutland wrote:
>>> Currently the zero page is set up in paging_init, and thus we cannot use
>>> the zero page earlier. We use the zero page as a reserved TTBR value
>>> from which no TLB entries may be allocated (e.g. when uninstalling the
>>> idmap). To enable such usage earlier (as may be required for invasive
>>> changes to the kernel page tables), and to minimise the time that the
>>> idmap is active, we need to be able to use the zero page before
>>> paging_init.
>>>
>>> This patch follows the example set by x86, by allocating the zero page
>>> at compile time, in .bss. This means that the zero page itself is
>>> available immediately upon entry to start_kernel (as we zero .bss before
>>> this), and also means that the zero page takes up no space in the raw
>>> Image binary. The associated struct page is allocated in bootmem_init,
>>> and remains unavailable until this time.
>>>
>>> Outside of arch code, the only users of empty_zero_page assume that the
>>> empty_zero_page symbol refers to the zeroed memory itself, and that
>>> ZERO_PAGE(x) must be used to acquire the associated struct page,
>>> following the example of x86. This patch also brings arm64 inline with
>>> these assumptions.
>>>
>>> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>> Cc: Jeremy Linton <jeremy.linton@arm.com>
>>> Cc: Laura Abbott <labbott@fedoraproject.org>
>>> Cc: Will Deacon <will.deacon@arm.com>
>>> ---
>>>  arch/arm64/include/asm/mmu_context.h | 2 +-
>>>  arch/arm64/include/asm/pgtable.h     | 4 ++--
>>>  arch/arm64/mm/mmu.c                  | 9 +--------
>>>  3 files changed, 4 insertions(+), 11 deletions(-)
>>
>> [...]
>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 304ff23..7559c22 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -48,7 +48,7 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>>   * Empty_zero_page is a special page that is used for zero-initialized data
>>>   * and COW.
>>>   */
>>> -struct page *empty_zero_page;
>>> +unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>>>  EXPORT_SYMBOL(empty_zero_page);
>>
>> I've been looking at this, and it was making me feel uneasy because it's
>> full of junk before the bss is zeroed. Working that through, it's no
>> worse than what we currently have but I then realised that (a) we don't
>> have a dsb after zeroing the zero page (which we need to make sure the
>> zeroes are visible to the page table walker and (b) the zero page is
>> never explicitly cleaned to the PoC.
> 
> Ouch; that's scary.
> 
>> There may be cases where the zero-page is used to back read-only,
>> non-cacheable mappings (something to do with KVM?), so I'd sleep better
>> if we made sure that it was clean.
> 
> From a grep around for uses of ZERO_PAGE, in most places the zero page
> is simply used as an empty buffer for I/O. In these cases it's either
> accessed coherently or goes via the usual machinery for non-coherent DMA
> kicks in.
> 
> I don't believe that we usually give userspace the ability to create
> non-cacheable mappings, and I couldn't spot any paths it could do so via
> some driver-specific IOCTL applied to the zero page.
> 
> Looking around, kvm_clear_guest_page seemed problematic, but isn't used
> on arm64. I can imagine the zero page being mapped into guests in other
> situations when mirroring the userspace mapping. 
> 
> Marc, Christoffer, I thought we cleaned pages to the PoC before mapping
> them into a guest? Is that right? Or do we have potential issues there?

I think we're OK. Looking at __coherent_cache_guest_page (which is
called when transitioning from an invalid to valid mapping), we do flush
things to PoC if the vcpu has its cache disabled (or if we know that the
IPA shouldn't be cached - the whole NOR flash emulation horror story).

Does it answer your question?

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss
  2015-12-10 15:40       ` Marc Zyngier
@ 2015-12-10 15:51         ` Mark Rutland
  2015-12-10 16:01           ` Marc Zyngier
  0 siblings, 1 reply; 28+ messages in thread
From: Mark Rutland @ 2015-12-10 15:51 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Dec 10, 2015 at 03:40:08PM +0000, Marc Zyngier wrote:
> On 10/12/15 15:29, Mark Rutland wrote:
> > On Thu, Dec 10, 2015 at 02:11:08PM +0000, Will Deacon wrote:
> >> On Wed, Dec 09, 2015 at 12:44:41PM +0000, Mark Rutland wrote:
> >>> Currently the zero page is set up in paging_init, and thus we cannot use
> >>> the zero page earlier. We use the zero page as a reserved TTBR value
> >>> from which no TLB entries may be allocated (e.g. when uninstalling the
> >>> idmap). To enable such usage earlier (as may be required for invasive
> >>> changes to the kernel page tables), and to minimise the time that the
> >>> idmap is active, we need to be able to use the zero page before
> >>> paging_init.
> >>>
> >>> This patch follows the example set by x86, by allocating the zero page
> >>> at compile time, in .bss. This means that the zero page itself is
> >>> available immediately upon entry to start_kernel (as we zero .bss before
> >>> this), and also means that the zero page takes up no space in the raw
> >>> Image binary. The associated struct page is allocated in bootmem_init,
> >>> and remains unavailable until this time.
> >>>
> >>> Outside of arch code, the only users of empty_zero_page assume that the
> >>> empty_zero_page symbol refers to the zeroed memory itself, and that
> >>> ZERO_PAGE(x) must be used to acquire the associated struct page,
> >>> following the example of x86. This patch also brings arm64 inline with
> >>> these assumptions.
> >>>
> >>> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> >>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> >>> Cc: Catalin Marinas <catalin.marinas@arm.com>
> >>> Cc: Jeremy Linton <jeremy.linton@arm.com>
> >>> Cc: Laura Abbott <labbott@fedoraproject.org>
> >>> Cc: Will Deacon <will.deacon@arm.com>
> >>> ---
> >>>  arch/arm64/include/asm/mmu_context.h | 2 +-
> >>>  arch/arm64/include/asm/pgtable.h     | 4 ++--
> >>>  arch/arm64/mm/mmu.c                  | 9 +--------
> >>>  3 files changed, 4 insertions(+), 11 deletions(-)
> >>
> >> [...]
> >>
> >>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> >>> index 304ff23..7559c22 100644
> >>> --- a/arch/arm64/mm/mmu.c
> >>> +++ b/arch/arm64/mm/mmu.c
> >>> @@ -48,7 +48,7 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
> >>>   * Empty_zero_page is a special page that is used for zero-initialized data
> >>>   * and COW.
> >>>   */
> >>> -struct page *empty_zero_page;
> >>> +unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
> >>>  EXPORT_SYMBOL(empty_zero_page);
> >>
> >> I've been looking at this, and it was making me feel uneasy because it's
> >> full of junk before the bss is zeroed. Working that through, it's no
> >> worse than what we currently have but I then realised that (a) we don't
> >> have a dsb after zeroing the zero page (which we need to make sure the
> >> zeroes are visible to the page table walker and (b) the zero page is
> >> never explicitly cleaned to the PoC.
> > 
> > Ouch; that's scary.
> > 
> >> There may be cases where the zero-page is used to back read-only,
> >> non-cacheable mappings (something to do with KVM?), so I'd sleep better
> >> if we made sure that it was clean.
> > 
> > From a grep around for uses of ZERO_PAGE, in most places the zero page
> > is simply used as an empty buffer for I/O. In these cases it's either
> > accessed coherently or goes via the usual machinery for non-coherent DMA
> > kicks in.
> > 
> > I don't believe that we usually give userspace the ability to create
> > non-cacheable mappings, and I couldn't spot any paths it could do so via
> > some driver-specific IOCTL applied to the zero page.
> > 
> > Looking around, kvm_clear_guest_page seemed problematic, but isn't used
> > on arm64. I can imagine the zero page being mapped into guests in other
> > situations when mirroring the userspace mapping. 
> > 
> > Marc, Christoffer, I thought we cleaned pages to the PoC before mapping
> > them into a guest? Is that right? Or do we have potential issues there?
> 
> I think we're OK. Looking at __coherent_cache_guest_page (which is
> called when transitioning from an invalid to valid mapping), we do flush
> things to PoC if the vcpu has its cache disabled (or if we know that the
> IPA shouldn't be cached - the whole NOR flash emulation horror story).

So we assume the guest never disables the MMU, and always uses consistent
attributes for a given IPA (e.g. it doesn't have a Device and Normal
Cacheable mapping)?

> Does it answer your question?

I think so. If those assumptions are true then I agree we're ok. If
those aren't we have other problems.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss
  2015-12-10 15:51         ` Mark Rutland
@ 2015-12-10 16:01           ` Marc Zyngier
  0 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2015-12-10 16:01 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/12/15 15:51, Mark Rutland wrote:
> On Thu, Dec 10, 2015 at 03:40:08PM +0000, Marc Zyngier wrote:
>> On 10/12/15 15:29, Mark Rutland wrote:
>>> On Thu, Dec 10, 2015 at 02:11:08PM +0000, Will Deacon wrote:
>>>> On Wed, Dec 09, 2015 at 12:44:41PM +0000, Mark Rutland wrote:
>>>>> Currently the zero page is set up in paging_init, and thus we cannot use
>>>>> the zero page earlier. We use the zero page as a reserved TTBR value
>>>>> from which no TLB entries may be allocated (e.g. when uninstalling the
>>>>> idmap). To enable such usage earlier (as may be required for invasive
>>>>> changes to the kernel page tables), and to minimise the time that the
>>>>> idmap is active, we need to be able to use the zero page before
>>>>> paging_init.
>>>>>
>>>>> This patch follows the example set by x86, by allocating the zero page
>>>>> at compile time, in .bss. This means that the zero page itself is
>>>>> available immediately upon entry to start_kernel (as we zero .bss before
>>>>> this), and also means that the zero page takes up no space in the raw
>>>>> Image binary. The associated struct page is allocated in bootmem_init,
>>>>> and remains unavailable until this time.
>>>>>
>>>>> Outside of arch code, the only users of empty_zero_page assume that the
>>>>> empty_zero_page symbol refers to the zeroed memory itself, and that
>>>>> ZERO_PAGE(x) must be used to acquire the associated struct page,
>>>>> following the example of x86. This patch also brings arm64 inline with
>>>>> these assumptions.
>>>>>
>>>>> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
>>>>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>>>>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>>>>> Cc: Jeremy Linton <jeremy.linton@arm.com>
>>>>> Cc: Laura Abbott <labbott@fedoraproject.org>
>>>>> Cc: Will Deacon <will.deacon@arm.com>
>>>>> ---
>>>>>  arch/arm64/include/asm/mmu_context.h | 2 +-
>>>>>  arch/arm64/include/asm/pgtable.h     | 4 ++--
>>>>>  arch/arm64/mm/mmu.c                  | 9 +--------
>>>>>  3 files changed, 4 insertions(+), 11 deletions(-)
>>>>
>>>> [...]
>>>>
>>>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>>>> index 304ff23..7559c22 100644
>>>>> --- a/arch/arm64/mm/mmu.c
>>>>> +++ b/arch/arm64/mm/mmu.c
>>>>> @@ -48,7 +48,7 @@ u64 idmap_t0sz = TCR_T0SZ(VA_BITS);
>>>>>   * Empty_zero_page is a special page that is used for zero-initialized data
>>>>>   * and COW.
>>>>>   */
>>>>> -struct page *empty_zero_page;
>>>>> +unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)] __page_aligned_bss;
>>>>>  EXPORT_SYMBOL(empty_zero_page);
>>>>
>>>> I've been looking at this, and it was making me feel uneasy because it's
>>>> full of junk before the bss is zeroed. Working that through, it's no
>>>> worse than what we currently have but I then realised that (a) we don't
>>>> have a dsb after zeroing the zero page (which we need to make sure the
>>>> zeroes are visible to the page table walker and (b) the zero page is
>>>> never explicitly cleaned to the PoC.
>>>
>>> Ouch; that's scary.
>>>
>>>> There may be cases where the zero-page is used to back read-only,
>>>> non-cacheable mappings (something to do with KVM?), so I'd sleep better
>>>> if we made sure that it was clean.
>>>
>>> From a grep around for uses of ZERO_PAGE, in most places the zero page
>>> is simply used as an empty buffer for I/O. In these cases it's either
>>> accessed coherently or goes via the usual machinery for non-coherent DMA
>>> kicks in.
>>>
>>> I don't believe that we usually give userspace the ability to create
>>> non-cacheable mappings, and I couldn't spot any paths it could do so via
>>> some driver-specific IOCTL applied to the zero page.
>>>
>>> Looking around, kvm_clear_guest_page seemed problematic, but isn't used
>>> on arm64. I can imagine the zero page being mapped into guests in other
>>> situations when mirroring the userspace mapping. 
>>>
>>> Marc, Christoffer, I thought we cleaned pages to the PoC before mapping
>>> them into a guest? Is that right? Or do we have potential issues there?
>>
>> I think we're OK. Looking at __coherent_cache_guest_page (which is
>> called when transitioning from an invalid to valid mapping), we do flush
>> things to PoC if the vcpu has its cache disabled (or if we know that the
>> IPA shouldn't be cached - the whole NOR flash emulation horror story).
> 
> So we asume the guest never disables the MMU, and always uses consistent
> attributes for a given IPA (e.g. it doesn't have a Device and Normal
> Cacheable mapping)?

Yup. If it starts using stupid attributes, it will get stupid results,
and there isn't much the architecture gives us to deal with this.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2015-12-10 16:01 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-09 12:44 [RFC PATCH 00/20] arm64: mm: rework page table creation Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 01/20] arm64: mm: remove pointless PAGE_MASKing Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 02/20] arm64: Remove redundant padding from linker script Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 03/20] arm64: mm: fold alternatives into .init Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 04/20] arm64: mm: assume PAGE SIZE for page table allocator Mark Rutland
2015-12-10 14:08   ` Will Deacon
2015-12-10 14:23     ` Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 05/20] asm-generic: make __set_fixmap_offset a static inline Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 06/20] arm64: mm: place empty_zero_page in bss Mark Rutland
2015-12-10 14:11   ` Will Deacon
2015-12-10 15:29     ` Mark Rutland
2015-12-10 15:40       ` Marc Zyngier
2015-12-10 15:51         ` Mark Rutland
2015-12-10 16:01           ` Marc Zyngier
2015-12-09 12:44 ` [RFC PATCH 07/20] arm64: unify idmap removal Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 08/20] arm64: unmap idmap earlier Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 09/20] arm64: add function to install the idmap Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 10/20] arm64: mm: add code to safely replace TTBR1_EL1 Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 11/20] arm64: mm: move pte_* macros Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 12/20] arm64: mm: add functions to walk page tables by PA Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 13/20] arm64: mm: avoid redundant __pa(__va(x)) Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 14/20] arm64: mm: add __{pud,pgd}_populate Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 15/20] arm64: mm: add functions to walk tables in fixmap Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 16/20] arm64: mm: use fixmap when creating page tables Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 17/20] arm64: mm: allocate pagetables anywhere Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 18/20] arm64: mm: allow passing a pgdir to alloc_init_* Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 19/20] arm64: ensure _stext and _etext are page-aligned Mark Rutland
2015-12-09 12:44 ` [RFC PATCH 20/20] arm64: mm: create new fine-grained mappings at boot Mark Rutland
