linux-kernel.vger.kernel.org archive mirror
* [PATCHv2 0/3] ARCH_SUPPORTS_DEBUG_PAGEALLOC for arm64
@ 2016-01-29 23:46 Laura Abbott
  2016-01-29 23:46 ` [PATCHv2 1/3] arm64: Drop alloc function from create_mapping Laura Abbott
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Laura Abbott @ 2016-01-29 23:46 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel
  Cc: Laura Abbott, linux-arm-kernel, linux-kernel

Hi,

This is v2 of the series to add proper debug pagealloc support for arm64.

Changes since v1:
- Dropped the config option for forcing page-sized mappings and changed it to
  runtime detection of debug_pagealloc. This matches better with how x86 and
  other arches are being updated. That patch is now dropped completely and
  folded into the actual patch for ARCH_SUPPORTS_DEBUG_PAGEALLOC.
- Took Mark's suggestion to rename create_mapping -> create_mapping_noalloc
- The only way to get all PAGE_SIZE mappings right now is to enable
  debug_pagealloc, which is the only actual use case at the moment. It is
  fairly simple to add the code to let this happen for other uses; if that
  time comes, someone can plead their case to the court of ARM to accept such
  a patch.

This series now depends on the patch to let debug_pagealloc_enabled()
be used when !CONFIG_DEBUG_PAGEALLOC
(http://article.gmane.org/gmane.linux.kernel.mm/145208).
I can either take acks and submit this to the MM tree, or we can wait for
that patch to show up in something I can rebase onto.
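
For reference, all that dependency has to provide is a
debug_pagealloc_enabled() that is callable (and constant false) when
CONFIG_DEBUG_PAGEALLOC is off, so the new checks in this series compile
away on non-debug kernels. Roughly this shape (a sketch, not the exact
mm hunk):

#ifdef CONFIG_DEBUG_PAGEALLOC
extern bool _debug_pagealloc_enabled;

/* true when booted with debug_pagealloc=on */
static inline bool debug_pagealloc_enabled(void)
{
	return _debug_pagealloc_enabled;
}
#else
static inline bool debug_pagealloc_enabled(void)
{
	return false;
}
#endif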

Thanks,
Laura

Laura Abbott (3):
  arm64: Drop alloc function from create_mapping
  arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  arm64: ptdump: Indicate whether memory should be faulting

 arch/arm64/Kconfig       |  3 +++
 arch/arm64/mm/dump.c     |  5 +++++
 arch/arm64/mm/mmu.c      | 37 +++++++++++++++++++++++++------------
 arch/arm64/mm/pageattr.c | 40 +++++++++++++++++++++++++++++++++-------
 4 files changed, 66 insertions(+), 19 deletions(-)

-- 
2.5.0


* [PATCHv2 1/3] arm64: Drop alloc function from create_mapping
  2016-01-29 23:46 [PATCHv2 0/3] ARCH_SUPPORTS_DEBUG_PAGEALLOC for arm64 Laura Abbott
@ 2016-01-29 23:46 ` Laura Abbott
  2016-01-30 10:34   ` Ard Biesheuvel
  2016-01-29 23:46 ` [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC Laura Abbott
  2016-01-29 23:46 ` [PATCHv2 3/3] arm64: ptdump: Indicate whether memory should be faulting Laura Abbott
  2 siblings, 1 reply; 12+ messages in thread
From: Laura Abbott @ 2016-01-29 23:46 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel
  Cc: Laura Abbott, linux-arm-kernel, linux-kernel


create_mapping is only used in fixmap_remap_fdt. All the create_mapping
calls need to happen on existing translation table pages without
additional allocations. Rather than have an alloc function be called
and fail, just set it to NULL and catch it's use. Also change
the name to create_mapping_noalloc to better capture what exactly is
going on.

Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
---
 arch/arm64/mm/mmu.c | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 7711554..103ebc0 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -116,7 +116,9 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
 	pte_t *pte;
 
 	if (pmd_none(*pmd) || pmd_sect(*pmd)) {
-		phys_addr_t pte_phys = pgtable_alloc();
+		phys_addr_t pte_phys;
+		BUG_ON(!pgtable_alloc);
+		pte_phys = pgtable_alloc();
 		pte = pte_set_fixmap(pte_phys);
 		if (pmd_sect(*pmd))
 			split_pmd(pmd, pte);
@@ -158,7 +160,9 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 	 * Check for initial section mappings in the pgd/pud and remove them.
 	 */
 	if (pud_none(*pud) || pud_sect(*pud)) {
-		phys_addr_t pmd_phys = pgtable_alloc();
+		phys_addr_t pmd_phys;
+		BUG_ON(!pgtable_alloc);
+		pmd_phys = pgtable_alloc();
 		pmd = pmd_set_fixmap(pmd_phys);
 		if (pud_sect(*pud)) {
 			/*
@@ -223,7 +227,9 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 	unsigned long next;
 
 	if (pgd_none(*pgd)) {
-		phys_addr_t pud_phys = pgtable_alloc();
+		phys_addr_t pud_phys;
+		BUG_ON(!pgtable_alloc);
+		pud_phys = pgtable_alloc();
 		__pgd_populate(pgd, pud_phys, PUD_TYPE_TABLE);
 	}
 	BUG_ON(pgd_bad(*pgd));
@@ -312,7 +318,10 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
 	init_pgd(pgd_offset_raw(pgdir, virt), phys, virt, size, prot, alloc);
 }
 
-static void __init create_mapping(phys_addr_t phys, unsigned long virt,
+/*
+ * This function is for mapping using existing sections only.
+ */
+static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
 				  phys_addr_t size, pgprot_t prot)
 {
 	if (virt < VMALLOC_START) {
@@ -321,7 +330,7 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
 		return;
 	}
 	__create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
-			     early_pgtable_alloc);
+			     NULL);
 }
 
 void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
@@ -680,7 +689,7 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
 	/*
 	 * Make sure that the FDT region can be mapped without the need to
 	 * allocate additional translation table pages, so that it is safe
-	 * to call create_mapping() this early.
+	 * to call create_mapping_noalloc() this early.
 	 *
 	 * On 64k pages, the FDT will be mapped using PTEs, so we need to
 	 * be in the same PMD as the rest of the fixmap.
@@ -696,8 +705,8 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
 	dt_virt = (void *)dt_virt_base + offset;
 
 	/* map the first chunk so we can read the size from the header */
-	create_mapping(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
-		       SWAPPER_BLOCK_SIZE, prot);
+	create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE),
+			dt_virt_base, SWAPPER_BLOCK_SIZE, prot);
 
 	if (fdt_check_header(dt_virt) != 0)
 		return NULL;
@@ -707,7 +716,7 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
 		return NULL;
 
 	if (offset + size > SWAPPER_BLOCK_SIZE)
-		create_mapping(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
+		create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
 			       round_up(offset + size, SWAPPER_BLOCK_SIZE), prot);
 
 	memblock_reserve(dt_phys, size);
-- 
2.5.0


* [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  2016-01-29 23:46 [PATCHv2 0/3] ARCH_SUPPORTS_DEBUG_PAGEALLOC for arm64 Laura Abbott
  2016-01-29 23:46 ` [PATCHv2 1/3] arm64: Drop alloc function from create_mapping Laura Abbott
@ 2016-01-29 23:46 ` Laura Abbott
  2016-02-01 12:29   ` Mark Rutland
  2016-01-29 23:46 ` [PATCHv2 3/3] arm64: ptdump: Indicate whether memory should be faulting Laura Abbott
  2 siblings, 1 reply; 12+ messages in thread
From: Laura Abbott @ 2016-01-29 23:46 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel
  Cc: Laura Abbott, linux-arm-kernel, linux-kernel


ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
pages for debugging purposes. This requires memory be mapped
with PAGE_SIZE mappings since breaking down larger mappings
at runtime will lead to TLB conflicts. Check if debug_pagealloc
is enabled at runtime and if so, map everything with PAGE_SIZE
pages. Implement the functions to actually map/unmap the
pages at runtime.
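
For context: with CONFIG_DEBUG_PAGEALLOC enabled and debug_pagealloc=on
on the command line, the core page allocator drives this hook on every
allocation and free, roughly (simplified; see free_pages_prepare() and
the allocation path in mm/page_alloc.c):

	/* on free: clear the valid bit so any stray access faults */
	kernel_map_pages(page, 1 << order, 0);
	/* on allocation: make the pages valid again */
	kernel_map_pages(page, 1 << order, 1);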


Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
---
 arch/arm64/Kconfig       |  3 +++
 arch/arm64/mm/mmu.c      | 10 +++++++---
 arch/arm64/mm/pageattr.c | 40 +++++++++++++++++++++++++++++++++-------
 3 files changed, 43 insertions(+), 10 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 8cc6228..0f33218 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -537,6 +537,9 @@ config HOTPLUG_CPU
 source kernel/Kconfig.preempt
 source kernel/Kconfig.hz
 
+config ARCH_SUPPORTS_DEBUG_PAGEALLOC
+	def_bool y
+
 config ARCH_HAS_HOLES_MEMORYMODEL
 	def_bool y if SPARSEMEM
 
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 103ebc0..5abbd30 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -181,7 +181,8 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 	do {
 		next = pmd_addr_end(addr, end);
 		/* try section mapping first */
-		if (((addr | next | phys) & ~SECTION_MASK) == 0) {
+		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
+		      (!debug_pagealloc_enabled() || !pgtable_alloc)) {
 			pmd_t old_pmd =*pmd;
 			set_pmd(pmd, __pmd(phys |
 					   pgprot_val(mk_sect_prot(prot))));
@@ -208,8 +209,11 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
 }
 
 static inline bool use_1G_block(unsigned long addr, unsigned long next,
-			unsigned long phys)
+			unsigned long phys, phys_addr_t (*pgtable_alloc)(void))
 {
+	if (pgtable_alloc && debug_pagealloc_enabled())
+		return false;
+
 	if (PAGE_SHIFT != 12)
 		return false;
 
@@ -241,7 +245,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
 		/*
 		 * For 4K granule only, attempt to put down a 1GB block
 		 */
-		if (use_1G_block(addr, next, phys)) {
+		if (use_1G_block(addr, next, phys, pgtable_alloc)) {
 			pud_t old_pud = *pud;
 			set_pud(pud, __pud(phys |
 					   pgprot_val(mk_sect_prot(prot))));
diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
index 1360a02..69a8a7d 100644
--- a/arch/arm64/mm/pageattr.c
+++ b/arch/arm64/mm/pageattr.c
@@ -38,7 +38,8 @@ static int change_page_range(pte_t *ptep, pgtable_t token, unsigned long addr,
 }
 
 static int change_memory_common(unsigned long addr, int numpages,
-				pgprot_t set_mask, pgprot_t clear_mask)
+				pgprot_t set_mask, pgprot_t clear_mask,
+				bool ignore_vma_check)
 {
 	unsigned long start = addr;
 	unsigned long size = PAGE_SIZE*numpages;
@@ -65,11 +66,14 @@ static int change_memory_common(unsigned long addr, int numpages,
 	 *
 	 * So check whether the [addr, addr + size) interval is entirely
 	 * covered by precisely one VM area that has the VM_ALLOC flag set.
+	 *
+	 * The one exception to this is if we're forcing everything to be
+	 * page mapped with DEBUG_PAGEALLOC
 	 */
 	area = find_vm_area((void *)addr);
-	if (!area ||
+	if (!ignore_vma_check && (!area ||
 	    end > (unsigned long)area->addr + area->size ||
-	    !(area->flags & VM_ALLOC))
+	    !(area->flags & VM_ALLOC)))
 		return -EINVAL;
 
 	data.set_mask = set_mask;
@@ -86,21 +90,24 @@ int set_memory_ro(unsigned long addr, int numpages)
 {
 	return change_memory_common(addr, numpages,
 					__pgprot(PTE_RDONLY),
-					__pgprot(PTE_WRITE));
+					__pgprot(PTE_WRITE),
+					false);
 }
 
 int set_memory_rw(unsigned long addr, int numpages)
 {
 	return change_memory_common(addr, numpages,
 					__pgprot(PTE_WRITE),
-					__pgprot(PTE_RDONLY));
+					__pgprot(PTE_RDONLY),
+					false);
 }
 
 int set_memory_nx(unsigned long addr, int numpages)
 {
 	return change_memory_common(addr, numpages,
 					__pgprot(PTE_PXN),
-					__pgprot(0));
+					__pgprot(0),
+					false);
 }
 EXPORT_SYMBOL_GPL(set_memory_nx);
 
@@ -108,6 +115,25 @@ int set_memory_x(unsigned long addr, int numpages)
 {
 	return change_memory_common(addr, numpages,
 					__pgprot(0),
-					__pgprot(PTE_PXN));
+					__pgprot(PTE_PXN),
+					false);
 }
 EXPORT_SYMBOL_GPL(set_memory_x);
+
+#ifdef CONFIG_DEBUG_PAGEALLOC
+void __kernel_map_pages(struct page *page, int numpages, int enable)
+{
+	unsigned long addr = (unsigned long) page_address(page);
+
+	if (enable)
+		change_memory_common(addr, numpages,
+					__pgprot(PTE_VALID),
+					__pgprot(0),
+					true);
+	else
+		change_memory_common(addr, numpages,
+					__pgprot(0),
+					__pgprot(PTE_VALID),
+					true);
+}
+#endif
-- 
2.5.0


* [PATCHv2 3/3] arm64: ptdump: Indicate whether memory should be faulting
  2016-01-29 23:46 [PATCHv2 0/3] ARCH_SUPPORTS_DEBUG_PAGEALLOC for arm64 Laura Abbott
  2016-01-29 23:46 ` [PATCHv2 1/3] arm64: Drop alloc function from create_mapping Laura Abbott
  2016-01-29 23:46 ` [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC Laura Abbott
@ 2016-01-29 23:46 ` Laura Abbott
  2 siblings, 0 replies; 12+ messages in thread
From: Laura Abbott @ 2016-01-29 23:46 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Mark Rutland, Ard Biesheuvel
  Cc: Laura Abbott, linux-arm-kernel, linux-kernel

With CONFIG_DEBUG_PAGEALLOC, pages do not have the valid bit
set when they are free in the buddy allocator. Add an indication to
the page table dumping code that the valid bit is not set ('F' for
fault) to make this easier to understand.

Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
---
 arch/arm64/mm/dump.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 5a22a11..f381ac9 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -90,6 +90,11 @@ struct prot_bits {
 
 static const struct prot_bits pte_bits[] = {
 	{
+		.mask	= PTE_VALID,
+		.val	= PTE_VALID,
+		.set	= " ",
+		.clear	= "F",
+	}, {
 		.mask	= PTE_USER,
 		.val	= PTE_USER,
 		.set	= "USR",
-- 
2.5.0


* Re: [PATCHv2 1/3] arm64: Drop alloc function from create_mapping
  2016-01-29 23:46 ` [PATCHv2 1/3] arm64: Drop alloc function from create_mapping Laura Abbott
@ 2016-01-30 10:34   ` Ard Biesheuvel
  2016-02-01 12:11     ` Mark Rutland
  0 siblings, 1 reply; 12+ messages in thread
From: Ard Biesheuvel @ 2016-01-30 10:34 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Catalin Marinas, Will Deacon, Mark Rutland, linux-arm-kernel,
	linux-kernel

On 30 January 2016 at 00:46, Laura Abbott <labbott@fedoraproject.org> wrote:
>
> create_mapping is only used in fixmap_remap_fdt. All the create_mapping
> calls need to happen on existing translation table pages without
> additional allocations. Rather than have an alloc function be called
> and fail, just set it to NULL and catch it's use. Also change

s/it's/its/

> the name to create_mapping_noalloc to better capture what exactly is
> going on.
>
> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>

With one nit below:
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

> ---
>  arch/arm64/mm/mmu.c | 27 ++++++++++++++++++---------
>  1 file changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 7711554..103ebc0 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -116,7 +116,9 @@ static void alloc_init_pte(pmd_t *pmd, unsigned long addr,
>         pte_t *pte;
>
>         if (pmd_none(*pmd) || pmd_sect(*pmd)) {
> -               phys_addr_t pte_phys = pgtable_alloc();
> +               phys_addr_t pte_phys;
> +               BUG_ON(!pgtable_alloc);
> +               pte_phys = pgtable_alloc();
>                 pte = pte_set_fixmap(pte_phys);
>                 if (pmd_sect(*pmd))
>                         split_pmd(pmd, pte);
> @@ -158,7 +160,9 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>          * Check for initial section mappings in the pgd/pud and remove them.
>          */
>         if (pud_none(*pud) || pud_sect(*pud)) {
> -               phys_addr_t pmd_phys = pgtable_alloc();
> +               phys_addr_t pmd_phys;
> +               BUG_ON(!pgtable_alloc);
> +               pmd_phys = pgtable_alloc();
>                 pmd = pmd_set_fixmap(pmd_phys);
>                 if (pud_sect(*pud)) {
>                         /*
> @@ -223,7 +227,9 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>         unsigned long next;
>
>         if (pgd_none(*pgd)) {
> -               phys_addr_t pud_phys = pgtable_alloc();
> +               phys_addr_t pud_phys;
> +               BUG_ON(!pgtable_alloc);
> +               pud_phys = pgtable_alloc();
>                 __pgd_populate(pgd, pud_phys, PUD_TYPE_TABLE);
>         }
>         BUG_ON(pgd_bad(*pgd));
> @@ -312,7 +318,10 @@ static void __create_pgd_mapping(pgd_t *pgdir, phys_addr_t phys,
>         init_pgd(pgd_offset_raw(pgdir, virt), phys, virt, size, prot, alloc);
>  }
>
> -static void __init create_mapping(phys_addr_t phys, unsigned long virt,
> +/*
> + * This function is for mapping using existing sections only.

Could you improve this comment? 'existing sections' does not quite
cover what we expect to deal with, imo.
> + */
> +static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
>                                   phys_addr_t size, pgprot_t prot)
>  {
>         if (virt < VMALLOC_START) {
> @@ -321,7 +330,7 @@ static void __init create_mapping(phys_addr_t phys, unsigned long virt,
>                 return;
>         }
>         __create_pgd_mapping(init_mm.pgd, phys, virt, size, prot,
> -                            early_pgtable_alloc);
> +                            NULL);
>  }
>
>  void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> @@ -680,7 +689,7 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
>         /*
>          * Make sure that the FDT region can be mapped without the need to
>          * allocate additional translation table pages, so that it is safe
> -        * to call create_mapping() this early.
> +        * to call create_mapping_noalloc() this early.
>          *
>          * On 64k pages, the FDT will be mapped using PTEs, so we need to
>          * be in the same PMD as the rest of the fixmap.
> @@ -696,8 +705,8 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
>         dt_virt = (void *)dt_virt_base + offset;
>
>         /* map the first chunk so we can read the size from the header */
> -       create_mapping(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
> -                      SWAPPER_BLOCK_SIZE, prot);
> +       create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE),
> +                       dt_virt_base, SWAPPER_BLOCK_SIZE, prot);
>
>         if (fdt_check_header(dt_virt) != 0)
>                 return NULL;
> @@ -707,7 +716,7 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys)
>                 return NULL;
>
>         if (offset + size > SWAPPER_BLOCK_SIZE)
> -               create_mapping(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
> +               create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
>                                round_up(offset + size, SWAPPER_BLOCK_SIZE), prot);
>
>         memblock_reserve(dt_phys, size);
> --
> 2.5.0
>


* Re: [PATCHv2 1/3] arm64: Drop alloc function from create_mapping
  2016-01-30 10:34   ` Ard Biesheuvel
@ 2016-02-01 12:11     ` Mark Rutland
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Rutland @ 2016-02-01 12:11 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Laura Abbott, Catalin Marinas, Will Deacon, linux-arm-kernel,
	linux-kernel

On Sat, Jan 30, 2016 at 11:34:50AM +0100, Ard Biesheuvel wrote:
> On 30 January 2016 at 00:46, Laura Abbott <labbott@fedoraproject.org> wrote:
> >
> > create_mapping is only used in fixmap_remap_fdt. All the create_mapping
> > calls need to happen on existing translation table pages without
> > additional allocations. Rather than have an alloc function be called
> > and fail, just set it to NULL and catch it's use. Also change
> 
> s/it's/its/
> 
> > the name to create_mapping_noalloc to better capture what exactly is
> > going on.
> >
> > Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
> 
> With one nit below:
> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

[...]

> > +/*
> > + * This function is for mapping using existing sections only.
> 
> Could you improve this comment? 'existing sections' does not quite
> cover what we expect to deal with imo
> > + */
> > +static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
> >                                   phys_addr_t size, pgprot_t prot)

Perhaps:

/*
 * This function can only be used to modify existing table entries,
 * without allocating new levels of table. Note that this permits the
 * creation of new section or page entries.
 */

Either way:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

Thanks for putting this together!

Mark.


* Re: [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  2016-01-29 23:46 ` [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC Laura Abbott
@ 2016-02-01 12:29   ` Mark Rutland
  2016-02-01 21:24     ` Laura Abbott
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2016-02-01 12:29 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, linux-arm-kernel,
	linux-kernel

Hi,

On Fri, Jan 29, 2016 at 03:46:57PM -0800, Laura Abbott wrote:
> 
> ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
> pages for debugging purposes. This requires memory be mapped
> with PAGE_SIZE mappings since breaking down larger mappings
> at runtime will lead to TLB conflicts. Check if debug_pagealloc
> is enabled at runtime and if so, map everything with PAGE_SIZE
> pages. Implement the functions to actually map/unmap the
> pages at runtime.
> 
> 
> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>

I tried to apply atop of the arm64 for-next/pgtable branch, but git
wasn't very happy about that -- which branch/patches is this based on?

I'm not sure if I'm missing something, have something I shouldn't, or if
my MTA is corrupting patches again...

> ---
>  arch/arm64/Kconfig       |  3 +++
>  arch/arm64/mm/mmu.c      | 10 +++++++---
>  arch/arm64/mm/pageattr.c | 40 +++++++++++++++++++++++++++++++++-------
>  3 files changed, 43 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 8cc6228..0f33218 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -537,6 +537,9 @@ config HOTPLUG_CPU
>  source kernel/Kconfig.preempt
>  source kernel/Kconfig.hz
>  
> +config ARCH_SUPPORTS_DEBUG_PAGEALLOC
> +	def_bool y
> +
>  config ARCH_HAS_HOLES_MEMORYMODEL
>  	def_bool y if SPARSEMEM
>  
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index 103ebc0..5abbd30 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -181,7 +181,8 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>  	do {
>  		next = pmd_addr_end(addr, end);
>  		/* try section mapping first */
> -		if (((addr | next | phys) & ~SECTION_MASK) == 0) {
> +		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
> +		      (!debug_pagealloc_enabled() || !pgtable_alloc)) {
>  			pmd_t old_pmd =*pmd;
>  			set_pmd(pmd, __pmd(phys |
>  					   pgprot_val(mk_sect_prot(prot))));
> @@ -208,8 +209,11 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>  }
>  
>  static inline bool use_1G_block(unsigned long addr, unsigned long next,
> -			unsigned long phys)
> +			unsigned long phys, phys_addr_t (*pgtable_alloc)(void))
>  {
> +	if (pgtable_alloc && debug_pagealloc_enabled())
> +		return false;
> +
>  	if (PAGE_SHIFT != 12)
>  		return false;
>  
> @@ -241,7 +245,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>  		/*
>  		 * For 4K granule only, attempt to put down a 1GB block
>  		 */
> -		if (use_1G_block(addr, next, phys)) {
> +		if (use_1G_block(addr, next, phys, pgtable_alloc)) {
>  			pud_t old_pud = *pud;
>  			set_pud(pud, __pud(phys |
>  					   pgprot_val(mk_sect_prot(prot))));

The checks for pgtable_alloc are to allow create_mapping_noalloc to use
sections regardless, right?

I got a little confused by that initially, but that seems to make sense.

> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index 1360a02..69a8a7d 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -38,7 +38,8 @@ static int change_page_range(pte_t *ptep, pgtable_t token, unsigned long addr,
>  }
>  
>  static int change_memory_common(unsigned long addr, int numpages,
> -				pgprot_t set_mask, pgprot_t clear_mask)
> +				pgprot_t set_mask, pgprot_t clear_mask,
> +				bool ignore_vma_check)
>  {
>  	unsigned long start = addr;
>  	unsigned long size = PAGE_SIZE*numpages;
> @@ -65,11 +66,14 @@ static int change_memory_common(unsigned long addr, int numpages,
>  	 *
>  	 * So check whether the [addr, addr + size) interval is entirely
>  	 * covered by precisely one VM area that has the VM_ALLOC flag set.
> +	 *
> +	 * The one exception to this is if we're forcing everything to be
> +	 * page mapped with DEBUG_PAGEALLOC
>  	 */
>  	area = find_vm_area((void *)addr);
> -	if (!area ||
> +	if (!ignore_vma_check && (!area ||
>  	    end > (unsigned long)area->addr + area->size ||
> -	    !(area->flags & VM_ALLOC))
> +	    !(area->flags & VM_ALLOC)))
>  		return -EINVAL;

Perhaps we could pull out everything but the VMA checks into a static
__change_memory_common. Then we can call that directly in
__kernel_map_pages, and don't need to pass a confusing boolean
parameter.
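
Something like this, perhaps (untested, just to illustrate the shape):

static int __change_memory_common(unsigned long start, unsigned long size,
				pgprot_t set_mask, pgprot_t clear_mask)
{
	struct page_change_data data;
	int ret;

	data.set_mask = set_mask;
	data.clear_mask = clear_mask;

	/* same walk as today, minus the VM area validation */
	ret = apply_to_page_range(&init_mm, start, size, change_page_range,
				  &data);

	flush_tlb_kernel_range(start, start + size);
	return ret;
}

change_memory_common() would keep the VMA checks and then call this, and
__kernel_map_pages() could call it directly.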

Otherwise, this looks good to me.

Mark.

>  
>  	data.set_mask = set_mask;
> @@ -86,21 +90,24 @@ int set_memory_ro(unsigned long addr, int numpages)
>  {
>  	return change_memory_common(addr, numpages,
>  					__pgprot(PTE_RDONLY),
> -					__pgprot(PTE_WRITE));
> +					__pgprot(PTE_WRITE),
> +					false);
>  }
>  
>  int set_memory_rw(unsigned long addr, int numpages)
>  {
>  	return change_memory_common(addr, numpages,
>  					__pgprot(PTE_WRITE),
> -					__pgprot(PTE_RDONLY));
> +					__pgprot(PTE_RDONLY),
> +					false);
>  }
>  
>  int set_memory_nx(unsigned long addr, int numpages)
>  {
>  	return change_memory_common(addr, numpages,
>  					__pgprot(PTE_PXN),
> -					__pgprot(0));
> +					__pgprot(0),
> +					false);
>  }
>  EXPORT_SYMBOL_GPL(set_memory_nx);
>  
> @@ -108,6 +115,25 @@ int set_memory_x(unsigned long addr, int numpages)
>  {
>  	return change_memory_common(addr, numpages,
>  					__pgprot(0),
> -					__pgprot(PTE_PXN));
> +					__pgprot(PTE_PXN),
> +					false);
>  }
>  EXPORT_SYMBOL_GPL(set_memory_x);
> +
> +#ifdef CONFIG_DEBUG_PAGEALLOC
> +void __kernel_map_pages(struct page *page, int numpages, int enable)
> +{
> +	unsigned long addr = (unsigned long) page_address(page);
> +
> +	if (enable)
> +		change_memory_common(addr, numpages,
> +					__pgprot(PTE_VALID),
> +					__pgprot(0),
> +					true);
> +	else
> +		change_memory_common(addr, numpages,
> +					__pgprot(0),
> +					__pgprot(PTE_VALID),
> +					true);
> +}
> +#endif
> -- 
> 2.5.0
> 


* Re: [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  2016-02-01 12:29   ` Mark Rutland
@ 2016-02-01 21:24     ` Laura Abbott
  2016-02-02  8:57       ` Ard Biesheuvel
  2016-02-02 12:23       ` Mark Rutland
  0 siblings, 2 replies; 12+ messages in thread
From: Laura Abbott @ 2016-02-01 21:24 UTC (permalink / raw)
  To: Mark Rutland, Laura Abbott
  Cc: Catalin Marinas, Will Deacon, Ard Biesheuvel, linux-arm-kernel,
	linux-kernel

On 02/01/2016 04:29 AM, Mark Rutland wrote:
> Hi,
>
> On Fri, Jan 29, 2016 at 03:46:57PM -0800, Laura Abbott wrote:
>>
>> ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
>> pages for debugging purposes. This requires memory be mapped
>> with PAGE_SIZE mappings since breaking down larger mappings
>> at runtime will lead to TLB conflicts. Check if debug_pagealloc
>> is enabled at runtime and if so, map everything with PAGE_SIZE
>> pages. Implement the functions to actually map/unmap the
>> pages at runtime.
>>
>>
>> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
>
> I tried to apply atop of the arm64 for-next/pgtable branch, but git
> wasn't very happy about that -- which branch/patches is this based on?
>
> I'm not sure if I'm missing something, have something I shouldn't, or if
> my MTA is corrupting patches again...
>

Hmmm, I based it off of your arm64-pagetable-rework-20160125 tag and
Ard's patch for vmalloc and set_memory_*. The patches seem to apply
on the for-next/pgtable branch as well, so I'm guessing you are missing
Ard's patch.

  
>> ---
>>   arch/arm64/Kconfig       |  3 +++
>>   arch/arm64/mm/mmu.c      | 10 +++++++---
>>   arch/arm64/mm/pageattr.c | 40 +++++++++++++++++++++++++++++++++-------
>>   3 files changed, 43 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 8cc6228..0f33218 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -537,6 +537,9 @@ config HOTPLUG_CPU
>>   source kernel/Kconfig.preempt
>>   source kernel/Kconfig.hz
>>
>> +config ARCH_SUPPORTS_DEBUG_PAGEALLOC
>> +	def_bool y
>> +
>>   config ARCH_HAS_HOLES_MEMORYMODEL
>>   	def_bool y if SPARSEMEM
>>
>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>> index 103ebc0..5abbd30 100644
>> --- a/arch/arm64/mm/mmu.c
>> +++ b/arch/arm64/mm/mmu.c
>> @@ -181,7 +181,8 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>>   	do {
>>   		next = pmd_addr_end(addr, end);
>>   		/* try section mapping first */
>> -		if (((addr | next | phys) & ~SECTION_MASK) == 0) {
>> +		if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
>> +		      (!debug_pagealloc_enabled() || !pgtable_alloc)) {
>>   			pmd_t old_pmd =*pmd;
>>   			set_pmd(pmd, __pmd(phys |
>>   					   pgprot_val(mk_sect_prot(prot))));
>> @@ -208,8 +209,11 @@ static void alloc_init_pmd(pud_t *pud, unsigned long addr, unsigned long end,
>>   }
>>
>>   static inline bool use_1G_block(unsigned long addr, unsigned long next,
>> -			unsigned long phys)
>> +			unsigned long phys, phys_addr_t (*pgtable_alloc)(void))
>>   {
>> +	if (pgtable_alloc && debug_pagealloc_enabled())
>> +		return false;
>> +
>>   	if (PAGE_SHIFT != 12)
>>   		return false;
>>
>> @@ -241,7 +245,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long addr, unsigned long end,
>>   		/*
>>   		 * For 4K granule only, attempt to put down a 1GB block
>>   		 */
>> -		if (use_1G_block(addr, next, phys)) {
>> +		if (use_1G_block(addr, next, phys, pgtable_alloc)) {
>>   			pud_t old_pud = *pud;
>>   			set_pud(pud, __pud(phys |
>>   					   pgprot_val(mk_sect_prot(prot))));
>
> The checks for pgtable_alloc are to allow create_mapping_noalloc to use
> sections regardless, right?
>
> I got a little confused by that initially, but that seems to make sense.
>

Yeah, that was what I was going for. I went with keying off pgtable_alloc
rather than adding a bool argument to create_mapping. I'll see if I can put a
comment in somewhere to clarify the intentions.
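
Something like this above the section/block mapping checks, maybe:

/*
 * With debug_pagealloc, everything must be mapped at page granularity
 * so that the valid bit can be toggled on individual pages. The one
 * exception is create_mapping_noalloc() (pgtable_alloc == NULL), which
 * must not allocate new table levels and may keep using block mappings.
 */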
  
>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>> index 1360a02..69a8a7d 100644
>> --- a/arch/arm64/mm/pageattr.c
>> +++ b/arch/arm64/mm/pageattr.c
>> @@ -38,7 +38,8 @@ static int change_page_range(pte_t *ptep, pgtable_t token, unsigned long addr,
>>   }
>>
>>   static int change_memory_common(unsigned long addr, int numpages,
>> -				pgprot_t set_mask, pgprot_t clear_mask)
>> +				pgprot_t set_mask, pgprot_t clear_mask,
>> +				bool ignore_vma_check)
>>   {
>>   	unsigned long start = addr;
>>   	unsigned long size = PAGE_SIZE*numpages;
>> @@ -65,11 +66,14 @@ static int change_memory_common(unsigned long addr, int numpages,
>>   	 *
>>   	 * So check whether the [addr, addr + size) interval is entirely
>>   	 * covered by precisely one VM area that has the VM_ALLOC flag set.
>> +	 *
>> +	 * The one exception to this is if we're forcing everything to be
>> +	 * page mapped with DEBUG_PAGEALLOC
>>   	 */
>>   	area = find_vm_area((void *)addr);
>> -	if (!area ||
>> +	if (!ignore_vma_check && (!area ||
>>   	    end > (unsigned long)area->addr + area->size ||
>> -	    !(area->flags & VM_ALLOC))
>> +	    !(area->flags & VM_ALLOC)))
>>   		return -EINVAL;
>
> Perhaps we could pull out everything but the VMA checks into a static
> __change_memory_common. Then we can call that directly in
> __kernel_map_pages, and don't need to pass a confusing boolean
> parameter.

Sounds fine.

>
> Otherwise, this looks good to me.
>  
> Mark.
>

Thanks,
Laura
  
>>
>>   	data.set_mask = set_mask;
>> @@ -86,21 +90,24 @@ int set_memory_ro(unsigned long addr, int numpages)
>>   {
>>   	return change_memory_common(addr, numpages,
>>   					__pgprot(PTE_RDONLY),
>> -					__pgprot(PTE_WRITE));
>> +					__pgprot(PTE_WRITE),
>> +					false);
>>   }
>>
>>   int set_memory_rw(unsigned long addr, int numpages)
>>   {
>>   	return change_memory_common(addr, numpages,
>>   					__pgprot(PTE_WRITE),
>> -					__pgprot(PTE_RDONLY));
>> +					__pgprot(PTE_RDONLY),
>> +					false);
>>   }
>>
>>   int set_memory_nx(unsigned long addr, int numpages)
>>   {
>>   	return change_memory_common(addr, numpages,
>>   					__pgprot(PTE_PXN),
>> -					__pgprot(0));
>> +					__pgprot(0),
>> +					false);
>>   }
>>   EXPORT_SYMBOL_GPL(set_memory_nx);
>>
>> @@ -108,6 +115,25 @@ int set_memory_x(unsigned long addr, int numpages)
>>   {
>>   	return change_memory_common(addr, numpages,
>>   					__pgprot(0),
>> -					__pgprot(PTE_PXN));
>> +					__pgprot(PTE_PXN),
>> +					false);
>>   }
>>   EXPORT_SYMBOL_GPL(set_memory_x);
>> +
>> +#ifdef CONFIG_DEBUG_PAGEALLOC
>> +void __kernel_map_pages(struct page *page, int numpages, int enable)
>> +{
>> +	unsigned long addr = (unsigned long) page_address(page);
>> +
>> +	if (enable)
>> +		change_memory_common(addr, numpages,
>> +					__pgprot(PTE_VALID),
>> +					__pgprot(0),
>> +					true);
>> +	else
>> +		change_memory_common(addr, numpages,
>> +					__pgprot(0),
>> +					__pgprot(PTE_VALID),
>> +					true);
>> +}
>> +#endif
>> --
>> 2.5.0
>>


* Re: [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  2016-02-01 21:24     ` Laura Abbott
@ 2016-02-02  8:57       ` Ard Biesheuvel
  2016-02-02 12:23       ` Mark Rutland
  1 sibling, 0 replies; 12+ messages in thread
From: Ard Biesheuvel @ 2016-02-02  8:57 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Mark Rutland, Laura Abbott, Catalin Marinas, Will Deacon,
	linux-arm-kernel, linux-kernel

On 1 February 2016 at 22:24, Laura Abbott <labbott@redhat.com> wrote:
> On 02/01/2016 04:29 AM, Mark Rutland wrote:
>>
>> Hi,
>>
>> On Fri, Jan 29, 2016 at 03:46:57PM -0800, Laura Abbott wrote:
>>>
>>>
>>> ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
>>> pages for debugging purposes. This requires memory be mapped
>>> with PAGE_SIZE mappings since breaking down larger mappings
>>> at runtime will lead to TLB conflicts. Check if debug_pagealloc
>>> is enabled at runtime and if so, map everything with PAGE_SIZE
>>> pages. Implement the functions to actually map/unmap the
>>> pages at runtime.
>>>
>>>
>>> Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
>>
>>
>> I tried to apply atop of the arm64 for-next/pgtable branch, but git
>> wasn't very happy about that -- which branch/patches is this based on?
>>
>> I'm not sure if I'm missing something, have something I shouldn't, or if
>> my MTA is corrupting patches again...
>>
>
> Hmmm, I based it off of your arm64-pagetable-rework-20160125 tag and
> Ard's patch for vmalloc and set_memory_*. The patches seem to apply
> on the for-next/pgtable branch as well, so I'm guessing you are missing
> Ard's patch.
>
>
>
>>>
>>> ---
>>>   arch/arm64/Kconfig       |  3 +++
>>>   arch/arm64/mm/mmu.c      | 10 +++++++---
>>>   arch/arm64/mm/pageattr.c | 40 +++++++++++++++++++++++++++++++++-------
>>>   3 files changed, 43 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 8cc6228..0f33218 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -537,6 +537,9 @@ config HOTPLUG_CPU
>>>   source kernel/Kconfig.preempt
>>>   source kernel/Kconfig.hz
>>>
>>> +config ARCH_SUPPORTS_DEBUG_PAGEALLOC
>>> +       def_bool y
>>> +
>>>   config ARCH_HAS_HOLES_MEMORYMODEL
>>>         def_bool y if SPARSEMEM
>>>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 103ebc0..5abbd30 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -181,7 +181,8 @@ static void alloc_init_pmd(pud_t *pud, unsigned long
>>> addr, unsigned long end,
>>>         do {
>>>                 next = pmd_addr_end(addr, end);
>>>                 /* try section mapping first */
>>> -               if (((addr | next | phys) & ~SECTION_MASK) == 0) {
>>> +               if (((addr | next | phys) & ~SECTION_MASK) == 0 &&
>>> +                     (!debug_pagealloc_enabled() || !pgtable_alloc)) {
>>>                         pmd_t old_pmd =*pmd;
>>>                         set_pmd(pmd, __pmd(phys |
>>>
>>> pgprot_val(mk_sect_prot(prot))));
>>> @@ -208,8 +209,11 @@ static void alloc_init_pmd(pud_t *pud, unsigned long
>>> addr, unsigned long end,
>>>   }
>>>
>>>   static inline bool use_1G_block(unsigned long addr, unsigned long next,
>>> -                       unsigned long phys)
>>> +                       unsigned long phys, phys_addr_t
>>> (*pgtable_alloc)(void))
>>>   {
>>> +       if (pgtable_alloc && debug_pagealloc_enabled())
>>> +               return false;
>>> +
>>>         if (PAGE_SHIFT != 12)
>>>                 return false;
>>>
>>> @@ -241,7 +245,7 @@ static void alloc_init_pud(pgd_t *pgd, unsigned long
>>> addr, unsigned long end,
>>>                 /*
>>>                  * For 4K granule only, attempt to put down a 1GB block
>>>                  */
>>> -               if (use_1G_block(addr, next, phys)) {
>>> +               if (use_1G_block(addr, next, phys, pgtable_alloc)) {
>>>                         pud_t old_pud = *pud;
>>>                         set_pud(pud, __pud(phys |
>>>
>>> pgprot_val(mk_sect_prot(prot))));
>>
>>
>> The checks for pgtable_alloc are to allow create_mapping_noalloc to use
>> sections regardless, right?
>>
>> I got a little confused by that initially, but that seems to make sense.
>>
>
> Yeah, that was what I was going for. I went with keying off pgtable_alloc
> rather than adding a bool argument to create_mapping. I'll see if I can put
> a comment in somewhere to clarify the intentions.
>

Yes, the logic becomes slightly fuzzy, so it would be good to document
this one way or the other.

-- 
Ard.

>
>>>
>>> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
>>> index 1360a02..69a8a7d 100644
>>> --- a/arch/arm64/mm/pageattr.c
>>> +++ b/arch/arm64/mm/pageattr.c
>>> @@ -38,7 +38,8 @@ static int change_page_range(pte_t *ptep, pgtable_t
>>> token, unsigned long addr,
>>>   }
>>>
>>>   static int change_memory_common(unsigned long addr, int numpages,
>>> -                               pgprot_t set_mask, pgprot_t clear_mask)
>>> +                               pgprot_t set_mask, pgprot_t clear_mask,
>>> +                               bool ignore_vma_check)
>>>   {
>>>         unsigned long start = addr;
>>>         unsigned long size = PAGE_SIZE*numpages;
>>> @@ -65,11 +66,14 @@ static int change_memory_common(unsigned long addr,
>>> int numpages,
>>>          *
>>>          * So check whether the [addr, addr + size) interval is entirely
>>>          * covered by precisely one VM area that has the VM_ALLOC flag
>>> set.
>>> +        *
>>> +        * The one exception to this is if we're forcing everything to be
>>> +        * page mapped with DEBUG_PAGEALLOC
>>>          */
>>>         area = find_vm_area((void *)addr);
>>> -       if (!area ||
>>> +       if (!ignore_vma_check && (!area ||
>>>             end > (unsigned long)area->addr + area->size ||
>>> -           !(area->flags & VM_ALLOC))
>>> +           !(area->flags & VM_ALLOC)))
>>>                 return -EINVAL;
>>
>>
>> Perhaps we could pull out everything but the VMA checks into a static
>> __change_memory_common. Then we can call that directly in
>> __kernel_map_pages, and don't need to pass a confusing boolean
>> parameter.
>
>
> Sounds fine.
>
>>
>> Otherwise, this looks good to me.
>>
>> Mark.
>>
>
> Thanks,
> Laura
>
>
>>>
>>>
>>>         data.set_mask = set_mask;
>>> @@ -86,21 +90,24 @@ int set_memory_ro(unsigned long addr, int numpages)
>>>   {
>>>         return change_memory_common(addr, numpages,
>>>                                         __pgprot(PTE_RDONLY),
>>> -                                       __pgprot(PTE_WRITE));
>>> +                                       __pgprot(PTE_WRITE),
>>> +                                       false);
>>>   }
>>>
>>>   int set_memory_rw(unsigned long addr, int numpages)
>>>   {
>>>         return change_memory_common(addr, numpages,
>>>                                         __pgprot(PTE_WRITE),
>>> -                                       __pgprot(PTE_RDONLY));
>>> +                                       __pgprot(PTE_RDONLY),
>>> +                                       false);
>>>   }
>>>
>>>   int set_memory_nx(unsigned long addr, int numpages)
>>>   {
>>>         return change_memory_common(addr, numpages,
>>>                                         __pgprot(PTE_PXN),
>>> -                                       __pgprot(0));
>>> +                                       __pgprot(0),
>>> +                                       false);
>>>   }
>>>   EXPORT_SYMBOL_GPL(set_memory_nx);
>>>
>>> @@ -108,6 +115,25 @@ int set_memory_x(unsigned long addr, int numpages)
>>>   {
>>>         return change_memory_common(addr, numpages,
>>>                                         __pgprot(0),
>>> -                                       __pgprot(PTE_PXN));
>>> +                                       __pgprot(PTE_PXN),
>>> +                                       false);
>>>   }
>>>   EXPORT_SYMBOL_GPL(set_memory_x);
>>> +
>>> +#ifdef CONFIG_DEBUG_PAGEALLOC
>>> +void __kernel_map_pages(struct page *page, int numpages, int enable)
>>> +{
>>> +       unsigned long addr = (unsigned long) page_address(page);
>>> +
>>> +       if (enable)
>>> +               change_memory_common(addr, numpages,
>>> +                                       __pgprot(PTE_VALID),
>>> +                                       __pgprot(0),
>>> +                                       true);
>>> +       else
>>> +               change_memory_common(addr, numpages,
>>> +                                       __pgprot(0),
>>> +                                       __pgprot(PTE_VALID),
>>> +                                       true);
>>> +}
>>> +#endif
>>> --
>>> 2.5.0
>>>
>


* Re: [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  2016-02-01 21:24     ` Laura Abbott
  2016-02-02  8:57       ` Ard Biesheuvel
@ 2016-02-02 12:23       ` Mark Rutland
  2016-02-02 12:31         ` Mark Rutland
  1 sibling, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2016-02-02 12:23 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Laura Abbott, Catalin Marinas, Will Deacon, Ard Biesheuvel,
	linux-arm-kernel, linux-kernel

On Mon, Feb 01, 2016 at 01:24:25PM -0800, Laura Abbott wrote:
> On 02/01/2016 04:29 AM, Mark Rutland wrote:
> >Hi,
> >
> >On Fri, Jan 29, 2016 at 03:46:57PM -0800, Laura Abbott wrote:
> >>
> >>ARCH_SUPPORTS_DEBUG_PAGEALLOC provides a hook to map and unmap
> >>pages for debugging purposes. This requires memory be mapped
> >>with PAGE_SIZE mappings since breaking down larger mappings
> >>at runtime will lead to TLB conflicts. Check if debug_pagealloc
> >>is enabled at runtime and if so, map everything with PAGE_SIZE
> >>pages. Implement the functions to actually map/unmap the
> >>pages at runtime.
> >>
> >>
> >>Signed-off-by: Laura Abbott <labbott@fedoraproject.org>
> >
> >I tried to apply atop of the arm64 for-next/pgtable branch, but git
> >wasn't very happy about that -- which branch/patches is this based on?
> >
> >I'm not sure if I'm missing something, have something I shouldn't, or if
> >my MTA is corrupting patches again...
> >
> 
> Hmmm, I based it off of your arm64-pagetable-rework-20160125 tag and
> Ard's patch for vmalloc and set_memory_*. The patches seem to apply
> on the for-next/pgtable branch as well, so I'm guessing you are missing
> Ard's patch.

Yup, that was it. I evidently was paying far too little attention as I'd
also missed the mm/ patch for the !CONFIG_DEBUG_PAGEALLOC case.

Is there anything else in mm/ that I've potentially missed? I'm seeing a
hang on Juno just after reaching userspace (splat below) with
debug_pagealloc=on.

It looks like something's gone wrong around find_vmap_area -- at least
one CPU is forever awaiting vmap_area_lock, and presumably some other
CPU has held it and gone into the weeds, leading to the RCU stalls and
NMI lockup warnings.

[   31.037054] INFO: rcu_preempt detected stalls on CPUs/tasks:
[   31.042684]  0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418 
[   31.050795]  (detected by 1, t=5255 jiffies, g=340, c=339, q=50)
[   31.056935] rcu_preempt kthread starved for 4838 jiffies! g340 c339 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
[   36.509055] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [kworker/2:2H:995]
[   36.521059] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 23s! [systemd-udevd:1048]
[   36.533056] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 23s! [systemd-udevd:1037]
[   36.545055] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 23s! [systemd-udevd:1036]
[   56.497055] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [upstart-file-br:1012]
[   94.057052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[   94.062671]  0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418 
[   94.070780]  (detected by 1, t=21010 jiffies, g=340, c=339, q=50)
[   94.076981] rcu_preempt kthread starved for 20593 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[  157.077052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  157.082673]  0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418 
[  157.090782]  (detected by 2, t=36765 jiffies, g=340, c=339, q=50)
[  157.096986] rcu_preempt kthread starved for 36348 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[  220.097052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  220.102670]  0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418 
[  220.110779]  (detected by 2, t=52520 jiffies, g=340, c=339, q=50)
[  220.116971] rcu_preempt kthread starved for 52103 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0
[  283.117052] INFO: rcu_preempt detected stalls on CPUs/tasks:
[  283.122670]  0-...: (1 ticks this GP) idle=999/140000000000001/0 softirq=738/738 fqs=418 
[  283.130779]  (detected by 1, t=68275 jiffies, g=340, c=339, q=50)
[  283.136973] rcu_preempt kthread starved for 67858 jiffies! g340 c339 f0x2 RCU_GP_WAIT_FQS(3) ->state=0x0

Typically show-backtrace-all-active-cpus(l) gives me something like:

[  183.282835] CPU: 0 PID: 998 Comm: systemd-udevd Tainted: G             L  4.5.0-rc1+ #7
[  183.290783] Hardware name: ARM Juno development board (r0) (DT)
[  183.296659] task: ffffffc97437a400 ti: ffffffc973ec8000 task.ti: ffffffc973ec8000
[  183.304095] PC is at _raw_spin_lock+0x34/0x48
[  183.308421] LR is at find_vmap_area+0x24/0xa0
[  183.312746] pc : [<ffffffc00065faf4>] lr : [<ffffffc000185bc4>] pstate: 60000145
[  183.320092] sp : ffffffc973ecb6c0
[  183.323382] x29: ffffffc973ecb6c0 x28: ffffffbde7d50300 
[  183.328662] x27: ffffffffffffffff x26: ffffffbde7d50300 
[  183.333941] x25: 000000097e513000 x24: 0000000000000001 
[  183.339219] x23: 0000000000000000 x22: 0000000000000001 
[  183.344498] x21: ffffffc000a6dd90 x20: ffffffc000a6d000 
[  183.349778] x19: ffffffc97540c000 x18: 0000007fc4e8b960 
[  183.355057] x17: 0000007fac3088d4 x16: ffffffc0001be448 
[  183.360336] x15: 003b9aca00000000 x14: 0032aa26d4000000 
[  183.365614] x13: ffffffffa94f64df x12: 0000000000000018 
[  183.370894] x11: ffffffc97eecd730 x10: 0000000000000030 
[  183.376173] x9 : ffffffbde7d50340 x8 : ffffffc0008556a0 
[  183.381451] x7 : ffffffc0008556b8 x6 : ffffffc0008556d0 
[  183.386729] x5 : ffffffc0009d2000 x4 : 0000000000000001 
[  183.392008] x3 : 000000000000d033 x2 : 000000000000000b 
[  183.397286] x1 : 00000000d038d033 x0 : ffffffc000a6dd90 
[  183.402563] 

I'll have a go with lock debugging. Otherwise do you have any ideas?

Thanks,
Mark.


* Re: [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  2016-02-02 12:23       ` Mark Rutland
@ 2016-02-02 12:31         ` Mark Rutland
  2016-02-02 16:08           ` Laura Abbott
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Rutland @ 2016-02-02 12:31 UTC (permalink / raw)
  To: Laura Abbott
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, linux-kernel,
	linux-arm-kernel, Laura Abbott

On Tue, Feb 02, 2016 at 12:23:18PM +0000, Mark Rutland wrote:
> Is there anything else in mm/ that I've potentially missed?
> I'm seeing a hang on Juno just after reaching userspace (splat below)
> with debug_pagealloc=on.
> 
> It looks like something's gone wrong around find_vmap_area -- at least
> one CPU is forever awaiting vmap_area_lock, and presumably some other
> CPU has held it and gone into the weeds, leading to the RCU stalls and
> NMI lockup warnings.

[...]

> I'll have a go with lock debugging.

FWIW, with lock debugging I get the below splat before reaching userspace.

[    0.145869] =================================
[    0.145901] [ INFO: inconsistent lock state ]
[    0.145935] 4.5.0-rc1+ #8 Not tainted
[    0.145964] ---------------------------------
[    0.145996] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[    0.146036] swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[    0.146070]  (vmap_area_lock){+.?...}, at: [<ffffffc0001a749c>] find_vmap_area+0x1c/0x98
[    0.146151] {SOFTIRQ-ON-W} state was registered at:
[    0.146184]   [<ffffffc0000fc2ac>] mark_lock+0x1bc/0x708
[    0.146229]   [<ffffffc0000feb18>] __lock_acquire+0x928/0x1d90
[    0.146274]   [<ffffffc00010032c>] lock_acquire+0x9c/0xe0
[    0.146318]   [<ffffffc0006c8cd8>] _raw_spin_lock+0x40/0x58
[    0.146362]   [<ffffffc0001a749c>] find_vmap_area+0x1c/0x98
[    0.146406]   [<ffffffc0001a9c6c>] find_vm_area+0xc/0x38
[    0.146447]   [<ffffffc000096420>] change_memory_common+0x38/0x120
[    0.146495]   [<ffffffc0000965dc>] __kernel_map_pages+0x54/0x60
[    0.146537]   [<ffffffc000176744>] get_page_from_freelist+0x86c/0x9a0
[    0.146584]   [<ffffffc000176b98>] __alloc_pages_nodemask+0xf0/0x8a0
[    0.146629]   [<ffffffc0009dd46c>] alloc_pages_exact_nid+0x48/0x90
[    0.146675]   [<ffffffc0009b8004>] page_ext_init+0x94/0x124
[    0.146718]   [<ffffffc0009a390c>] start_kernel+0x350/0x3d4
[    0.146761]   [<ffffffc0000811b4>] 0xffffffc0000811b4
[    0.146802] irq event stamp: 402
[    0.146830] hardirqs last  enabled at (402): [<ffffffc000172c58>] free_pages_prepare+0x270/0x330
[    0.146894] hardirqs last disabled at (401): [<ffffffc000172c58>] free_pages_prepare+0x270/0x330
[    0.146956] softirqs last  enabled at (368): [<ffffffc0000bc070>] _local_bh_enable+0x20/0x48
[    0.147022] softirqs last disabled at (369): [<ffffffc0000bca10>] irq_exit+0xa0/0xd8
[    0.147081] 
[    0.147081] other info that might help us debug this:
[    0.147130]  Possible unsafe locking scenario:
[    0.147130] 
[    0.147177]        CPU0
[    0.147201]        ----
[    0.147225]   lock(vmap_area_lock);
[    0.147260]   <Interrupt>
[    0.147285]     lock(vmap_area_lock);
[    0.147321] 
[    0.147321]  *** DEADLOCK ***
[    0.147321] 
[    0.147381] 1 lock held by swapper/5/0:
[    0.147410]  #0:  (rcu_callback){......}, at: [<ffffffc0001193f0>] rcu_process_callbacks+0x2b8/0x5f8
[    0.147492] 
[    0.147492] stack backtrace:
[    0.147538] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.5.0-rc1+ #8
[    0.147577] Hardware name: ARM Juno development board (r0) (DT)
[    0.147613] Call trace:
[    0.147644] [<ffffffc000089a38>] dump_backtrace+0x0/0x180
[    0.147684] [<ffffffc000089bcc>] show_stack+0x14/0x20
[    0.147724] [<ffffffc000373bb8>] dump_stack+0x90/0xc8
[    0.147764] [<ffffffc00016bb3c>] print_usage_bug.part.21+0x260/0x278
[    0.147807] [<ffffffc0000fc238>] mark_lock+0x148/0x708
[    0.147846] [<ffffffc0000feadc>] __lock_acquire+0x8ec/0x1d90
[    0.147887] [<ffffffc00010032c>] lock_acquire+0x9c/0xe0
[    0.147925] [<ffffffc0006c8cd8>] _raw_spin_lock+0x40/0x58
[    0.147965] [<ffffffc0001a749c>] find_vmap_area+0x1c/0x98
[    0.148003] [<ffffffc0001a9c6c>] find_vm_area+0xc/0x38
[    0.148044] [<ffffffc000096420>] change_memory_common+0x38/0x120
[    0.148084] [<ffffffc0000965c8>] __kernel_map_pages+0x40/0x60
[    0.148123] [<ffffffc000172cb8>] free_pages_prepare+0x2d0/0x330
[    0.148164] [<ffffffc000174660>] __free_pages_ok+0x20/0x108
[    0.148203] [<ffffffc0001750e0>] __free_pages+0x30/0x50
[    0.148241] [<ffffffc0001753ac>] __free_kmem_pages+0x24/0x50
[    0.148280] [<ffffffc000175410>] free_kmem_pages+0x38/0x40
[    0.148320] [<ffffffc0000b5908>] free_task+0x30/0x60
[    0.148359] [<ffffffc0000b59f8>] __put_task_struct+0xc0/0x110
[    0.148400] [<ffffffc0000b91bc>] delayed_put_task_struct+0x44/0x50
[    0.148442] [<ffffffc000119430>] rcu_process_callbacks+0x2f8/0x5f8
[    0.148482] [<ffffffc0000bc594>] __do_softirq+0x13c/0x278
[    0.148520] [<ffffffc0000bca10>] irq_exit+0xa0/0xd8
[    0.148559] [<ffffffc00010ad90>] __handle_domain_irq+0x60/0xb8
[    0.148599] [<ffffffc0000824f0>] gic_handle_irq+0x58/0xa8
[    0.148636] Exception stack(0xffffffc97594be30 to 0xffffffc97594bf50)
[    0.148678] be20:                                   ffffffc975933f00 0000000000000243
[    0.148734] be40: 000000097e4a7000 0000000000000000 0000000000000000 0000000000000008
[    0.148791] be60: 00000007de290000 00000000000270f0 0000000000000001 ffffffc975948000
[    0.148848] be80: ffffffc00179e000 0000000000000000 ffffffc001507000 ffffffc001507f00
[    0.148903] bea0: 000000000000000e 0000000000000007 0000000000000001 0000000000000007
[    0.148960] bec0: 000000000000000e ffffffc000a5a000 ffffffc975948000 ffffffc000a5a000
[    0.149017] bee0: ffffffc000a38c40 ffffffc000a3c460 ffffffc975948000 ffffffc000a5a000
[    0.149073] bf00: ffffffc000af6000 0000000000000000 0000000000000000 ffffffc97594bf50
[    0.149130] bf20: ffffffc0000867e0 ffffffc97594bf50 ffffffc0000867e4 0000000060000045
[    0.149185] bf40: ffffffc97594bf50 ffffffc0000867e0
[    0.149222] [<ffffffc0000855e4>] el1_irq+0xa4/0x114
[    0.149260] [<ffffffc0000867e4>] arch_cpu_idle+0x14/0x20
[    0.149299] [<ffffffc0000f7878>] default_idle_call+0x18/0x30
[    0.149339] [<ffffffc0000f7a78>] cpu_startup_entry+0x1e8/0x240
[    0.149380] [<ffffffc00008f064>] secondary_start_kernel+0x16c/0x198
[    0.149419] [<00000000800827fc>] 0x800827fc

The kernel then happily ran userspace for a while, but running hackbench
was sufficient to lock it up:

[  132.624028] BUG: spinlock lockup suspected on CPU#4, hackbench/5589
[  132.624600] BUG: spinlock lockup suspected on CPU#5, hackbench/5270
[  132.624619]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
[  132.626651] BUG: spinlock lockup suspected on CPU#3, hackbench/5280
[  132.626663]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
[  132.628358] BUG: spinlock lockup suspected on CPU#0, init/1
[  132.628370]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
[  132.675602]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
[  136.619403] BUG: spinlock lockup suspected on CPU#2, hackbench/6768
[  136.625640]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
[  136.675626] BUG: spinlock lockup suspected on CPU#1, hackbench/7089
[  136.681860]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
[  152.689601] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [hackbench:7089]
[  155.149604] INFO: rcu_preempt self-detected stall on CPU
[  155.149609] INFO: rcu_preempt self-detected stall on CPU
[  155.149611] INFO: rcu_preempt self-detected stall on CPU
[  155.149625]  1-...: (6496 ticks this GP) idle=fef/140000000000002/0 softirq=1935/1935 fqs=1 
[  155.149639]  
[  155.149640]  4-...: (6493 ticks this GP) idle=305/140000000000002/0 softirq=1961/1961 fqs=1 
[  155.149650]  
[  155.149651] rcu_preempt kthread starved for 6499 jiffies! g1204 c1203 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
[  155.149665] rcu_preempt kthread starved for 6499 jiffies! g1204 c1203 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
[  155.205127]  0-...: (6498 ticks this GP) idle=a2d/140000000000002/0 softirq=2595/2595 fqs=1 
[  155.213526]   (t=6516 jiffies g=1204 c=1203 q=422)
[  155.218307] rcu_preempt kthread starved for 6516 jiffies! g1204 c1203 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
[  156.677602] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [init:1]
[  156.701602] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [hackbench:6768]
[  156.713603] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [hackbench:5280]
[  156.725602] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [hackbench:5589]
[  156.737603] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [hackbench:5270]

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
  2016-02-02 12:31         ` Mark Rutland
@ 2016-02-02 16:08           ` Laura Abbott
  0 siblings, 0 replies; 12+ messages in thread
From: Laura Abbott @ 2016-02-02 16:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ard Biesheuvel, Catalin Marinas, Will Deacon, linux-kernel,
	linux-arm-kernel, Laura Abbott

On 02/02/2016 04:31 AM, Mark Rutland wrote:
> On Tue, Feb 02, 2016 at 12:23:18PM +0000, Mark Rutland wrote:
>>> Is there anything else in mm/ that I've potentially missed?
>> I'm seeing a hang on Juno just after reaching userspace (splat below)
>> with debug_pagealloc=on.
>>
>> It looks like something's gone wrong around find_vmap_area -- at least
>> one CPU is forever awaiting vmap_area_lock, and presumably some other
>> CPU has held it and gone into the weeds, leading to the RCU stalls and
>> NMI lockup warnings.
>
> [...]
>
>> I'll have a go with lock debugging.
>
> FWIW, with lock debugging I get the below splat before reaching userspace.
>
> [    0.145869] =================================
> [    0.145901] [ INFO: inconsistent lock state ]
> [    0.145935] 4.5.0-rc1+ #8 Not tainted
> [    0.145964] ---------------------------------
> [    0.145996] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> [    0.146036] swapper/5/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> [    0.146070]  (vmap_area_lock){+.?...}, at: [<ffffffc0001a749c>] find_vmap_area+0x1c/0x98
> [    0.146151] {SOFTIRQ-ON-W} state was registered at:
> [    0.146184]   [<ffffffc0000fc2ac>] mark_lock+0x1bc/0x708
> [    0.146229]   [<ffffffc0000feb18>] __lock_acquire+0x928/0x1d90
> [    0.146274]   [<ffffffc00010032c>] lock_acquire+0x9c/0xe0
> [    0.146318]   [<ffffffc0006c8cd8>] _raw_spin_lock+0x40/0x58
> [    0.146362]   [<ffffffc0001a749c>] find_vmap_area+0x1c/0x98
> [    0.146406]   [<ffffffc0001a9c6c>] find_vm_area+0xc/0x38
> [    0.146447]   [<ffffffc000096420>] change_memory_common+0x38/0x120
> [    0.146495]   [<ffffffc0000965dc>] __kernel_map_pages+0x54/0x60
> [    0.146537]   [<ffffffc000176744>] get_page_from_freelist+0x86c/0x9a0
> [    0.146584]   [<ffffffc000176b98>] __alloc_pages_nodemask+0xf0/0x8a0
> [    0.146629]   [<ffffffc0009dd46c>] alloc_pages_exact_nid+0x48/0x90
> [    0.146675]   [<ffffffc0009b8004>] page_ext_init+0x94/0x124
> [    0.146718]   [<ffffffc0009a390c>] start_kernel+0x350/0x3d4
> [    0.146761]   [<ffffffc0000811b4>] 0xffffffc0000811b4
> [    0.146802] irq event stamp: 402
> [    0.146830] hardirqs last  enabled at (402): [<ffffffc000172c58>] free_pages_prepare+0x270/0x330
> [    0.146894] hardirqs last disabled at (401): [<ffffffc000172c58>] free_pages_prepare+0x270/0x330
> [    0.146956] softirqs last  enabled at (368): [<ffffffc0000bc070>] _local_bh_enable+0x20/0x48
> [    0.147022] softirqs last disabled at (369): [<ffffffc0000bca10>] irq_exit+0xa0/0xd8
> [    0.147081]
> [    0.147081] other info that might help us debug this:
> [    0.147130]  Possible unsafe locking scenario:
> [    0.147130]
> [    0.147177]        CPU0
> [    0.147201]        ----
> [    0.147225]   lock(vmap_area_lock);
> [    0.147260]   <Interrupt>
> [    0.147285]     lock(vmap_area_lock);
> [    0.147321]
> [    0.147321]  *** DEADLOCK ***
> [    0.147321]
> [    0.147381] 1 lock held by swapper/5/0:
> [    0.147410]  #0:  (rcu_callback){......}, at: [<ffffffc0001193f0>] rcu_process_callbacks+0x2b8/0x5f8
> [    0.147492]
> [    0.147492] stack backtrace:
> [    0.147538] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.5.0-rc1+ #8
> [    0.147577] Hardware name: ARM Juno development board (r0) (DT)
> [    0.147613] Call trace:
> [    0.147644] [<ffffffc000089a38>] dump_backtrace+0x0/0x180
> [    0.147684] [<ffffffc000089bcc>] show_stack+0x14/0x20
> [    0.147724] [<ffffffc000373bb8>] dump_stack+0x90/0xc8
> [    0.147764] [<ffffffc00016bb3c>] print_usage_bug.part.21+0x260/0x278
> [    0.147807] [<ffffffc0000fc238>] mark_lock+0x148/0x708
> [    0.147846] [<ffffffc0000feadc>] __lock_acquire+0x8ec/0x1d90
> [    0.147887] [<ffffffc00010032c>] lock_acquire+0x9c/0xe0
> [    0.147925] [<ffffffc0006c8cd8>] _raw_spin_lock+0x40/0x58
> [    0.147965] [<ffffffc0001a749c>] find_vmap_area+0x1c/0x98
> [    0.148003] [<ffffffc0001a9c6c>] find_vm_area+0xc/0x38
> [    0.148044] [<ffffffc000096420>] change_memory_common+0x38/0x120
> [    0.148084] [<ffffffc0000965c8>] __kernel_map_pages+0x40/0x60
> [    0.148123] [<ffffffc000172cb8>] free_pages_prepare+0x2d0/0x330
> [    0.148164] [<ffffffc000174660>] __free_pages_ok+0x20/0x108
> [    0.148203] [<ffffffc0001750e0>] __free_pages+0x30/0x50
> [    0.148241] [<ffffffc0001753ac>] __free_kmem_pages+0x24/0x50
> [    0.148280] [<ffffffc000175410>] free_kmem_pages+0x38/0x40
> [    0.148320] [<ffffffc0000b5908>] free_task+0x30/0x60
> [    0.148359] [<ffffffc0000b59f8>] __put_task_struct+0xc0/0x110
> [    0.148400] [<ffffffc0000b91bc>] delayed_put_task_struct+0x44/0x50
> [    0.148442] [<ffffffc000119430>] rcu_process_callbacks+0x2f8/0x5f8
> [    0.148482] [<ffffffc0000bc594>] __do_softirq+0x13c/0x278
> [    0.148520] [<ffffffc0000bca10>] irq_exit+0xa0/0xd8
> [    0.148559] [<ffffffc00010ad90>] __handle_domain_irq+0x60/0xb8
> [    0.148599] [<ffffffc0000824f0>] gic_handle_irq+0x58/0xa8
> [    0.148636] Exception stack(0xffffffc97594be30 to 0xffffffc97594bf50)
> [    0.148678] be20:                                   ffffffc975933f00 0000000000000243
> [    0.148734] be40: 000000097e4a7000 0000000000000000 0000000000000000 0000000000000008
> [    0.148791] be60: 00000007de290000 00000000000270f0 0000000000000001 ffffffc975948000
> [    0.148848] be80: ffffffc00179e000 0000000000000000 ffffffc001507000 ffffffc001507f00
> [    0.148903] bea0: 000000000000000e 0000000000000007 0000000000000001 0000000000000007
> [    0.148960] bec0: 000000000000000e ffffffc000a5a000 ffffffc975948000 ffffffc000a5a000
> [    0.149017] bee0: ffffffc000a38c40 ffffffc000a3c460 ffffffc975948000 ffffffc000a5a000
> [    0.149073] bf00: ffffffc000af6000 0000000000000000 0000000000000000 ffffffc97594bf50
> [    0.149130] bf20: ffffffc0000867e0 ffffffc97594bf50 ffffffc0000867e4 0000000060000045
> [    0.149185] bf40: ffffffc97594bf50 ffffffc0000867e0
> [    0.149222] [<ffffffc0000855e4>] el1_irq+0xa4/0x114
> [    0.149260] [<ffffffc0000867e4>] arch_cpu_idle+0x14/0x20
> [    0.149299] [<ffffffc0000f7878>] default_idle_call+0x18/0x30
> [    0.149339] [<ffffffc0000f7a78>] cpu_startup_entry+0x1e8/0x240
> [    0.149380] [<ffffffc00008f064>] secondary_start_kernel+0x16c/0x198
> [    0.149419] [<00000000800827fc>] 0x800827fc
>
> The kernel then happily ran userspace for a while, but running hackbench
> was sufficient to lock it up:
>
> [  132.624028] BUG: spinlock lockup suspected on CPU#4, hackbench/5589
> [  132.624600] BUG: spinlock lockup suspected on CPU#5, hackbench/5270
> [  132.624619]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
> [  132.626651] BUG: spinlock lockup suspected on CPU#3, hackbench/5280
> [  132.626663]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
> [  132.628358] BUG: spinlock lockup suspected on CPU#0, init/1
> [  132.628370]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
> [  132.675602]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
> [  136.619403] BUG: spinlock lockup suspected on CPU#2, hackbench/6768
> [  136.625640]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
> [  136.675626] BUG: spinlock lockup suspected on CPU#1, hackbench/7089
> [  136.681860]  lock: vmap_area_lock+0x0/0x38, .magic: dead4ead, .owner: hackbench/7089, .owner_cpu: 1
> [  152.689601] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [hackbench:7089]
> [  155.149604] INFO: rcu_preempt self-detected stall on CPU
> [  155.149609] INFO: rcu_preempt self-detected stall on CPU
> [  155.149611] INFO: rcu_preempt self-detected stall on CPU
> [  155.149625]  1-...: (6496 ticks this GP) idle=fef/140000000000002/0 softirq=1935/1935 fqs=1
> [  155.149639]
> [  155.149640]  4-...: (6493 ticks this GP) idle=305/140000000000002/0 softirq=1961/1961 fqs=1
> [  155.149650]
> [  155.149651] rcu_preempt kthread starved for 6499 jiffies! g1204 c1203 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
> [  155.149665] rcu_preempt kthread starved for 6499 jiffies! g1204 c1203 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
> [  155.205127]  0-...: (6498 ticks this GP) idle=a2d/140000000000002/0 softirq=2595/2595 fqs=1
> [  155.213526]   (t=6516 jiffies g=1204 c=1203 q=422)
> [  155.218307] rcu_preempt kthread starved for 6516 jiffies! g1204 c1203 f0x0 RCU_GP_WAIT_FQS(3) ->state=0x0
> [  156.677602] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [init:1]
> [  156.701602] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [hackbench:6768]
> [  156.713603] NMI watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [hackbench:5280]
> [  156.725602] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [hackbench:5589]
> [  156.737603] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 22s! [hackbench:5270]
>
> Thanks,
> Mark.
>

Yes, this is absolutely a deadlock on vmap_area_lock. Your suggestion
to pull that logic out so we don't do the extra find_vm_area() check
should fix it, I suspect.
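
Concretely, I'm thinking along these lines for __kernel_map_pages()
(just a sketch: __change_memory_common() here is a helper I'd split
out of change_memory_common(), taking the set/clear masks directly and
skipping the find_vm_area() lookup, since linear map addresses have no
vm_struct to find):

#ifdef CONFIG_DEBUG_PAGEALLOC
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
	unsigned long addr = (unsigned long) page_address(page);

	/*
	 * Linear map addresses never come from vmalloc, so there is
	 * nothing to look up and no reason to touch vmap_area_lock
	 * on this path.
	 */
	if (enable)
		__change_memory_common(addr, PAGE_SIZE * numpages,
				       __pgprot(PTE_VALID), __pgprot(0));
	else
		__change_memory_common(addr, PAGE_SIZE * numpages,
				       __pgprot(0), __pgprot(PTE_VALID));
}
#endif

With the vmalloc lookup gone from this path, the lock is never taken
from softirq context, which should cure both the lockdep inconsistency
and the hackbench lockup.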

Thanks,
Laura

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-02-02 16:08 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-29 23:46 [PATCHv2 0/3] ARCH_SUPPORTS_DEBUG_PAGEALLOC for arm64 Laura Abbott
2016-01-29 23:46 ` [PATCHv2 1/3] arm64: Drop alloc function from create_mapping Laura Abbott
2016-01-30 10:34   ` Ard Biesheuvel
2016-02-01 12:11     ` Mark Rutland
2016-01-29 23:46 ` [PATCHv2 2/3] arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC Laura Abbott
2016-02-01 12:29   ` Mark Rutland
2016-02-01 21:24     ` Laura Abbott
2016-02-02  8:57       ` Ard Biesheuvel
2016-02-02 12:23       ` Mark Rutland
2016-02-02 12:31         ` Mark Rutland
2016-02-02 16:08           ` Laura Abbott
2016-01-29 23:46 ` [PATCHv2 3/3] arm64: ptdump: Indicate whether memory should be faulting Laura Abbott
