From: Marc Zyngier <marc.zyngier@arm.com>
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.cs.columbia.edu
Cc: Christoffer Dall <christoffer.dall@linaro.com>
Subject: [PATCH 1/3] arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting
Date: Wed, 25 Feb 2015 16:55:38 +0000
Message-ID: <1424883340-29940-2-git-send-email-marc.zyngier@arm.com>
In-Reply-To: <1424883340-29940-1-git-send-email-marc.zyngier@arm.com>

We're using __get_free_pages to allocate the guest's stage-2
PGD. The standard behaviour of this function is to return a set of
pages where only the head page has a valid refcount.

This behaviour gets us into trouble when we're trying to increment
the refcount on a non-head page:

page:ffff7c00cfb693c0 count:0 mapcount:0 mapping:          (null) index:0x0
flags: 0x4000000000000000()
page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0)
BUG: failure at include/linux/mm.h:548/get_page()!
Kernel panic - not syncing: BUG!
CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825
Hardware name: APM X-Gene Mustang board (DT)
Call trace:
[<ffff80000008a09c>] dump_backtrace+0x0/0x13c
[<ffff80000008a1e8>] show_stack+0x10/0x1c
[<ffff800000691da8>] dump_stack+0x74/0x94
[<ffff800000690d78>] panic+0x100/0x240
[<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc
[<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0
[<ffff8000000a420c>] handle_exit+0x58/0x180
[<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c
[<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754
[<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8
[<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78
CPU0: stopping

Passing the (unintuitively named) __GFP_COMP flag to __get_free_pages
forces the allocator to maintain a per-page refcount, which is exactly
what we need.

This has been tested on an X-Gene platform with a 4kB/48bit-VA host
kernel, using kvmtool hacked to place guest memory in the second page
of the hardware PGD (the PUD level for the host kernel).

Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
---
 arch/arm/include/asm/kvm_mmu.h   | 7 +++++++
 arch/arm/kvm/mmu.c               | 2 +-
 arch/arm64/include/asm/kvm_mmu.h | 9 ++++++++-
 3 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 37ca2a4..1cac89b 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -162,6 +162,13 @@ static inline bool kvm_page_empty(void *ptr)
 
 #define KVM_PREALLOC_LEVEL	0
 
+/*
+ * We need to ensure that stage-2 PGDs are allocated with a per-page
+ * refcount, as we fiddle with the refcounts of non-head pages.
+ * __GFP_COMP forces the allocator to do what we want.
+ */
+#define KVM_GFP_S2_PGD	(GFP_KERNEL | __GFP_ZERO | __GFP_COMP)
+
 static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
 {
 	return 0;
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 3e6859b..a6a8252 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -666,7 +666,7 @@ int kvm_alloc_stage2_pgd(struct kvm *kvm)
 		 * Allocate actual first-level Stage-2 page table used by the
 		 * hardware for Stage-2 page table walks.
 		 */
-		pgd = (pgd_t *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, S2_PGD_ORDER);
+		pgd = (pgd_t *)__get_free_pages(KVM_GFP_S2_PGD, S2_PGD_ORDER);
 	}
 
 	if (!pgd)
diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
index 6458b53..06c733a 100644
--- a/arch/arm64/include/asm/kvm_mmu.h
+++ b/arch/arm64/include/asm/kvm_mmu.h
@@ -171,6 +171,13 @@ static inline bool kvm_s2pmd_readonly(pmd_t *pmd)
 #define KVM_PREALLOC_LEVEL	(0)
 #endif
 
+/*
+ * We need to ensure that stage-2 PGDs are allocated with a per-page
+ * refcount, as we fiddle with the refcounts of non-head pages.
+ * __GFP_COMP forces the allocator to do what we want.
+ */
+#define KVM_GFP_S2_PGD	(GFP_KERNEL | __GFP_ZERO | __GFP_COMP)
+
 /**
  * kvm_prealloc_hwpgd - allocate inital table for VTTBR
  * @kvm:	The KVM struct pointer for the VM.
@@ -192,7 +199,7 @@ static inline int kvm_prealloc_hwpgd(struct kvm *kvm, pgd_t *pgd)
 	if (KVM_PREALLOC_LEVEL == 0)
 		return 0;
 
-	hwpgd = __get_free_pages(GFP_KERNEL | __GFP_ZERO, PTRS_PER_S2_PGD_SHIFT);
+	hwpgd = __get_free_pages(KVM_GFP_S2_PGD, PTRS_PER_S2_PGD_SHIFT);
 	if (!hwpgd)
 		return -ENOMEM;
 
-- 
2.1.4
