LinuxPPC-Dev Archive on lore.kernel.org
* [PATCH] KVM: PPC: Book3S HV: fix an oops in kvmppc_uvmem_page_free()
@ 2020-07-30 23:25 Ram Pai
  2020-07-31  4:29 ` Bharata B Rao
  0 siblings, 1 reply; 4+ messages in thread
From: Ram Pai @ 2020-07-30 23:25 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev
  Cc: ldufour, linuxram, cclaudio, bharata, sathnaga, aneesh.kumar,
	sukadev, bauerman, david

Observed the following oops while stress-testing with multiple
secure VMs on a distro kernel. The issue also theoretically exists in
kernels 5.5 and later.

This issue occurs when the total number of requested device PFNs
exceeds the total number of available device PFNs.  PFN migration fails
to allocate a device PFN, which causes migrate_vma_finalize() to trigger
kvmppc_uvmem_page_free() on a page that is not associated with any
device PFN.  kvmppc_uvmem_page_free() blindly tries to access the
contents of the private data, which can be NULL, leading to the
following kernel fault.

 --------------------------------------------------------------------------
 Unable to handle kernel paging request for data at address 0x00000011
 Faulting instruction address: 0xc00800000e36e110
 Oops: Kernel access of bad area, sig: 11 [#1]
 LE SMP NR_CPUS=2048 NUMA PowerNV
....
 MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
		 CR: 24424822  XER: 00000000
 CFAR: c000000000e3d764 DAR: 0000000000000011 DSISR: 40000000 IRQMASK: 0
 GPR00: c00800000e36e0a4 c000001f1d59f610 c00800000e38a400 0000000000000000
 GPR04: c000001fa5000000 fffffffffffffffe ffffffffffffffff c000201fffeaf300
 GPR08: 00000000000001f0 0000000000000000 0000000000000f80 c00800000e373608
 GPR12: c000000000e3d710 c000201fffeaf300 0000000000000001 00007fef87360000
 GPR16: 00007fff97db4410 c000201c3b66a578 ffffffffffffffff 0000000000000000
 GPR20: 0000000119db9ad0 000000000000000a fffffffffffffffc 0000000000000001
 GPR24: c000201c3b660000 c000001f1d59f7a0 c0000000004cffb0 0000000000000001
 GPR28: 0000000000000000 c00a001ff003e000 c00800000e386150 0000000000000f80
 NIP [c00800000e36e110] kvmppc_uvmem_page_free+0xc8/0x210 [kvm_hv]
 LR [c00800000e36e0a4] kvmppc_uvmem_page_free+0x5c/0x210 [kvm_hv]
 Call Trace:
 [c000000000512010] free_devmap_managed_page+0xd0/0x100
 [c0000000003f71d0] put_devmap_managed_page+0xa0/0xc0
 [c0000000004d24bc] migrate_vma_finalize+0x32c/0x410
 [c00800000e36e828] kvmppc_svm_page_in.constprop.5+0xa0/0x460 [kvm_hv]
 [c00800000e36eddc] kvmppc_uv_migrate_mem_slot.isra.2+0x1f4/0x230 [kvm_hv]
 [c00800000e36fa98] kvmppc_h_svm_init_done+0x90/0x170 [kvm_hv]
 [c00800000e35bb14] kvmppc_pseries_do_hcall+0x1ac/0x10a0 [kvm_hv]
 [c00800000e35edf4] kvmppc_vcpu_run_hv+0x83c/0x1060 [kvm_hv]
 [c00800000e95eb2c] kvmppc_vcpu_run+0x34/0x48 [kvm]
 [c00800000e95a2dc] kvm_arch_vcpu_ioctl_run+0x374/0x830 [kvm]
 [c00800000e9433b4] kvm_vcpu_ioctl+0x45c/0x7c0 [kvm]
 [c0000000005451d0] do_vfs_ioctl+0xe0/0xaa0
 [c000000000545d64] sys_ioctl+0xc4/0x160
 [c00000000000b408] system_call+0x5c/0x70
 Instruction dump:
 a12d1174 2f890000 409e0158 a1271172 3929ffff b1271172 7c2004ac 39200000
 913e0140 39200000 e87d0010 f93d0010 <89230011> e8c30000 e9030008 2f890000
 --------------------------------------------------------------------------

 Fix the oops by checking that the private data is valid before accessing it.

Fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests")
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 2806983..f4002bf 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page *page)
 {
 	unsigned long pfn = page_to_pfn(page) -
 			(kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
-	struct kvmppc_uvmem_page_pvt *pvt;
+	struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data;
+
+	if (!pvt)
+		return;
 
 	spin_lock(&kvmppc_uvmem_bitmap_lock);
 	bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
 	spin_unlock(&kvmppc_uvmem_bitmap_lock);
 
-	pvt = page->zone_device_data;
 	page->zone_device_data = NULL;
 	if (pvt->remove_gfn)
 		kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);
-- 
1.8.3.1
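
For readers following the patch, here is a minimal userspace sketch of the
guarded free path. The struct and function names are simplified stand-ins
for the kernel code, not the actual implementation:

```c
/* Userspace sketch (not the kernel code): models why the NULL check in
 * kvmppc_uvmem_page_free() prevents the oops. */
#include <assert.h>
#include <stddef.h>

struct pvt { int remove_gfn; unsigned long gpa; };
struct page { void *zone_device_data; };

static int gfn_removed;           /* counts simulated kvmppc_gfn_remove() calls */

/* Returns 0 when the page carried valid private data, -1 when it was
 * NULL (the case that oopsed before the patch). */
static int uvmem_page_free_sim(struct page *page)
{
	struct pvt *pvt = page->zone_device_data;

	if (!pvt)                 /* the guard the patch adds */
		return -1;

	page->zone_device_data = NULL;
	if (pvt->remove_gfn)
		gfn_removed++;    /* stands in for kvmppc_gfn_remove() */
	return 0;
}
```

Without the guard, reading pvt->remove_gfn through a NULL pvt reads a small
offset from address zero, which is consistent with the faulting data address
0x00000011 in the log above.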


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] KVM: PPC: Book3S HV: fix an oops in kvmppc_uvmem_page_free()
  2020-07-30 23:25 [PATCH] KVM: PPC: Book3S HV: fix an oops in kvmppc_uvmem_page_free() Ram Pai
@ 2020-07-31  4:29 ` Bharata B Rao
  2020-07-31  8:37   ` Ram Pai
  0 siblings, 1 reply; 4+ messages in thread
From: Bharata B Rao @ 2020-07-31  4:29 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> Observed the following oops while stress-testing with multiple
> secure VMs on a distro kernel. The issue also theoretically exists in
> kernels 5.5 and later.
> 
> This issue occurs when the total number of requested device PFNs
> exceeds the total number of available device PFNs.  PFN migration fails
> to allocate a device PFN, which causes migrate_vma_finalize() to trigger
> kvmppc_uvmem_page_free() on a page that is not associated with any
> device PFN.  kvmppc_uvmem_page_free() blindly tries to access the
> contents of the private data, which can be NULL, leading to the
> following kernel fault.
> 
>  [... oops log and register dump trimmed ...]
> 
>  Fix the oops by checking that the private data is valid before accessing it.
> 
> Fixes: ca9f49 ("KVM: PPC: Book3S HV: Support for running secure guests")
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> ---
>  arch/powerpc/kvm/book3s_hv_uvmem.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
> index 2806983..f4002bf 100644
> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
> @@ -1018,13 +1018,15 @@ static void kvmppc_uvmem_page_free(struct page *page)
>  {
>  	unsigned long pfn = page_to_pfn(page) -
>  			(kvmppc_uvmem_pgmap.res.start >> PAGE_SHIFT);
> -	struct kvmppc_uvmem_page_pvt *pvt;
> +	struct kvmppc_uvmem_page_pvt *pvt = page->zone_device_data;
> +
> +	if (!pvt)
> +		return;
>  
>  	spin_lock(&kvmppc_uvmem_bitmap_lock);
>  	bitmap_clear(kvmppc_uvmem_bitmap, pfn, 1);
>  	spin_unlock(&kvmppc_uvmem_bitmap_lock);
>  
> -	pvt = page->zone_device_data;
>  	page->zone_device_data = NULL;
>  	if (pvt->remove_gfn)
>  		kvmppc_gfn_remove(pvt->gpa >> PAGE_SHIFT, pvt->kvm);

In our case, device pages that are in use are always associated with a valid
pvt member. See kvmppc_uvmem_get_page() which returns failure if it
runs out of device pfns and that will result in proper failure of
page-in calls.

For the case where we run out of device pfns, migrate_vma_finalize() will
restore the original PTE and will not replace the PTE with device private PTE.

Also kvmppc_uvmem_page_free() (=dev_pagemap_ops.page_free()) is never
called for non-device-private pages.
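
Bharata's allocation-side argument can be sketched in userspace as well: a
bitmap-backed allocator that reports exhaustion to its caller, so every
device pfn actually handed out has private data attached. This is a
simplified stand-in for kvmppc_uvmem_get_page(), not the kernel code:

```c
/* Userspace sketch: exhaustion of device pfns fails the allocation
 * itself, so the free path never sees an allocated pfn without pvt. */
#include <assert.h>
#include <stddef.h>

#define NR_DEV_PFNS 4

static unsigned char pfn_bitmap[NR_DEV_PFNS]; /* 1 = in use */
static void *pfn_pvt[NR_DEV_PFNS];            /* private data per pfn */

/* Returns an allocated pfn with pvt attached, or -1 when out of pfns,
 * in which case the page-in call fails instead of reaching free. */
static int uvmem_get_page_sim(void *pvt)
{
	for (int i = 0; i < NR_DEV_PFNS; i++) {
		if (!pfn_bitmap[i]) {
			pfn_bitmap[i] = 1;
			pfn_pvt[i] = pvt;
			return i;
		}
	}
	return -1;  /* exhaustion is reported to the caller, not deferred */
}
```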

This could be a use-after-free case possibly arising out of the new state
changes in HV. If so, this fix will only mask the bug and not address the
original problem.

Regards,
Bharata.


* Re: [PATCH] KVM: PPC: Book3S HV: fix an oops in kvmppc_uvmem_page_free()
  2020-07-31  4:29 ` Bharata B Rao
@ 2020-07-31  8:37   ` Ram Pai
  2020-07-31 10:02     ` Bharata B Rao
  0 siblings, 1 reply; 4+ messages in thread
From: Ram Pai @ 2020-07-31  8:37 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 31, 2020 at 09:59:40AM +0530, Bharata B Rao wrote:
> On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> > [... patch description, oops log, and diff trimmed ...]
> 
> In our case, device pages that are in use are always associated with a valid
> pvt member. See kvmppc_uvmem_get_page() which returns failure if it
> runs out of device pfns and that will result in proper failure of
> page-in calls.

I looked at the code, and yes, that code path looks correct. So my
reasoning about the root cause of this bug is incorrect. However, the
bug is surfacing, and there must be a reason.

> 
> For the case where we run out of device pfns, migrate_vma_finalize() will
> restore the original PTE and will not replace the PTE with device private PTE.
> 
> Also kvmppc_uvmem_page_free() (=dev_pagemap_ops.page_free()) is never
> called for non-device-private pages.

Yes, it should not be called. But as seen in the stack trace above, it is.

What would cause the HMM to call ->page_free() on a page that is not
associated with that device's pfn?

> 
> This could be a use-after-free case possibly arising out of the new state
> changes in HV. If so, this fix will only mask the bug and not address the
> original problem.

I can verify by rerunning the tests without the new state changes. But
I do not see how those changes could cause this fault.

This could also be caused by a duplicate ->page_free() call due to some
bug in the migrate_page path. Could there be a race between
migrate_page() and a page fault?
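
One hypothetical way to surface the duplicate-free theory (illustration
only, not part of the patch): consult the allocation bitmap before clearing
it, so a second free of the same pfn is detected rather than silently
corrupting state:

```c
/* Userspace sketch of a double-free check against the pfn bitmap.
 * Names are illustrative stand-ins, not kernel symbols. */
#include <assert.h>

#define NR_DEV_PFNS 4
static unsigned char dev_bitmap[NR_DEV_PFNS]; /* 1 = allocated */

static int dev_pfn_alloc_sim(int pfn) { dev_bitmap[pfn] = 1; return pfn; }

/* Returns 0 on a valid free, -1 on a duplicate free (the kernel
 * equivalent would WARN and bail out). */
static int dev_pfn_free_sim(int pfn)
{
	if (!dev_bitmap[pfn])
		return -1;
	dev_bitmap[pfn] = 0;
	return 0;
}
```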


Regardless, kvmppc_uvmem_page_free() needs to be fixed. It should not
access the contents of pvt without verifying that pvt is valid.

> 
> Regards,
> Bharata.

-- 
Ram Pai


* Re: [PATCH] KVM: PPC: Book3S HV: fix an oops in kvmppc_uvmem_page_free()
  2020-07-31  8:37   ` Ram Pai
@ 2020-07-31 10:02     ` Bharata B Rao
  0 siblings, 0 replies; 4+ messages in thread
From: Bharata B Rao @ 2020-07-31 10:02 UTC (permalink / raw)
  To: Ram Pai
  Cc: ldufour, cclaudio, kvm-ppc, sathnaga, aneesh.kumar, sukadev,
	linuxppc-dev, bauerman, david

On Fri, Jul 31, 2020 at 01:37:00AM -0700, Ram Pai wrote:
> On Fri, Jul 31, 2020 at 09:59:40AM +0530, Bharata B Rao wrote:
> > On Thu, Jul 30, 2020 at 04:25:26PM -0700, Ram Pai wrote:
> > In our case, device pages that are in use are always associated with a valid
> > pvt member. See kvmppc_uvmem_get_page() which returns failure if it
> > runs out of device pfns and that will result in proper failure of
> > page-in calls.
> 
> I looked at the code, and yes, that code path looks correct. So my
> reasoning about the root cause of this bug is incorrect. However, the
> bug is surfacing, and there must be a reason.
> 
> > 
> > For the case where we run out of device pfns, migrate_vma_finalize() will
> > restore the original PTE and will not replace the PTE with device private PTE.
> > 
> > Also kvmppc_uvmem_page_free() (=dev_pagemap_ops.page_free()) is never
> > called for non-device-private pages.
> 
> Yes, it should not be called. But as seen in the stack trace above, it is.
> 
> What would cause the HMM to call ->page_free() on a page that is not
> associated with that device's pfn?

I believe it is being called for a device-private page; can you verify
that the next time you hit it?

> 
> > 
> > This could be a use-after-free case possibly arising out of the new state
> > changes in HV. If so, this fix will only mask the bug and not address the
> > original problem.
> 
> I can verify by rerunning the tests without the new state changes. But
> I do not see how those changes could cause this fault.
> 
> This could also be caused by a duplicate ->page_free() call due to some
> bug in the migrate_page path. Could there be a race between
> migrate_page() and a page fault?
> 
> 
> Regardless, kvmppc_uvmem_page_free() needs to be fixed. It should not
> access the contents of pvt without verifying that pvt is valid.

We don't expect pvt to be NULL here. Checking for NULL and returning
isn't the right fix, I think.

Regards,
Bharata.


end of thread, back to index

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-30 23:25 [PATCH] KVM: PPC: Book3S HV: fix an oops in kvmppc_uvmem_page_free() Ram Pai
2020-07-31  4:29 ` Bharata B Rao
2020-07-31  8:37   ` Ram Pai
2020-07-31 10:02     ` Bharata B Rao
