* [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()
@ 2019-05-20  7:50 jiangyiwen
  2019-05-20 19:28 ` Alex Williamson
  0 siblings, 1 reply; 5+ messages in thread
From: jiangyiwen @ 2019-05-20  7:50 UTC (permalink / raw)
  To: alex.williamson; +Cc: kvm

Hello Alex,

We hit the call trace below while testing on the ARM64
architecture; a WARN_ON() fires when no physical address can
be found for an iova in vfio_unmap_unpin(). I can't find the
cause of the problem yet; do you have any ideas?

In addition, why is there a WARN_ON() instead of a BUG_ON()?
Does the failure affect subsequent processing?

Thanks,
Yiwen.

2019-05-17T15:43:36.565426+08:00|warning|kernel[-]|[12727.392078] WARNING: CPU: 70 PID: 13816 at drivers/vfio/vfio_iommu_type1.c:795 vfio_unmap_unpin+0x300/0x370
2019-05-17T15:43:36.565501+08:00|warning|kernel[-]|[12727.392083] Modules linked in: dm_service_time dm_multipath ebtable_filter ebtables ip6table_filter ip6_tables dev_connlimit(O) vhba(O) iptable_filter elbtrans(O) vm_eth_qos(O) vm_pps_qos(O) vm_bps_qos(O) bum(O) ip_set nfnetlink prio(O) nat(O) vport_vxlan(O) openvswitch(O) nf_nat_ipv6 nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 gre signo_catch(O) hotpatch(O) gcn_hotpatch(O) kboxdriver(O) kbox(O) ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm aes_ce_blk crypto_simd cryptd ses aes_ce_cipher enclosure crc32_ce ghash_ce sbsa_gwdt sha2_ce sha256_arm64 sha1_ce hinic sg hibmc_drm ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt hns_roce_hw_v2 hns_roce ib_core
2019-05-17T15:43:36.566357+08:00|warning|kernel[-]|[12727.392156]  realtek hns3 hclge hnae3 remote_trigger(O) vhost_net(O) tun(O) vhost(O) tap ip_tables dm_mod ipmi_si ipmi_devintf ipmi_msghandler megaraid_sas hisi_sas_v3_hw hisi_sas_main br_netfilter xt_sctp
2019-05-17T15:43:36.566371+08:00|warning|kernel[-]|[12727.392178] CPU: 70 PID: 13816 Comm: vnc_worker Kdump: loaded Tainted: G           O      4.19.36-1.2.142.aarch64 #1
2019-05-17T15:43:36.566383+08:00|warning|kernel[-]|[12727.392179] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.12 05/14/2019
2019-05-17T15:43:36.566394+08:00|warning|kernel[-]|[12727.392181] pstate: 80400009 (Nzcv daif +PAN -UAO)
2019-05-17T15:43:36.566404+08:00|warning|kernel[-]|[12727.392182] pc : vfio_unmap_unpin+0x300/0x370
2019-05-17T15:43:36.566414+08:00|warning|kernel[-]|[12727.392183] lr : vfio_unmap_unpin+0xe4/0x370
2019-05-17T15:43:36.566425+08:00|warning|kernel[-]|[12727.392184] sp : ffff0000216eb950
2019-05-17T15:43:36.566439+08:00|warning|kernel[-]|[12727.392185] x29: ffff0000216eb950 x28: ffffa05deef8e280
2019-05-17T15:43:36.566449+08:00|warning|kernel[-]|[12727.392187] x27: ffffa05deef8fa80 x26: ffff8042055f6688
2019-05-17T15:43:36.566460+08:00|warning|kernel[-]|[12727.392189] x25: ffff0000216eb9d8 x24: 00000000008b29d8
2019-05-17T15:43:36.566470+08:00|warning|kernel[-]|[12727.392191] x23: ffff804104c5b700 x22: 00000008fb51e000
2019-05-17T15:43:36.566480+08:00|warning|kernel[-]|[12727.392193] x21: 0000000a40000000 x20: 0000000000000000
2019-05-17T15:43:36.566490+08:00|warning|kernel[-]|[12727.392195] x19: ffff8042055f6680 x18: ffff000009605d28
2019-05-17T15:43:36.566501+08:00|warning|kernel[-]|[12727.392197] x17: 0000000000000000 x16: 000000000000000c
2019-05-17T15:43:36.566511+08:00|warning|kernel[-]|[12727.392199] x15: 00000000ffffffff x14: 000000000000003f
2019-05-17T15:43:36.566523+08:00|warning|kernel[-]|[12727.392201] x13: 0000000000001000 x12: 0000000000000000
2019-05-17T15:43:36.566533+08:00|warning|kernel[-]|[12727.392203] x11: ffff805d7aa8df00 x10: 000000000000000c
2019-05-17T15:43:36.566543+08:00|warning|kernel[-]|[12727.392205] x9 : 000000000000000c x8 : 0000000000000000
2019-05-17T15:43:36.566554+08:00|warning|kernel[-]|[12727.392207] x7 : 0000000000000009 x6 : 00000000ffffffff
2019-05-17T15:43:36.566564+08:00|warning|kernel[-]|[12727.392209] x5 : 0000000000000000 x4 : 0000000000000001
2019-05-17T15:43:36.566576+08:00|warning|kernel[-]|[12727.392211] x3 : 0000000000000001 x2 : 0000000000000000
2019-05-17T15:43:36.566586+08:00|warning|kernel[-]|[12727.392213] x1 : 0000000000000000 x0 : 0000000000000000
2019-05-17T15:43:36.566597+08:00|warning|kernel[-]|[12727.392215] Call trace:
2019-05-17T15:43:36.566607+08:00|warning|kernel[-]|[12727.392217]  vfio_unmap_unpin+0x300/0x370
2019-05-17T15:43:36.566618+08:00|warning|kernel[-]|[12727.392218]  vfio_remove_dma+0x2c/0x80
2019-05-17T15:43:36.566628+08:00|warning|kernel[-]|[12727.392220]  vfio_iommu_unmap_unpin_all+0x2c/0x48
2019-05-17T15:43:36.566638+08:00|warning|kernel[-]|[12727.392221]  vfio_iommu_type1_detach_group+0x2e8/0x2f0
2019-05-17T15:43:36.566648+08:00|warning|kernel[-]|[12727.392226]  __vfio_group_unset_container+0x54/0x180
2019-05-17T15:43:36.566659+08:00|warning|kernel[-]|[12727.392228]  vfio_group_try_dissolve_container+0x54/0x68
2019-05-17T15:43:36.566669+08:00|warning|kernel[-]|[12727.392230]  vfio_group_put_external_user+0x20/0x38
2019-05-17T15:43:36.566680+08:00|warning|kernel[-]|[12727.392235]  kvm_vfio_group_put_external_user+0x38/0x50
2019-05-17T15:43:36.566690+08:00|warning|kernel[-]|[12727.392236]  kvm_vfio_destroy+0x5c/0xc8
2019-05-17T15:43:36.566700+08:00|warning|kernel[-]|[12727.392237]  kvm_put_kvm+0x1c8/0x2e0
2019-05-17T15:43:36.566710+08:00|warning|kernel[-]|[12727.392239]  kvm_vm_release+0x2c/0x40
2019-05-17T15:43:36.566721+08:00|warning|kernel[-]|[12727.392243]  __fput+0xac/0x218
2019-05-17T15:43:36.566731+08:00|warning|kernel[-]|[12727.392244]  ____fput+0x20/0x30
2019-05-17T15:43:36.566741+08:00|warning|kernel[-]|[12727.392247]  task_work_run+0xc0/0xf8
2019-05-17T15:43:36.566751+08:00|warning|kernel[-]|[12727.392250]  do_exit+0x300/0x5b0
2019-05-17T15:43:36.566761+08:00|warning|kernel[-]|[12727.392251]  do_group_exit+0x3c/0xe0
2019-05-17T15:43:36.566772+08:00|warning|kernel[-]|[12727.392254]  get_signal+0x12c/0x6e0
2019-05-17T15:43:36.566783+08:00|warning|kernel[-]|[12727.392257]  do_signal+0x180/0x288
2019-05-17T15:43:36.566793+08:00|warning|kernel[-]|[12727.392259]  do_notify_resume+0x100/0x188
2019-05-17T15:43:36.566804+08:00|warning|kernel[-]|[12727.392261]  work_pending+0x8/0x10
2019-05-17T15:43:36.566814+08:00|warning|kernel[-]|[12727.392263] ---[ end trace 12212429631eec72 ]---



* Re: [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()
  2019-05-20  7:50 [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin() jiangyiwen
@ 2019-05-20 19:28 ` Alex Williamson
  2019-06-11  3:21   ` jiangyiwen
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Williamson @ 2019-05-20 19:28 UTC (permalink / raw)
  To: jiangyiwen; +Cc: kvm

On Mon, 20 May 2019 15:50:11 +0800
jiangyiwen <jiangyiwen@huawei.com> wrote:

> Hello Alex,
> 
> We hit the call trace below while testing on the ARM64
> architecture; a WARN_ON() fires when no physical address can
> be found for an iova in vfio_unmap_unpin(). I can't find the
> cause of the problem yet; do you have any ideas?

Is it reproducible?  Can you explain how to reproduce it?  The stack
trace indicates a KVM VM is being shut down and we're trying to clean
out the IOMMU mappings from the domain and find a page that we think
should be mapped that the IOMMU doesn't have mapped.  What device(s) was
assigned to the VM?  This could be an IOMMU driver bug or a
vfio_iommu_type1 bug.  Have you been able to reproduce this on other
platforms?

> In addition, why is there a WARN_ON() instead of a BUG_ON()?
> Does the failure affect subsequent processing?

We're removing an IOMMU page mapping entry and find that it's not
present, so ultimately the effect at the IOMMU is the same, there's no
mapping at that address, but I can't say without further analysis
whether that means a page remains pinned or if that inconsistency was
resolved previously elsewhere.  We WARN_ON because this is not what we
expect, but potentially leaking a page of memory doesn't seem worthy of
crashing the host, nor would a crash dump at that point necessarily aid
in resolving the missing page as it potentially occurred well in the
past.  Thanks,

Alex


* Re: [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()
  2019-05-20 19:28 ` Alex Williamson
@ 2019-06-11  3:21   ` jiangyiwen
       [not found]     ` <5CFFA149.8070303@huawei.com>
  0 siblings, 1 reply; 5+ messages in thread
From: jiangyiwen @ 2019-06-11  3:21 UTC (permalink / raw)
  To: Alex Williamson; +Cc: kvm

On 2019/5/21 3:28, Alex Williamson wrote:
> On Mon, 20 May 2019 15:50:11 +0800
> jiangyiwen <jiangyiwen@huawei.com> wrote:
> 
>> Hello Alex,
>>
>> We hit the call trace below while testing on the ARM64
>> architecture; a WARN_ON() fires when no physical address can
>> be found for an iova in vfio_unmap_unpin(). I can't find the
>> cause of the problem yet; do you have any ideas?
> 
> Is it reproducible?  Can you explain how to reproduce it?  The stack
> trace indicates a KVM VM is being shut down and we're trying to clean
> out the IOMMU mappings from the domain and find a page that we think
> should be mapped that the IOMMU doesn't have mapped.  What device(s) was
> assigned to the VM?  This could be an IOMMU driver bug or a
> vfio_iommu_type1 bug.  Have you been able to reproduce this on other
> platforms?
> 

Hello Alex,

Sorry for the late reply; I was tied up with other things.
The cause of this problem is that on some platforms (like
ARM64), physical address "0" is valid and can be used for
system memory, so in this case we should not print a
WARN_ON(); instead we should unmap and unpin the page at
physical address "0" on these platforms.

So I want iommu_iova_to_phys() to return FFFFFFFFFFFFFFFFL
instead of "0" as the invalid physical address. Do you think
that's appropriate?

Thanks,
Yiwen.

>> In addition, why is there a WARN_ON() instead of a BUG_ON()?
>> Does the failure affect subsequent processing?
> 
> We're removing an IOMMU page mapping entry and find that it's not
> present, so ultimately the effect at the IOMMU is the same, there's no
> mapping at that address, but I can't say without further analysis
> whether that means a page remains pinned or if that inconsistency was
> resolved previously elsewhere.  We WARN_ON because this is not what we
> expect, but potentially leaking a page of memory doesn't seem worthy of
> crashing the host, nor would a crash dump at that point necessarily aid
> in resolving the missing page as it potentially occurred well in the
> past.  Thanks,
> 
> Alex
> 
> 




* Re: [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()
       [not found]     ` <5CFFA149.8070303@huawei.com>
@ 2019-06-11 15:13         ` Alex Williamson
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Williamson @ 2019-06-11 15:13 UTC (permalink / raw)
  To: Jiangyiwen; +Cc: kvm, open list:AMD IOMMU (AMD-VI)

[cc +iommu]

On Tue, 11 Jun 2019 20:40:41 +0800
Jiangyiwen <jiangyiwen@huawei.com> wrote:

> Hi Alex,
> 
> I found this problem is not easy to solve. For now, on the
> arm64 platform, physical address "0" is a valid system memory
> address, so I think arm_smmu_iova_to_phys() should not use
> "0" as its error return value.
> 
> Do you have any ideas?

I think you're going to need to redefine iommu_iova_to_phys() and fix
all the IOMMU implementations of it to comply.  Currently the AMD and
Intel IOMMU drivers return zero if a mapping is not found.  You could make the
function return 0/errno and return the physical address via a pointer
arg.  You could also keep the existing definition, but introduce a test
for a valid result that might use an architecture specific value (akin
to IS_ERR()).  You could also just reserve the zero page from userspace
allocation.  I really don't want #ifdef in the vfio iommu driver trying
to discern the correct invalid value though.  Thanks,

Alex

> On 2019/6/11 11:21, jiangyiwen wrote:
> > On 2019/5/21 3:28, Alex Williamson wrote:  
> >> On Mon, 20 May 2019 15:50:11 +0800
> >> jiangyiwen <jiangyiwen@huawei.com> wrote:
> >>  
> >>> Hello Alex,
> >>>
> >>> We hit the call trace below while testing on the ARM64
> >>> architecture; a WARN_ON() fires when no physical address can
> >>> be found for an iova in vfio_unmap_unpin(). I can't find the
> >>> cause of the problem yet; do you have any ideas?
> >> Is it reproducible?  Can you explain how to reproduce it?  The stack
> >> trace indicates a KVM VM is being shut down and we're trying to clean
> >> out the IOMMU mappings from the domain and find a page that we think
> >> should be mapped that the IOMMU doesn't have mapped.  What device(s) was
> >> assigned to the VM?  This could be an IOMMU driver bug or a
> >> vfio_iommu_type1 bug.  Have you been able to reproduce this on other
> >> platforms?
> >>  
> > Hello Alex,
> >
> > Sorry for the late reply; I was tied up with other things.
> > The cause of this problem is that on some platforms (like
> > ARM64), physical address "0" is valid and can be used for
> > system memory, so we should not print a WARN_ON(); instead
> > we should unmap and unpin the page at physical address "0"
> > on these platforms.
> >
> > So I want iommu_iova_to_phys() to return FFFFFFFFFFFFFFFFL
> > instead of "0" as the invalid physical address. Do you think
> > that's appropriate?
> >
> > Thanks,
> > Yiwen.
> >  
> >>> In addition, why is there a WARN_ON() instead of a
> >>> BUG_ON()? Does the failure affect subsequent processing?
> >> We're removing an IOMMU page mapping entry and find that it's not
> >> present, so ultimately the effect at the IOMMU is the same, there's no
> >> mapping at that address, but I can't say without further analysis
> >> whether that means a page remains pinned or if that inconsistency was
> >> resolved previously elsewhere.  We WARN_ON because this is not what we
> >> expect, but potentially leaking a page of memory doesn't seem worthy of
> >> crashing the host, nor would a crash dump at that point necessarily aid
> >> in resolving the missing page as it potentially occurred well in the
> >> past.  Thanks,
> >>
> >> Alex
> >>
> >>  
> >
> >
> >  
> 



