* [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()
From: jiangyiwen @ 2019-05-20  7:50 UTC
To: alex.williamson; +Cc: kvm

Hello Alex,

We hit the call trace below on an ARM64 system: vfio_unmap_unpin()
fires a WARN_ON() when it cannot find a physical address for an iova.
I have not been able to find the cause of the problem yet; do you have
any ideas?

In addition, I would like to know why this is a WARN_ON() rather than
a BUG_ON(). Does it affect the subsequent cleanup?

Thanks,
Yiwen.

[12727.392078] WARNING: CPU: 70 PID: 13816 at drivers/vfio/vfio_iommu_type1.c:795 vfio_unmap_unpin+0x300/0x370
[12727.392083] Modules linked in: dm_service_time dm_multipath ebtable_filter ebtables ip6table_filter ip6_tables dev_connlimit(O) vhba(O) iptable_filter elbtrans(O) vm_eth_qos(O) vm_pps_qos(O) vm_bps_qos(O) bum(O) ip_set nfnetlink prio(O) nat(O) vport_vxlan(O) openvswitch(O) nf_nat_ipv6 nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 gre signo_catch(O) hotpatch(O) gcn_hotpatch(O) kboxdriver(O) kbox(O) ib_isert iscsi_target_mod ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_umad rpcrdma sunrpc rdma_ucm ib_uverbs ib_iser rdma_cm iw_cm ib_cm aes_ce_blk crypto_simd cryptd ses aes_ce_cipher enclosure crc32_ce ghash_ce sbsa_gwdt sha2_ce sha256_arm64 sha1_ce hinic sg hibmc_drm ttm drm_kms_helper drm fb_sys_fops syscopyarea sysfillrect sysimgblt hns_roce_hw_v2 hns_roce ib_core
[12727.392156] realtek hns3 hclge hnae3 remote_trigger(O) vhost_net(O) tun(O) vhost(O) tap ip_tables dm_mod ipmi_si ipmi_devintf ipmi_msghandler megaraid_sas hisi_sas_v3_hw hisi_sas_main br_netfilter xt_sctp
[12727.392178] CPU: 70 PID: 13816 Comm: vnc_worker Kdump: loaded Tainted: G O 4.19.36-1.2.142.aarch64 #1
[12727.392179] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 0.12 05/14/2019
[12727.392181] pstate: 80400009 (Nzcv daif +PAN -UAO)
[12727.392182] pc : vfio_unmap_unpin+0x300/0x370
[12727.392183] lr : vfio_unmap_unpin+0xe4/0x370
[12727.392184] sp : ffff0000216eb950
[12727.392185] x29: ffff0000216eb950 x28: ffffa05deef8e280
[12727.392187] x27: ffffa05deef8fa80 x26: ffff8042055f6688
[12727.392189] x25: ffff0000216eb9d8 x24: 00000000008b29d8
[12727.392191] x23: ffff804104c5b700 x22: 00000008fb51e000
[12727.392193] x21: 0000000a40000000 x20: 0000000000000000
[12727.392195] x19: ffff8042055f6680 x18: ffff000009605d28
[12727.392197] x17: 0000000000000000 x16: 000000000000000c
[12727.392199] x15: 00000000ffffffff x14: 000000000000003f
[12727.392201] x13: 0000000000001000 x12: 0000000000000000
[12727.392203] x11: ffff805d7aa8df00 x10: 000000000000000c
[12727.392205] x9 : 000000000000000c x8 : 0000000000000000
[12727.392207] x7 : 0000000000000009 x6 : 00000000ffffffff
[12727.392209] x5 : 0000000000000000 x4 : 0000000000000001
[12727.392211] x3 : 0000000000000001 x2 : 0000000000000000
[12727.392213] x1 : 0000000000000000 x0 : 0000000000000000
[12727.392215] Call trace:
[12727.392217]  vfio_unmap_unpin+0x300/0x370
[12727.392218]  vfio_remove_dma+0x2c/0x80
[12727.392220]  vfio_iommu_unmap_unpin_all+0x2c/0x48
[12727.392221]  vfio_iommu_type1_detach_group+0x2e8/0x2f0
[12727.392226]  __vfio_group_unset_container+0x54/0x180
[12727.392228]  vfio_group_try_dissolve_container+0x54/0x68
[12727.392230]  vfio_group_put_external_user+0x20/0x38
[12727.392235]  kvm_vfio_group_put_external_user+0x38/0x50
[12727.392236]  kvm_vfio_destroy+0x5c/0xc8
[12727.392237]  kvm_put_kvm+0x1c8/0x2e0
[12727.392239]  kvm_vm_release+0x2c/0x40
[12727.392243]  __fput+0xac/0x218
[12727.392244]  ____fput+0x20/0x30
[12727.392247]  task_work_run+0xc0/0xf8
[12727.392250]  do_exit+0x300/0x5b0
[12727.392251]  do_group_exit+0x3c/0xe0
[12727.392254]  get_signal+0x12c/0x6e0
[12727.392257]  do_signal+0x180/0x288
[12727.392259]  do_notify_resume+0x100/0x188
[12727.392261]  work_pending+0x8/0x10
[12727.392263] ---[ end trace 12212429631eec72 ]---
* Re: [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()
From: Alex Williamson @ 2019-05-20 19:28 UTC
To: jiangyiwen; +Cc: kvm

On Mon, 20 May 2019 15:50:11 +0800
jiangyiwen <jiangyiwen@huawei.com> wrote:

> Hello Alex,
>
> We hit the call trace below on an ARM64 system: vfio_unmap_unpin()
> fires a WARN_ON() when it cannot find a physical address for an
> iova. I have not been able to find the cause of the problem yet; do
> you have any ideas?

Is it reproducible? Can you explain how to reproduce it? The stack
trace indicates a KVM VM is being shut down and we're trying to clean
out the IOMMU mappings from the domain, and we find a page that we
think should be mapped that the IOMMU doesn't have mapped. What
device(s) was assigned to the VM? This could be an IOMMU driver bug or
a vfio_iommu_type1 bug. Have you been able to reproduce this on other
platforms?

> In addition, I would like to know why this is a WARN_ON() rather
> than a BUG_ON(). Does it affect the subsequent cleanup?

We're removing an IOMMU page mapping entry and find that it's not
present, so ultimately the effect at the IOMMU is the same: there's no
mapping at that address. But I can't say without further analysis
whether that means a page remains pinned or whether that inconsistency
was resolved previously elsewhere. We WARN_ON because this is not what
we expect, but potentially leaking a page of memory doesn't seem
worthy of crashing the host, nor would a crash dump at that point
necessarily aid in resolving the missing page, as it potentially
occurred well in the past. Thanks,

Alex
* Re: [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()
From: jiangyiwen @ 2019-06-11  3:21 UTC
To: Alex Williamson; +Cc: kvm

On 2019/5/21 3:28, Alex Williamson wrote:
> On Mon, 20 May 2019 15:50:11 +0800
> jiangyiwen <jiangyiwen@huawei.com> wrote:
>
>> [snip: original report]
>
> Is it reproducible? Can you explain how to reproduce it? The stack
> trace indicates a KVM VM is being shut down and we're trying to clean
> out the IOMMU mappings from the domain, and we find a page that we
> think should be mapped that the IOMMU doesn't have mapped. What
> device(s) was assigned to the VM? This could be an IOMMU driver bug
> or a vfio_iommu_type1 bug. Have you been able to reproduce this on
> other platforms?

Hello Alex,

Sorry for the late reply; I was tied up with other things. The root
cause of this problem is that on some platforms (such as ARM64),
physical address 0 is valid and can be used for system memory, so in
that case vfio_unmap_unpin() should not print a WARN_ON() and skip the
page; it should unmap and unpin the page at physical address 0 on
those platforms.

So I want to return 0xFFFFFFFFFFFFFFFF instead of 0 as the invalid
physical address in iommu_iova_to_phys(). Do you think that is
appropriate?

Thanks,
Yiwen.

>> In addition, I would like to know why this is a WARN_ON() rather
>> than a BUG_ON(). Does it affect the subsequent cleanup?
>
> [snip: Alex's earlier answer on WARN_ON vs. BUG_ON]
* Re: [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()
From: Alex Williamson @ 2019-06-11 15:13 UTC
To: Jiangyiwen; +Cc: kvm, open list:AMD IOMMU (AMD-VI)

[cc +iommu]

On Tue, 11 Jun 2019 20:40:41 +0800
Jiangyiwen <jiangyiwen@huawei.com> wrote:

> Hi Alex,
>
> I found that this problem is not very easy to solve. On the arm64
> platform, physical address 0 is a valid system memory address, so I
> think arm_smmu_iova_to_phys() should not use 0 as its error return
> value.
>
> Do you have any ideas?

I think you're going to need to redefine iommu_iova_to_phys() and fix
all the IOMMU implementations of it to comply. Currently the AMD and
Intel IOMMU drivers return zero if a mapping is not found. You could
make the function return 0/errno and return the physical address via a
pointer arg. You could also keep the existing definition but introduce
a test for a valid result that might use an architecture-specific
value (akin to IS_ERR()). You could also just reserve the zero page
from userspace allocation. I really don't want an #ifdef in the vfio
iommu driver trying to discern the correct invalid value, though.
Thanks,

Alex

> On 2019/6/11 11:21, jiangyiwen wrote:
> [snip: earlier quoted messages]