* arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c
@ 2020-07-20 6:35 Naresh Kamboju
  2020-07-20 7:17 ` Arnd Bergmann
  0 siblings, 1 reply; 5+ messages in thread
From: Naresh Kamboju @ 2020-07-20 6:35 UTC (permalink / raw)
  To: open list, linux-mediatek, linux-arm-msm, iommu
  Cc: Jean-Philippe Brucker, Joerg Roedel, Vinod Koul, Arnd Bergmann, Greg Kroah-Hartman, Sean Paul, Guohanjun (Hanjun Guo), Robin Murphy, lkft-triage, Thierry Reding, Andy Gross, Sudeep Holla, Matthias Brugger, Will Deacon

The kernel oopses while booting the Linux mainline kernel on the arm64 db410c device.

metadata:
  git branch: master
  git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
  git commit: f8456690ba8eb18ea4714e68554e242a04f65cff
  git describe: v5.8-rc5-48-gf8456690ba8e
  make_kernelversion: 5.8.0-rc5
  kernel-config: https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config

[ 5.444121] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[ 5.456615] ESR = 0x96000004
[ 5.464471] SET = 0, FnV = 0
[ 5.464487] EA = 0, S1PTW = 0
[ 5.466521] Data abort info:
[ 5.469971] ISV = 0, ISS = 0x00000004
[ 5.472768] CM = 0, WnR = 0
[ 5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000bacba000
[ 5.479349] [0000000000000018] pgd=0000000000000000, p4d=0000000000000000
[ 5.485820] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 5.492448] Modules linked in: crct10dif_ce adv7511(+) qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns rmtfs_mem fuse
[ 5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1
[ 5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT)
[ 5.529170] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--)
[ 5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8
[ 5.541148] lr : free_io_pgtable_ops+0x28/0x58
[ 5.546350] sp : ffff80001219b5f0
[ 5.550689] x29: ffff80001219b5f0 x28: 0000000000000013
[ 5.554078] x27: 0000000000000100 x26: ffff000036add3b8
[ 5.559459] x25: ffff80000915e910 x24: ffff00003a5458c0
[ 5.564753] x23: 0000000000000003 x22: ffff000036a37058
[ 5.570049] x21: ffff000036a3a100 x20: ffff000036a3a480
[ 5.575344] x19: ffff000036a37158 x18: 0000000000000000
[ 5.580639] x17: 0000000000000000 x16: 0000000000000000
[ 5.585935] x15: 0000000000000004 x14: 0000000000000368
[ 5.591229] x13: 0000000000000000 x12: ffff000039c61798
[ 5.596525] x11: ffff000039c616d0 x10: 0000000040000000
[ 5.601820] x9 : 0000000000000000 x8 : ffff000039c616f8
[ 5.607114] x7 : 0000000000000000 x6 : ffff000009f699a0
[ 5.612410] x5 : ffff80001219b520 x4 : ffff000036a3a000
[ 5.617705] x3 : ffff000009f69904 x2 : 0000000000000000
[ 5.623001] x1 : ffff8000107e27e8 x0 : ffff00003a545810
[ 5.628297] Call trace:
[ 5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8
[ 5.635764] free_io_pgtable_ops+0x28/0x58
[ 5.640624] qcom_iommu_domain_free+0x38/0x60
[ 5.644617] iommu_group_release+0x4c/0x70
[ 5.649045] kobject_put+0x6c/0x120
[ 5.653035] kobject_del+0x64/0x90
[ 5.656421] kobject_put+0xfc/0x120
[ 5.659893] iommu_group_remove_device+0xdc/0xf0
[ 5.663281] iommu_release_device+0x44/0x70
[ 5.668142] iommu_bus_notifier+0xbc/0xd0
[ 5.672048] notifier_call_chain+0x54/0x98
[ 5.676214] blocking_notifier_call_chain+0x48/0x70
[ 5.680209] device_del+0x26c/0x3a0
[ 5.684981] platform_device_del.part.0+0x1c/0x88
[ 5.688453] platform_device_unregister+0x24/0x40
[ 5.693316] of_platform_device_destroy+0xe4/0xf8
[ 5.698002] device_for_each_child+0x5c/0xa8
[ 5.702689] of_platform_depopulate+0x3c/0x80
[ 5.707144] msm_pdev_probe+0x1c4/0x308 [msm]
[ 5.711286] platform_drv_probe+0x54/0xa8
[ 5.715624] really_probe+0xd8/0x320
[ 5.719617] driver_probe_device+0x58/0xb8
[ 5.723263] device_driver_attach+0x74/0x80
[ 5.727168] __driver_attach+0x58/0xe0
[ 5.731248] bus_for_each_dev+0x70/0xc0
[ 5.735067] driver_attach+0x24/0x30
[ 5.738801] bus_add_driver+0x14c/0x1f0
[ 5.742619] driver_register+0x64/0x120
[ 5.746178] __platform_driver_register+0x48/0x58
[ 5.750099] msm_drm_register+0x58/0x70 [msm]
[ 5.754861] do_one_initcall+0x54/0x1a0
[ 5.759200] do_init_module+0x54/0x200
[ 5.762846] load_module+0x1d1c/0x2300
[ 5.766664] __do_sys_finit_module+0xd8/0xf0
[ 5.770398] __arm64_sys_finit_module+0x20/0x30
[ 5.774826] el0_svc_common.constprop.0+0x6c/0x168
[ 5.779078] do_el0_svc+0x24/0x90
[ 5.783939] el0_sync_handler+0x90/0x198
[ 5.787323] el0_sync+0x158/0x180
[ 5.791323] Code: 910003fd f9417404 b4000484 f9401482 (b9401846)
[ 5.794532] ---[ end trace 3d6a53241629e560 ]---

Full crash log details:
https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.8-rc5-48-gf8456690ba8e/testrun/2945157/suite/linux-log-parser/test/check-kernel-oops-1573988/log

--
Linaro LKFT
https://lkft.linaro.org
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
* Re: arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c 2020-07-20 6:35 arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c Naresh Kamboju @ 2020-07-20 7:17 ` Arnd Bergmann 2020-07-20 11:28 ` Robin Murphy 0 siblings, 1 reply; 5+ messages in thread From: Arnd Bergmann @ 2020-07-20 7:17 UTC (permalink / raw) To: Naresh Kamboju Cc: lkft-triage, Eric Anholt, Thierry Reding, Guohanjun (Hanjun Guo), Will Deacon, Jean-Philippe Brucker, Vinod Koul, open list:IOMMU DRIVERS, Andy Gross, freedreno, Joerg Roedel, John Stultz, linux-arm-msm, moderated list:ARM/Mediatek SoC..., Matthias Brugger, Sean Paul, Greg Kroah-Hartman, open list, Sudeep Holla, Robin Murphy On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju <naresh.kamboju@linaro.org> wrote: > > This kernel oops while boot linux mainline kernel on arm64 db410c device. > > metadata: > git branch: master > git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > git commit: f8456690ba8eb18ea4714e68554e242a04f65cff > git describe: v5.8-rc5-48-gf8456690ba8e > make_kernelversion: 5.8.0-rc5 > kernel-config: > https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config Thanks for the report. Adding freedreno folks to Cc, as this may have something to do with that driver. 
> > [ 5.444121] Unable to handle kernel NULL pointer dereference at > virtual address 0000000000000018 > [ 5.456615] ESR = 0x96000004 > [ 5.464471] SET = 0, FnV = 0 > [ 5.464487] EA = 0, S1PTW = 0 > [ 5.466521] Data abort info: > [ 5.469971] ISV = 0, ISS = 0x00000004 > [ 5.472768] CM = 0, WnR = 0 > [ 5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000bacba000 > [ 5.479349] [0000000000000018] pgd=0000000000000000, p4d=0000000000000000 > [ 5.485820] Internal error: Oops: 96000004 [#1] PREEMPT SMP > [ 5.492448] Modules linked in: crct10dif_ce adv7511(+) > qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg > drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng > videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns > rmtfs_mem fuse > [ 5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1 > [ 5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > [ 5.529170] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) > [ 5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8 > [ 5.541148] lr : free_io_pgtable_ops+0x28/0x58 > [ 5.546350] sp : ffff80001219b5f0 > [ 5.550689] x29: ffff80001219b5f0 x28: 0000000000000013 > [ 5.554078] x27: 0000000000000100 x26: ffff000036add3b8 > [ 5.559459] x25: ffff80000915e910 x24: ffff00003a5458c0 > [ 5.564753] x23: 0000000000000003 x22: ffff000036a37058 > [ 5.570049] x21: ffff000036a3a100 x20: ffff000036a3a480 > [ 5.575344] x19: ffff000036a37158 x18: 0000000000000000 > [ 5.580639] x17: 0000000000000000 x16: 0000000000000000 > [ 5.585935] x15: 0000000000000004 x14: 0000000000000368 > [ 5.591229] x13: 0000000000000000 x12: ffff000039c61798 > [ 5.596525] x11: ffff000039c616d0 x10: 0000000040000000 > [ 5.601820] x9 : 0000000000000000 x8 : ffff000039c616f8 > [ 5.607114] x7 : 0000000000000000 x6 : ffff000009f699a0 > [ 5.612410] x5 : ffff80001219b520 x4 : ffff000036a3a000 > [ 5.617705] x3 : ffff000009f69904 x2 : 0000000000000000 > [ 5.623001] x1 : ffff8000107e27e8 x0 
: ffff00003a545810
> [ 5.628297] Call trace:
> [ 5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8

This means that dev_iommu_fwspec_get() has returned NULL
in qcom_iommu_tlb_inv_context(), either because dev->iommu
is NULL, or because dev->iommu->fwspec is NULL.

qcom_iommu_tlb_inv_context() does not check for a NULL
pointer before using the returned object.

The bug is either in the lack of error handling, or the fact
that it's possible to get into this function for a device
that has not been fully set up.

> [ 5.635764] free_io_pgtable_ops+0x28/0x58
> [ 5.640624] qcom_iommu_domain_free+0x38/0x60
> [ 5.644617] iommu_group_release+0x4c/0x70
> [ 5.649045] kobject_put+0x6c/0x120
> [ 5.653035] kobject_del+0x64/0x90
> [ 5.656421] kobject_put+0xfc/0x120
> [ 5.659893] iommu_group_remove_device+0xdc/0xf0
> [ 5.663281] iommu_release_device+0x44/0x70
> [ 5.668142] iommu_bus_notifier+0xbc/0xd0
> [ 5.672048] notifier_call_chain+0x54/0x98
> [ 5.676214] blocking_notifier_call_chain+0x48/0x70
> [ 5.680209] device_del+0x26c/0x3a0
> [ 5.684981] platform_device_del.part.0+0x1c/0x88
> [ 5.688453] platform_device_unregister+0x24/0x40
> [ 5.693316] of_platform_device_destroy+0xe4/0xf8
> [ 5.698002] device_for_each_child+0x5c/0xa8
> [ 5.702689] of_platform_depopulate+0x3c/0x80
> [ 5.707144] msm_pdev_probe+0x1c4/0x308 [msm]

It was triggered by a failure in msm_pdev_probe(), which
calls of_platform_depopulate() in its error handling code.
This is a combination of two problems:

a) Whatever caused msm_pdev_probe() to fail means that
the GPU won't be usable, though it should not have caused the
kernel to crash.

b) The error handling itself causes additional problems due
to failed unwinding.

> [ 5.711286] platform_drv_probe+0x54/0xa8
> [ 5.715624] really_probe+0xd8/0x320
> [ 5.719617] driver_probe_device+0x58/0xb8
> [ 5.723263] device_driver_attach+0x74/0x80
> [ 5.727168] __driver_attach+0x58/0xe0
> [ 5.731248] bus_for_each_dev+0x70/0xc0
> [ 5.735067] driver_attach+0x24/0x30
> [ 5.738801] bus_add_driver+0x14c/0x1f0
> [ 5.742619] driver_register+0x64/0x120
> [ 5.746178] __platform_driver_register+0x48/0x58
> [ 5.750099] msm_drm_register+0x58/0x70 [msm]
> [ 5.754861] do_one_initcall+0x54/0x1a0
> [ 5.759200] do_init_module+0x54/0x200
> [ 5.762846] load_module+0x1d1c/0x2300
> [ 5.766664] __do_sys_finit_module+0xd8/0xf0
> [ 5.770398] __arm64_sys_finit_module+0x20/0x30
> [ 5.774826] el0_svc_common.constprop.0+0x6c/0x168
> [ 5.779078] do_el0_svc+0x24/0x90
> [ 5.783939] el0_sync_handler+0x90/0x198
> [ 5.787323] el0_sync+0x158/0x180
> [ 5.791323] Code: 910003fd f9417404 b4000484 f9401482 (b9401846)
> [ 5.794532] ---[ end trace 3d6a53241629e560 ]---
>
> full crash log details.
> https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.8-rc5-48-gf8456690ba8e/testrun/2945157/suite/linux-log-parser/test/check-kernel-oops-1573988/log

There are a couple of messages directly preceding the bug output that are
probably relevant here:

[ 5.259499] debugfs: Directory '1b0ac00.camss-vdda' with parent 'smd:rpm:rpm-requests:pm8916-regulators-l2' already present!
         Starting Resize root filesystem to fit available disk space...
         Starting Start the WCN core...
[  OK  ] Started Network Service.
[  OK  ] Started QRTR service.
[ 5.352993] adreno 1c00000.gpu: Adding to iommu group 1
[ 5.357489] msm_mdp 1a01000.mdp: Adding to iommu group 2
[ 5.357757] msm_mdp 1a01000.mdp: No interconnect support may cause display underflows!
[ 5.366215] adv7511 3-0039: supply dvdd not found, using dummy regulator
[ 5.378036] msm 1a00000.mdss: supply vdd not found, using dummy regulator
[ 5.378715] msm_mdp 1a01000.mdp: [drm:mdp5_bind [msm]] MDP5 version v1.6
[ 5.380549] adv7511 3-0039: supply pvdd not found, using dummy regulator
[ 5.384606] msm 1a00000.mdss: bound 1a01000.mdp (ops mdp5_ops [msm])
[ 5.394368] adv7511 3-0039: supply a2vdd not found, using dummy regulator
[ 5.397633] msm_dsi 1a98000.dsi: supply gdsc not found, using dummy regulator
[ 5.411897] msm_dsi 1a98000.dsi: supply gdsc not found, using dummy regulator
[ 5.420207] msm_dsi_manager_register: failed to register mipi dsi host for DSI 0
[ 5.425717] platform 1a01000.mdp: Removing from iommu group 2
[FAILED] Failed to start Entropy Daemon based on the HAVEGE algorithm.
[ 5.444121] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
See 'systemctl status haveged.service' for details.
[ 5.456615] ESR = 0x96000004
[ 5.464471] SET = 0, FnV = 0
[ 5.464487] EA = 0, S1PTW = 0

       Arnd
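
Editorial aside on the analysis above: the "NULL plus a small offset" reading can be cross-checked against the "Code:" line of the oops. The sketch below is only an illustration; decode_ldr_uimm(), struct ldr_uimm and check_oops_code_line() are hypothetical names, not kernel or toolchain APIs. It decodes the A64 "LDR (immediate, unsigned offset)" form, under the assumption that the parenthesised word b9401846 is the faulting instruction, as is the convention in arm64 oops output.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: fields of an A64 "LDR Rt, [Rn, #imm]". */
struct ldr_uimm {
    unsigned int rt;     /* destination register number */
    unsigned int rn;     /* base register number */
    unsigned int offset; /* byte offset added to the base register */
    unsigned int size;   /* access size in bytes */
};

/*
 * Decode the integer LDR/LDRH/LDRB "immediate, unsigned offset" form.
 * Returns false for any other instruction word.
 */
static bool decode_ldr_uimm(uint32_t insn, struct ldr_uimm *out)
{
    unsigned int scale = insn >> 30; /* log2 of the access size */

    /* Bits 29..22 must read 111 0 01 01 (load, integer, unsigned offset). */
    if ((insn & 0x3FC00000u) != 0x39400000u)
        return false;

    out->rt = insn & 0x1Fu;
    out->rn = (insn >> 5) & 0x1Fu;
    out->offset = ((insn >> 10) & 0xFFFu) << scale; /* imm12 is scaled */
    out->size = 1u << scale;
    return true;
}

/* Walk the words from the "Code:" line of the oops. */
static void check_oops_code_line(void)
{
    struct ldr_uimm d;

    assert(!decode_ldr_uimm(0x910003fdu, &d)); /* mov x29, sp */
    assert(decode_ldr_uimm(0xf9417404u, &d));  /* ldr x4, [x0, #744] */
    assert(d.rt == 4 && d.rn == 0 && d.offset == 744);
    assert(!decode_ldr_uimm(0xb4000484u, &d)); /* cbz x4, ... */
    assert(decode_ldr_uimm(0xf9401482u, &d));  /* ldr x2, [x4, #40] */
    assert(d.rt == 2 && d.rn == 4 && d.offset == 40);
    assert(decode_ldr_uimm(0xb9401846u, &d));  /* ldr w6, [x2, #24] */
    assert(d.rt == 6 && d.rn == 2 && d.offset == 0x18 && d.size == 4);
}
```

The last word is a 32-bit load from x2 + 0x18. The register dump shows x2 = 0000000000000000, so the access lands on virtual address 0x18, which matches the reported fault address and is consistent with a field being read through a NULL pointer. Which structure field sits at that offset is not stated in the thread.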
* Re: arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c 2020-07-20 7:17 ` Arnd Bergmann @ 2020-07-20 11:28 ` Robin Murphy 2020-07-20 15:58 ` [Freedreno] " Rob Clark 0 siblings, 1 reply; 5+ messages in thread From: Robin Murphy @ 2020-07-20 11:28 UTC (permalink / raw) To: Arnd Bergmann, Naresh Kamboju Cc: Sean Paul, Jean-Philippe Brucker, Joerg Roedel, Vinod Koul, Greg Kroah-Hartman, freedreno, linux-arm-msm, Sudeep Holla, Andy Gross, lkft-triage, open list, Eric Anholt, open list:IOMMU DRIVERS, Thierry Reding, John Stultz, Guohanjun (Hanjun Guo), Matthias Brugger, moderated list:ARM/Mediatek SoC..., Will Deacon On 2020-07-20 08:17, Arnd Bergmann wrote: > On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju > <naresh.kamboju@linaro.org> wrote: >> >> This kernel oops while boot linux mainline kernel on arm64 db410c device. >> >> metadata: >> git branch: master >> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >> git commit: f8456690ba8eb18ea4714e68554e242a04f65cff >> git describe: v5.8-rc5-48-gf8456690ba8e >> make_kernelversion: 5.8.0-rc5 >> kernel-config: >> https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config > > Thanks for the report. Adding freedreno folks to Cc, as this may have something > to do with that driver. 
> >> >> [ 5.444121] Unable to handle kernel NULL pointer dereference at >> virtual address 0000000000000018 >> [ 5.456615] ESR = 0x96000004 >> [ 5.464471] SET = 0, FnV = 0 >> [ 5.464487] EA = 0, S1PTW = 0 >> [ 5.466521] Data abort info: >> [ 5.469971] ISV = 0, ISS = 0x00000004 >> [ 5.472768] CM = 0, WnR = 0 >> [ 5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000bacba000 >> [ 5.479349] [0000000000000018] pgd=0000000000000000, p4d=0000000000000000 >> [ 5.485820] Internal error: Oops: 96000004 [#1] PREEMPT SMP >> [ 5.492448] Modules linked in: crct10dif_ce adv7511(+) >> qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg >> drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng >> videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns >> rmtfs_mem fuse >> [ 5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1 >> [ 5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) >> [ 5.529170] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) >> [ 5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8 >> [ 5.541148] lr : free_io_pgtable_ops+0x28/0x58 >> [ 5.546350] sp : ffff80001219b5f0 >> [ 5.550689] x29: ffff80001219b5f0 x28: 0000000000000013 >> [ 5.554078] x27: 0000000000000100 x26: ffff000036add3b8 >> [ 5.559459] x25: ffff80000915e910 x24: ffff00003a5458c0 >> [ 5.564753] x23: 0000000000000003 x22: ffff000036a37058 >> [ 5.570049] x21: ffff000036a3a100 x20: ffff000036a3a480 >> [ 5.575344] x19: ffff000036a37158 x18: 0000000000000000 >> [ 5.580639] x17: 0000000000000000 x16: 0000000000000000 >> [ 5.585935] x15: 0000000000000004 x14: 0000000000000368 >> [ 5.591229] x13: 0000000000000000 x12: ffff000039c61798 >> [ 5.596525] x11: ffff000039c616d0 x10: 0000000040000000 >> [ 5.601820] x9 : 0000000000000000 x8 : ffff000039c616f8 >> [ 5.607114] x7 : 0000000000000000 x6 : ffff000009f699a0 >> [ 5.612410] x5 : ffff80001219b520 x4 : ffff000036a3a000 >> [ 5.617705] x3 : ffff000009f69904 x2 : 0000000000000000 
>> [ 5.623001] x1 : ffff8000107e27e8 x0 : ffff00003a545810
>> [ 5.628297] Call trace:
>> [ 5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8
>
> This means that dev_iommu_fwspec_get() has returned NULL
> in qcom_iommu_tlb_inv_context(), either because dev->iommu
> is NULL, or because dev->iommu->fwspec is NULL.
>
> qcom_iommu_tlb_inv_context() does not check for a NULL
> pointer before using the returned object.
>
> The bug is either in the lack of error handling, or the fact
> that it's possible to get into this function for a device
> that has not been fully set up.

Not quite - the device *was* properly set up, but has already been
properly torn down again in the removal path by iommu_release_device().
The problem is that qcom-iommu kept the device pointer as its TLB cookie
for the domain, but the domain has a longer lifespan than the validity
of that device - that's a fundamental design flaw in the driver.

Robin.

>> [ 5.635764] free_io_pgtable_ops+0x28/0x58
>> [ 5.640624] qcom_iommu_domain_free+0x38/0x60
>> [ 5.644617] iommu_group_release+0x4c/0x70
>> [ 5.649045] kobject_put+0x6c/0x120
>> [ 5.653035] kobject_del+0x64/0x90
>> [ 5.656421] kobject_put+0xfc/0x120
>> [ 5.659893] iommu_group_remove_device+0xdc/0xf0
>> [ 5.663281] iommu_release_device+0x44/0x70
>> [ 5.668142] iommu_bus_notifier+0xbc/0xd0
>> [ 5.672048] notifier_call_chain+0x54/0x98
>> [ 5.676214] blocking_notifier_call_chain+0x48/0x70
>> [ 5.680209] device_del+0x26c/0x3a0
>> [ 5.684981] platform_device_del.part.0+0x1c/0x88
>> [ 5.688453] platform_device_unregister+0x24/0x40
>> [ 5.693316] of_platform_device_destroy+0xe4/0xf8
>> [ 5.698002] device_for_each_child+0x5c/0xa8
>> [ 5.702689] of_platform_depopulate+0x3c/0x80
>> [ 5.707144] msm_pdev_probe+0x1c4/0x308 [msm]
>
> It was triggered by a failure in msm_pdev_probe(), which
> calls of_platform_depopulate() in its error handling code.
> This is a combination of two problems: > > a) Whatever caused msm_pdev_probe() to fail means that > the gpu won't be usable, though it should not have caused the > kernel to crash. > > b) the error handling itself causing additional problems due > to failed unwinding. > >> [ 5.711286] platform_drv_probe+0x54/0xa8 >> [ 5.715624] really_probe+0xd8/0x320 >> [ 5.719617] driver_probe_device+0x58/0xb8 >> [ 5.723263] device_driver_attach+0x74/0x80 >> [ 5.727168] __driver_attach+0x58/0xe0 >> [ 5.731248] bus_for_each_dev+0x70/0xc0 >> [ 5.735067] driver_attach+0x24/0x30 >> [ 5.738801] bus_add_driver+0x14c/0x1f0 >> [ 5.742619] driver_register+0x64/0x120 >> [ 5.746178] __platform_driver_register+0x48/0x58 >> [ 5.750099] msm_drm_register+0x58/0x70 [msm] >> [ 5.754861] do_one_initcall+0x54/0x1a0 >> [ 5.759200] do_init_module+0x54/0x200 >> [ 5.762846] load_module+0x1d1c/0x2300 >> [ 5.766664] __do_sys_finit_module+0xd8/0xf0 >> [ 5.770398] __arm64_sys_finit_module+0x20/0x30 >> [ 5.774826] el0_svc_common.constprop.0+0x6c/0x168 >> [ 5.779078] do_el0_svc+0x24/0x90 >> [ 5.783939] el0_sync_handler+0x90/0x198 >> [ 5.787323] el0_sync+0x158/0x180 >> [ 5.791323] Code: 910003fd f9417404 b4000484 f9401482 (b9401846) >> [ 5.794532] ---[ end trace 3d6a53241629e560 ]--- >> >> full crash log details. >> https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.8-rc5-48-gf8456690ba8e/testrun/2945157/suite/linux-log-parser/test/check-kernel-oops-1573988/log > > There are a couple of messages directly preceding the bug output that are > probably relevant here: > > [ 5.259499] debugfs: Directory '1b0ac00.camss-vdda' with parent > 'smd:rpm:rpm-requests:pm8916-regulators-l2' already present! > Starting Resize root filesystem to fit available disk space... > Starting Start the WCN core... > [[0;32m OK [0m] Started Network Service. > [[0;32m OK [0m] Started QRTR service. 
> [ 5.352993] adreno 1c00000.gpu: Adding to iommu group 1 > [ 5.357489] msm_mdp 1a01000.mdp: Adding to iommu group 2 > [ 5.357757] msm_mdp 1a01000.mdp: No interconnect support may cause > display underflows! > [ 5.366215] adv7511 3-0039: supply dvdd not found, using dummy regulator > [ 5.378036] msm 1a00000.mdss: supply vdd not found, using dummy regulator > [ 5.378715] msm_mdp 1a01000.mdp: [drm:mdp5_bind [msm]] MDP5 version v1.6 > [ 5.380549] adv7511 3-0039: supply pvdd not found, using dummy regulator > [ 5.384606] msm 1a00000.mdss: bound 1a01000.mdp (ops mdp5_ops [msm]) > [ 5.394368] adv7511 3-0039: supply a2vdd not found, using dummy regulator > [ 5.397633] msm_dsi 1a98000.dsi: supply gdsc not found, using dummy regulator > [ 5.411897] msm_dsi 1a98000.dsi: supply gdsc not found, using dummy regulator > [ 5.420207] msm_dsi_manager_register: failed to register mipi dsi > host for DSI 0 > [ 5.425717] platform 1a01000.mdp: Removing from iommu group 2 > [[0;1;31mFAILED[0m] Failed to start Entropy Daemon based on the HAVEGE > algorithm.[ 5.444121] Unable to handle kernel NULL pointer > dereference at virtual address 0000000000000018 > > See 'systemctl status haveged.service' for detai[ 5.456615] ESR = > 0x96000004 > ls. > [ 5.464471] SET = 0, FnV = 0 > [ 5.464487] EA = 0, S1PTW = 0 > > Arnd > _______________________________________________ > iommu mailing list > iommu@lists.linux-foundation.org > https://lists.linuxfoundation.org/mailman/listinfo/iommu > _______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ^ permalink raw reply [flat|nested] 5+ messages in thread
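
Editorial aside on the lifetime problem described in this reply: the toy model below sketches how a callback cookie that points at a device can outlive the per-device data it relies on. Every name here (toy_fwspec, toy_device, toy_domain and the toy_* functions) is hypothetical and heavily simplified; none of it is the kernel's or the qcom-iommu driver's actual code. So that the sketch can run in userspace, the callback returns -1 at the point where, according to the analysis quoted above, the real function dereferences the pointer without a check.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's structures. */
struct toy_fwspec { unsigned int num_ids; };
struct toy_device { struct toy_fwspec *fwspec; };

struct toy_domain {
    void *cookie;                         /* handed back to the TLB callback */
    int (*tlb_inv_context)(void *cookie);
};

/* Callback that trusts a device cookie to still carry its IOMMU data. */
static int toy_inv_via_dev(void *cookie)
{
    struct toy_device *dev = cookie;

    if (!dev->fwspec)
        return -1; /* stand-in for the NULL dereference in the oops */
    return (int)dev->fwspec->num_ids;
}

static void toy_attach(struct toy_domain *dom, struct toy_device *dev)
{
    dom->cookie = dev; /* the flawed choice: device, not domain */
    dom->tlb_inv_context = toy_inv_via_dev;
}

/* Device removal drops the per-device IOMMU data first ... */
static void toy_release_device(struct toy_device *dev)
{
    dev->fwspec = NULL;
}

/* ... and only afterwards does the domain flush through its cookie. */
static int toy_domain_flush(struct toy_domain *dom)
{
    return dom->tlb_inv_context(dom->cookie);
}

/* Replays the ordering seen in the call trace: release, then flush. */
static int toy_replay_teardown(void)
{
    struct toy_fwspec fw = { .num_ids = 1 };
    struct toy_device dev = { .fwspec = &fw };
    struct toy_domain dom;

    toy_attach(&dom, &dev);
    assert(toy_domain_flush(&dom) == 1); /* cookie still valid here */
    toy_release_device(&dev);
    return toy_domain_flush(&dom);       /* cookie now points at stale state */
}
```

In the trace, the flush happens from qcom_iommu_domain_free() via free_io_pgtable_ops(), which in turn is reached from iommu_release_device(), so the device-side state is already gone by the time the cookie is used.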
* Re: [Freedreno] arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c 2020-07-20 11:28 ` Robin Murphy @ 2020-07-20 15:58 ` Rob Clark 2020-07-20 19:19 ` Naresh Kamboju 0 siblings, 1 reply; 5+ messages in thread From: Rob Clark @ 2020-07-20 15:58 UTC (permalink / raw) To: Robin Murphy Cc: Jean-Philippe Brucker, moderated list:ARM/Mediatek SoC..., Joerg Roedel, Vinod Koul, Arnd Bergmann, Will Deacon, Greg Kroah-Hartman, Naresh Kamboju, Sudeep Holla, Guohanjun (Hanjun Guo), open list, lkft-triage, Eric Anholt, open list:IOMMU DRIVERS, Andy Gross, Thierry Reding, linux-arm-msm, Matthias Brugger, John Stultz, freedreno, Sean Paul On Mon, Jul 20, 2020 at 4:28 AM Robin Murphy <robin.murphy@arm.com> wrote: > > On 2020-07-20 08:17, Arnd Bergmann wrote: > > On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju > > <naresh.kamboju@linaro.org> wrote: > >> > >> This kernel oops while boot linux mainline kernel on arm64 db410c device. > >> > >> metadata: > >> git branch: master > >> git repo: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > >> git commit: f8456690ba8eb18ea4714e68554e242a04f65cff > >> git describe: v5.8-rc5-48-gf8456690ba8e > >> make_kernelversion: 5.8.0-rc5 > >> kernel-config: > >> https://builds.tuxbuild.com/2aLnwV7BLStU0t1R1QPwHQ/kernel.config > > > > Thanks for the report. Adding freedreno folks to Cc, as this may have something > > to do with that driver. 
> > > >> > >> [ 5.444121] Unable to handle kernel NULL pointer dereference at > >> virtual address 0000000000000018 > >> [ 5.456615] ESR = 0x96000004 > >> [ 5.464471] SET = 0, FnV = 0 > >> [ 5.464487] EA = 0, S1PTW = 0 > >> [ 5.466521] Data abort info: > >> [ 5.469971] ISV = 0, ISS = 0x00000004 > >> [ 5.472768] CM = 0, WnR = 0 > >> [ 5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000bacba000 > >> [ 5.479349] [0000000000000018] pgd=0000000000000000, p4d=0000000000000000 > >> [ 5.485820] Internal error: Oops: 96000004 [#1] PREEMPT SMP > >> [ 5.492448] Modules linked in: crct10dif_ce adv7511(+) > >> qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg > >> drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng > >> videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns > >> rmtfs_mem fuse > >> [ 5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1 > >> [ 5.522484] Hardware name: Qualcomm Technologies, Inc. APQ 8016 SBC (DT) > >> [ 5.529170] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) > >> [ 5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8 > >> [ 5.541148] lr : free_io_pgtable_ops+0x28/0x58 > >> [ 5.546350] sp : ffff80001219b5f0 > >> [ 5.550689] x29: ffff80001219b5f0 x28: 0000000000000013 > >> [ 5.554078] x27: 0000000000000100 x26: ffff000036add3b8 > >> [ 5.559459] x25: ffff80000915e910 x24: ffff00003a5458c0 > >> [ 5.564753] x23: 0000000000000003 x22: ffff000036a37058 > >> [ 5.570049] x21: ffff000036a3a100 x20: ffff000036a3a480 > >> [ 5.575344] x19: ffff000036a37158 x18: 0000000000000000 > >> [ 5.580639] x17: 0000000000000000 x16: 0000000000000000 > >> [ 5.585935] x15: 0000000000000004 x14: 0000000000000368 > >> [ 5.591229] x13: 0000000000000000 x12: ffff000039c61798 > >> [ 5.596525] x11: ffff000039c616d0 x10: 0000000040000000 > >> [ 5.601820] x9 : 0000000000000000 x8 : ffff000039c616f8 > >> [ 5.607114] x7 : 0000000000000000 x6 : ffff000009f699a0 > >> [ 5.612410] x5 : ffff80001219b520 x4 : 
ffff000036a3a000
> >> [ 5.617705] x3 : ffff000009f69904 x2 : 0000000000000000
> >> [ 5.623001] x1 : ffff8000107e27e8 x0 : ffff00003a545810
> >> [ 5.628297] Call trace:
> >> [ 5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8
> >
> > This means that dev_iommu_fwspec_get() has returned NULL
> > in qcom_iommu_tlb_inv_context(), either because dev->iommu
> > is NULL, or because dev->iommu->fwspec is NULL.
> >
> > qcom_iommu_tlb_inv_context() does not check for a NULL
> > pointer before using the returned object.
> >
> > The bug is either in the lack of error handling, or the fact
> > that it's possible to get into this function for a device
> > that has not been fully set up.
>
> Not quite - the device *was* properly set up, but has already been
> properly torn down again in the removal path by iommu_release_device().
> The problem is that qcom-iommu kept the device pointer as its TLB cookie
> for the domain, but the domain has a longer lifespan than the validity
> of that device - that's a fundamental design flaw in the driver.

FWIW, I just sent "iommu/qcom: Use domain rather than dev as tlb cookie".
It is untested, but it looks like a straightforward enough change to
switch over to using the domain rather than the dev as the cookie.

BR,
-R

>
> Robin.
> > >> [ 5.635764] free_io_pgtable_ops+0x28/0x58 > >> [ 5.640624] qcom_iommu_domain_free+0x38/0x60 > >> [ 5.644617] iommu_group_release+0x4c/0x70 > >> [ 5.649045] kobject_put+0x6c/0x120 > >> [ 5.653035] kobject_del+0x64/0x90 > >> [ 5.656421] kobject_put+0xfc/0x120 > >> [ 5.659893] iommu_group_remove_device+0xdc/0xf0 > >> [ 5.663281] iommu_release_device+0x44/0x70 > >> [ 5.668142] iommu_bus_notifier+0xbc/0xd0 > >> [ 5.672048] notifier_call_chain+0x54/0x98 > >> [ 5.676214] blocking_notifier_call_chain+0x48/0x70 > >> [ 5.680209] device_del+0x26c/0x3a0 > >> [ 5.684981] platform_device_del.part.0+0x1c/0x88 > >> [ 5.688453] platform_device_unregister+0x24/0x40 > >> [ 5.693316] of_platform_device_destroy+0xe4/0xf8 > >> [ 5.698002] device_for_each_child+0x5c/0xa8 > >> [ 5.702689] of_platform_depopulate+0x3c/0x80 > >> [ 5.707144] msm_pdev_probe+0x1c4/0x308 [msm] > > > > It was triggered by a failure in msm_pdev_probe(), which was > > calls of_platform_depopulate() in its error handling code. > > This is a combination of two problems: > > > > a) Whatever caused msm_pdev_probe() to fail means that > > the gpu won't be usable, though it should not have caused the > > kernel to crash. > > > > b) the error handling itself causing additional problems due > > to failed unwinding. 
> > > >> [ 5.711286] platform_drv_probe+0x54/0xa8 > >> [ 5.715624] really_probe+0xd8/0x320 > >> [ 5.719617] driver_probe_device+0x58/0xb8 > >> [ 5.723263] device_driver_attach+0x74/0x80 > >> [ 5.727168] __driver_attach+0x58/0xe0 > >> [ 5.731248] bus_for_each_dev+0x70/0xc0 > >> [ 5.735067] driver_attach+0x24/0x30 > >> [ 5.738801] bus_add_driver+0x14c/0x1f0 > >> [ 5.742619] driver_register+0x64/0x120 > >> [ 5.746178] __platform_driver_register+0x48/0x58 > >> [ 5.750099] msm_drm_register+0x58/0x70 [msm] > >> [ 5.754861] do_one_initcall+0x54/0x1a0 > >> [ 5.759200] do_init_module+0x54/0x200 > >> [ 5.762846] load_module+0x1d1c/0x2300 > >> [ 5.766664] __do_sys_finit_module+0xd8/0xf0 > >> [ 5.770398] __arm64_sys_finit_module+0x20/0x30 > >> [ 5.774826] el0_svc_common.constprop.0+0x6c/0x168 > >> [ 5.779078] do_el0_svc+0x24/0x90 > >> [ 5.783939] el0_sync_handler+0x90/0x198 > >> [ 5.787323] el0_sync+0x158/0x180 > >> [ 5.791323] Code: 910003fd f9417404 b4000484 f9401482 (b9401846) > >> [ 5.794532] ---[ end trace 3d6a53241629e560 ]--- > >> > >> full crash log details. > >> https://qa-reports.linaro.org/lkft/linux-mainline-oe/build/v5.8-rc5-48-gf8456690ba8e/testrun/2945157/suite/linux-log-parser/test/check-kernel-oops-1573988/log > > > > There are a couple of messages directly preceding the bug output that are > > probably relevant here: > > > > [ 5.259499] debugfs: Directory '1b0ac00.camss-vdda' with parent > > 'smd:rpm:rpm-requests:pm8916-regulators-l2' already present! > > Starting Resize root filesystem to fit available disk space... > > Starting Start the WCN core... > > [[0;32m OK [0m] Started Network Service. > > [[0;32m OK [0m] Started QRTR service. > > [ 5.352993] adreno 1c00000.gpu: Adding to iommu group 1 > > [ 5.357489] msm_mdp 1a01000.mdp: Adding to iommu group 2 > > [ 5.357757] msm_mdp 1a01000.mdp: No interconnect support may cause > > display underflows! 
> > [ 5.366215] adv7511 3-0039: supply dvdd not found, using dummy regulator > > [ 5.378036] msm 1a00000.mdss: supply vdd not found, using dummy regulator > > [ 5.378715] msm_mdp 1a01000.mdp: [drm:mdp5_bind [msm]] MDP5 version v1.6 > > [ 5.380549] adv7511 3-0039: supply pvdd not found, using dummy regulator > > [ 5.384606] msm 1a00000.mdss: bound 1a01000.mdp (ops mdp5_ops [msm]) > > [ 5.394368] adv7511 3-0039: supply a2vdd not found, using dummy regulator > > [ 5.397633] msm_dsi 1a98000.dsi: supply gdsc not found, using dummy regulator > > [ 5.411897] msm_dsi 1a98000.dsi: supply gdsc not found, using dummy regulator > > [ 5.420207] msm_dsi_manager_register: failed to register mipi dsi > > host for DSI 0 > > [ 5.425717] platform 1a01000.mdp: Removing from iommu group 2 > > [[0;1;31mFAILED[0m] Failed to start Entropy Daemon based on the HAVEGE > > algorithm.[ 5.444121] Unable to handle kernel NULL pointer > > dereference at virtual address 0000000000000018 > > > > See 'systemctl status haveged.service' for detai[ 5.456615] ESR = > > 0x96000004 > > ls. > > [ 5.464471] SET = 0, FnV = 0 > > [ 5.464487] EA = 0, S1PTW = 0 > > > > Arnd > > _______________________________________________ > > iommu mailing list > > iommu@lists.linux-foundation.org > > https://lists.linuxfoundation.org/mailman/listinfo/iommu > > > _______________________________________________ > Freedreno mailing list > Freedreno@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/freedreno _______________________________________________ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu ^ permalink raw reply [flat|nested] 5+ messages in thread
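
Editorial aside on the direction proposed in this reply: the toy model below sketches what "use the domain rather than the dev as cookie" can look like in principle. It is not the patch named above and does not reproduce its contents; all names (toy_fwspec, toy_device, toy_domain and the toy_* functions) are hypothetical. Copying the needed state into the domain at attach time is an assumption made only to keep the sketch self-contained; the thread does not say how the actual patch stores it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's structures. */
struct toy_fwspec { unsigned int num_ids; };
struct toy_device { struct toy_fwspec *fwspec; };

struct toy_domain {
    unsigned int num_ids;                 /* domain-owned copy (assumption) */
    void *cookie;                         /* handed back to the TLB callback */
    int (*tlb_inv_context)(void *cookie);
};

/* Callback that only touches state with the domain's own lifetime. */
static int toy_inv_via_domain(void *cookie)
{
    struct toy_domain *dom = cookie;

    return (int)dom->num_ids;
}

static void toy_attach(struct toy_domain *dom, const struct toy_device *dev)
{
    dom->num_ids = dev->fwspec->num_ids;  /* captured while the device is live */
    dom->cookie = dom;                    /* the domain is its own cookie */
    dom->tlb_inv_context = toy_inv_via_domain;
}

/* Device removal drops the per-device IOMMU data. */
static void toy_release_device(struct toy_device *dev)
{
    dev->fwspec = NULL;
}

static int toy_domain_flush(struct toy_domain *dom)
{
    return dom->tlb_inv_context(dom->cookie);
}

/* Same ordering as in the call trace: release the device, then flush. */
static int toy_replay_teardown(void)
{
    struct toy_fwspec fw = { .num_ids = 2 };
    struct toy_device dev = { .fwspec = &fw };
    struct toy_domain dom;

    toy_attach(&dom, &dev);
    assert(toy_domain_flush(&dom) == 2);
    toy_release_device(&dev);
    return toy_domain_flush(&dom);        /* no device state is consulted */
}
```

The design point is only that the cookie's lifetime now matches the object that invokes the callback, so a flush during domain teardown no longer depends on whether the device has already been released.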
* Re: [Freedreno] arm64: Internal error: Oops: qcom_iommu_tlb_inv_context free_io_pgtable_ops on db410c 2020-07-20 15:58 ` [Freedreno] " Rob Clark @ 2020-07-20 19:19 ` Naresh Kamboju 0 siblings, 0 replies; 5+ messages in thread From: Naresh Kamboju @ 2020-07-20 19:19 UTC (permalink / raw) To: Rob Clark Cc: Jean-Philippe Brucker, moderated list:ARM/Mediatek SoC..., Joerg Roedel, Vinod Koul, Arnd Bergmann, Will Deacon, Robin Murphy, linux-arm-msm, Sudeep Holla, Guohanjun (Hanjun Guo), open list, lkft-triage, Eric Anholt, open list:IOMMU DRIVERS, Andy Gross, Thierry Reding, Greg Kroah-Hartman, Matthias Brugger, John Stultz, freedreno, Sean Paul On Mon, 20 Jul 2020 at 21:27, Rob Clark <robdclark@gmail.com> wrote: > > On Mon, Jul 20, 2020 at 4:28 AM Robin Murphy <robin.murphy@arm.com> wrote: > > > > On 2020-07-20 08:17, Arnd Bergmann wrote: > > > On Mon, Jul 20, 2020 at 8:36 AM Naresh Kamboju > > > <naresh.kamboju@linaro.org> wrote: <> > > >> [ 5.444121] Unable to handle kernel NULL pointer dereference at > > >> virtual address 0000000000000018 > > >> [ 5.456615] ESR = 0x96000004 > > >> [ 5.464471] SET = 0, FnV = 0 > > >> [ 5.464487] EA = 0, S1PTW = 0 > > >> [ 5.466521] Data abort info: > > >> [ 5.469971] ISV = 0, ISS = 0x00000004 > > >> [ 5.472768] CM = 0, WnR = 0 > > >> [ 5.476172] user pgtable: 4k pages, 48-bit VAs, pgdp=00000000bacba000 > > >> [ 5.479349] [0000000000000018] pgd=0000000000000000, p4d=0000000000000000 > > >> [ 5.485820] Internal error: Oops: 96000004 [#1] PREEMPT SMP > > >> [ 5.492448] Modules linked in: crct10dif_ce adv7511(+) > > >> qcom_spmi_temp_alarm cec msm(+) mdt_loader qcom_camss videobuf2_dma_sg > > >> drm_kms_helper v4l2_fwnode videobuf2_memops videobuf2_v4l2 qcom_rng > > >> videobuf2_common i2c_qcom_cci display_connector socinfo drm qrtr ns > > >> rmtfs_mem fuse > > >> [ 5.500256] CPU: 0 PID: 286 Comm: systemd-udevd Not tainted 5.8.0-rc5 #1 > > >> [ 5.522484] Hardware name: Qualcomm Technologies, Inc. 
APQ 8016 SBC (DT)
> > >> [ 5.529170] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--)
> > >> [ 5.535856] pc : qcom_iommu_tlb_inv_context+0x18/0xa8
> > >> [ 5.541148] lr : free_io_pgtable_ops+0x28/0x58

<>

> > >> [ 5.628297] Call trace:
> > >> [ 5.633592] qcom_iommu_tlb_inv_context+0x18/0xa8
> > >
> > > This means that dev_iommu_fwspec_get() has returned NULL
> > > in qcom_iommu_tlb_inv_context(), either because dev->iommu
> > > is NULL, or because dev->iommu->fwspec is NULL.
> > >
> > > qcom_iommu_tlb_inv_context() does not check for a NULL
> > > pointer before using the returned object.
> > >
> > > The bug is either in the lack of error handling, or the fact
> > > that it's possible to get into this function for a device
> > > that has not been fully set up.
> >
> > Not quite - the device *was* properly set up, but has already been
> > properly torn down again in the removal path by iommu_release_device().
> > The problem is that qcom-iommu kept the device pointer as its TLB cookie
> > for the domain, but the domain has a longer lifespan than the validity
> > of that device - that's a fundamental design flaw in the driver.
>
> fwiw, I just sent "iommu/qcom: Use domain rather than dev as tlb
> cookie".. untested but looks like a straightforward enough change to
> switch over to using the domain rather than dev as cookie

The proposed patch has been tested and confirmed to fix the reported problem.

ref:
https://lore.kernel.org/linux-iommu/CA+G9fYtj1RBYcPhXZRm-qm5ygtdLj1jD8vFZSqQvwi_DNJLBwQ@mail.gmail.com/T/#m36a1fca18098f6c34275d928f9ba9c40c6d7fd63
https://lkft.validation.linaro.org/scheduler/job/1593950#L3392

>
> BR,
> -R

- Naresh