From mboxrd@z Thu Jan 1 00:00:00 1970
From: Sinan Kaya
Subject: Re: [PATCH V7 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error
Date: Mon, 30 Jan 2017 10:46:39 -0500
Message-ID: <1e048aff-0d77-b9f2-ebf8-2ba315b90ca7@codeaurora.org>
References: <1485188293-20263-1-git-send-email-sricharan@codeaurora.org>
 <1485188293-20263-9-git-send-email-sricharan@codeaurora.org>
 <20170124123711.GA11996@red-moon>
 <93e79759-d614-9b36-d5ab-63e8eb725009@arm.com>
 <14751205-f034-7f0d-442a-854c3909425c@codeaurora.org>
 <5ba9f366d6e25397cdef8ad95b49e199@codeaurora.org>
 <175a3798-b824-ef1a-e112-9f6f472973ae@codeaurora.org>
 <20170130143851.GJ16461@arm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
Errors-To: iommu-bounces-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org
To: Nate Watterson, Will Deacon
Cc: linux-pci-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 linux-acpi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA@public.gmane.org,
 linux-arm-msm-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
 bhelgaas-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org,
 Tomasz Nowicki,
 linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
List-Id: linux-arm-msm@vger.kernel.org

On 1/30/2017 9:54 AM, Nate Watterson wrote:
> On 2017-01-30 09:38, Will Deacon wrote:
>> On Mon, Jan 30, 2017 at 09:33:50AM -0500, Sinan Kaya wrote:
>>> On 1/30/2017 9:23 AM, Nate Watterson wrote:
>>> > On 2017-01-30 08:59, Sinan Kaya wrote:
>>> >> On 1/30/2017 7:22 AM, Robin Murphy wrote:
>>> >>> On 29/01/17 17:53, Sinan Kaya wrote:
>>> >>>> On 1/24/2017 7:37 AM, Lorenzo Pieralisi wrote:
>>> >>>>> [+hanjun, tomasz, sinan]
>>> >>>>>
>>> >>>>> It is quite a key patchset, I would be glad if they can test on their
>>> >>>>> respective platforms with IORT.
>>> >>>>>
>>> >>>>
>>> >>>> Tested on top of 4.10-rc5.
>>> >>>>
>>> >>>> 1. Platform Hidma device passed dmatest
>>> >>>> 2. Seeing some USB stalls on a platform USB device.
>>> >>>> 3. PCIe NVME drive probed and worked fine with MSI interrupts after boot.
>>> >>>> 4. NVMe driver didn't probe following a hotplug insertion and received an
>>> >>>> SMMU error event during the insertion.
>>> >>>
>>> >>> What was the SMMU error - a translation/permission fault (implying the
>>> >>> wrong DMA ops) or a bad STE fault (implying we totally failed to tell
>>> >>> the SMMU about the device at all)?
>>> >>>
>>> >>
>>> >> root@ubuntu:/sys/bus/pci/slots/4# echo 0 > power
>>> >>
>>> >> [  204.698522] iommu: Removing device 0003:01:00.0 from group 0
>>> >> [  204.708704] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down
>>> >> [  204.708723] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down event
>>> >> ignored; already powering off
>>> >>
>>> >> root@ubuntu:/sys/bus/pci/slots/4#
>>> >>
>>> >> [  254.820440] iommu: Adding device 0003:01:00.0 to group 8
>>> >> [  254.820599] nvme nvme0: pci function 0003:01:00.0
>>> >> [  254.820621] nvme 0003:01:00.0: enabling device (0000 -> 0002)
>>> >> [  261.948558] arm-smmu-v3 arm-smmu-v3.0.auto: event 0x0a received:
>>> >> [  261.948561] arm-smmu-v3 arm-smmu-v3.0.auto: 0x000001000000000a
>>> >> [  261.948563] arm-smmu-v3 arm-smmu-v3.0.auto: 0x0000000000000000
>>> >> [  261.948564] arm-smmu-v3 arm-smmu-v3.0.auto: 0x0000000000000000
>>> >> [  261.948566] arm-smmu-v3 arm-smmu-v3.0.auto: 0x0000000000000000
>>>
>>> > Looks like C_BAD_CD. Can you please try with:
>>> > iommu/arm-smmu-v3: Clear prior settings when updating STEs
>>>
>>> This resolved the issue. Can we pull Nate's patch to 4.10 so that I don't see
>>> this issue again.
>>
>> I already sent the pull request to Joerg for 4.11. Do you see this problem
>> without Sricharan's patches (i.e. vanilla mainline)? If so, we'll need to
>> send the patch to stable after -rc1.
> Using vanilla mainline, I see it most commonly when directly assigning
> a device to a guest machine. I think I've also seen it after removing then
> re-adding a PCI device. Basically anytime an STE's CTX pointer is changed
> from a non-NULL value and STE[CFG] indicates translation will be performed.
>

I was not able to reproduce the issue with Vanilla kernel. I only tested hotplug.

> Nate
>>
>> Will
>

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
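
For readers following the log in the quoted thread, here is a minimal sketch of how the first logged event word (0x000001000000000a) reads as C_BAD_CD. It assumes the SMMUv3 event record layout in which bits [7:0] of the first 64-bit word hold the event ID and bits [63:32] hold the StreamID. The function names `decode_event` and `pci_bdf` are hypothetical, the event table is deliberately partial, and the StreamID-to-PCI mapping assumes the StreamID equals the PCI requester ID, which depends on the platform's IORT mapping and may not hold elsewhere.

```python
# Partial table of SMMUv3 event IDs; only a few entries are listed here.
EVENT_NAMES = {
    0x02: "C_BAD_STREAMID",
    0x04: "C_BAD_STE",
    0x0A: "C_BAD_CD",
    0x10: "F_TRANSLATION",
    0x13: "F_PERMISSION",
}


def decode_event(words):
    """Decode the first 64-bit word of an event record (hypothetical helper)."""
    w0 = words[0]
    event_id = w0 & 0xFF                  # assumed layout: bits [7:0] = event ID
    stream_id = (w0 >> 32) & 0xFFFFFFFF   # assumed layout: bits [63:32] = StreamID
    return {
        "event_id": event_id,
        "name": EVENT_NAMES.get(event_id, "unknown"),
        "stream_id": stream_id,
    }


def pci_bdf(stream_id):
    """Format a StreamID as bus:dev.fn, ASSUMING StreamID == PCI requester ID."""
    bus = (stream_id >> 8) & 0xFF
    dev = (stream_id >> 3) & 0x1F
    fn = stream_id & 0x7
    return f"{bus:02x}:{dev:02x}.{fn:x}"


# The four words printed by the arm-smmu-v3 driver in the log above.
record = [0x000001000000000A, 0x0, 0x0, 0x0]
info = decode_event(record)
print(info["name"], hex(info["stream_id"]), pci_bdf(info["stream_id"]))
```

Under those assumptions the record decodes to event 0x0a with StreamID 0x100, which lines up with the hot-added NVMe function 01:00.0 on segment 0003 shown in the log.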