* [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" @ 2022-11-14 11:03 Manivannan Sadhasivam 2022-11-14 11:29 ` Manivannan Sadhasivam ` (3 more replies) 0 siblings, 4 replies; 27+ messages in thread From: Manivannan Sadhasivam @ 2022-11-14 11:03 UTC (permalink / raw) To: catalin.marinas, will Cc: robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel, Manivannan Sadhasivam This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. As reported by Amit [1], dropping cache invalidation from arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform (most probably on other Qcom platforms too). The reason is, Qcom qcom_q6v5_mss driver copies the firmware metadata and shares it with modem for validation. The modem has a secure block (XPU) that will trigger a whole system crash if the shared memory is accessed by the CPU while modem is poking at it. To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory with no kernel mapping, vmap's it, copies the firmware metadata and unvmap's it. Finally the address is then shared with modem for metadata validation [2]. Now because of the removal of cache invalidation from arch_dma_prep_coherent(), there will be cache lines associated with this memory even after sharing with modem. So when the CPU accesses it, the XPU violation gets triggered. So let's revert this commit to get remoteproc's working (thereby avoiding full system crash) on Qcom platforms. [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/ [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c#n933 Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> --- Will, Catalin: Please share if you have any other suggestions to handle the resource sharing in the remoteproc driver that could avoid this revert. arch/arm64/mm/dma-mapping.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 3cb101e8cb29..7d7e9a046305 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -36,7 +36,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size) { unsigned long start = (unsigned long)page_address(page); - dcache_clean_poc(start, start + size); + dcache_clean_inval_poc(start, start + size); } #ifdef CONFIG_IOMMU_DMA -- 2.25.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam @ 2022-11-14 11:29 ` Manivannan Sadhasivam 2022-11-14 14:11 ` Will Deacon ` (2 subsequent siblings) 3 siblings, 0 replies; 27+ messages in thread From: Manivannan Sadhasivam @ 2022-11-14 11:29 UTC (permalink / raw) To: catalin.marinas, will Cc: robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote: > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > As reported by Amit [1], dropping cache invalidation from > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform s/SM8250/SDM845/g Sorry for the confusion. Thanks, Mani > (most probably on other Qcom platforms too). The reason is, Qcom > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > for validation. The modem has a secure block (XPU) that will trigger a > whole system crash if the shared memory is accessed by the CPU while modem > is poking at it. > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory > with no kernel mapping, vmap's it, copies the firmware metadata and > unvmap's it. Finally the address is then shared with modem for metadata > validation [2]. > > Now because of the removal of cache invalidation from > arch_dma_prep_coherent(), there will be cache lines associated with this > memory even after sharing with modem. So when the CPU accesses it, the XPU > violation gets triggered. > > So let's revert this commit to get remoteproc's working (thereby avoiding > full system crash) on Qcom platforms. > > [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/ > [2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c#n933 > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> > --- > > Will, Catalin: Please share if you have any other suggestions to handle the > resource sharing in the remoteproc driver that could avoid this revert. > > arch/arm64/mm/dma-mapping.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c > index 3cb101e8cb29..7d7e9a046305 100644 > --- a/arch/arm64/mm/dma-mapping.c > +++ b/arch/arm64/mm/dma-mapping.c > @@ -36,7 +36,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size) > { > unsigned long start = (unsigned long)page_address(page); > > - dcache_clean_poc(start, start + size); > + dcache_clean_inval_poc(start, start + size); > } > > #ifdef CONFIG_IOMMU_DMA > -- > 2.25.1 > -- மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam 2022-11-14 11:29 ` Manivannan Sadhasivam @ 2022-11-14 14:11 ` Will Deacon 2022-11-14 15:14 ` Robin Murphy 2022-11-28 5:44 ` Thorsten Leemhuis 2022-12-08 4:59 ` Leonard Lausen 3 siblings, 1 reply; 27+ messages in thread From: Will Deacon @ 2022-11-14 14:11 UTC (permalink / raw) To: Manivannan Sadhasivam Cc: catalin.marinas, robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote: > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > As reported by Amit [1], dropping cache invalidation from > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > (most probably on other Qcom platforms too). The reason is, Qcom > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > for validation. The modem has a secure block (XPU) that will trigger a > whole system crash if the shared memory is accessed by the CPU while modem > is poking at it. > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory > with no kernel mapping, vmap's it, copies the firmware metadata and > unvmap's it. Finally the address is then shared with modem for metadata > validation [2]. > > Now because of the removal of cache invalidation from > arch_dma_prep_coherent(), there will be cache lines associated with this > memory even after sharing with modem. So when the CPU accesses it, the XPU > violation gets triggered. This last past is a non-sequitur: the buffer is no longer mapped on the CPU side, so how would the CPU access it? As I just replied to Amit, we need more information about what this "access" is and how it is being detected. Will ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-14 14:11 ` Will Deacon @ 2022-11-14 15:14 ` Robin Murphy 2022-11-14 17:38 ` Catalin Marinas 0 siblings, 1 reply; 27+ messages in thread From: Robin Murphy @ 2022-11-14 15:14 UTC (permalink / raw) To: Will Deacon, Manivannan Sadhasivam Cc: catalin.marinas, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On 2022-11-14 14:11, Will Deacon wrote: > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote: >> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. >> >> As reported by Amit [1], dropping cache invalidation from >> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform >> (most probably on other Qcom platforms too). The reason is, Qcom >> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem >> for validation. The modem has a secure block (XPU) that will trigger a >> whole system crash if the shared memory is accessed by the CPU while modem >> is poking at it. >> >> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory >> with no kernel mapping, vmap's it, copies the firmware metadata and >> unvmap's it. Finally the address is then shared with modem for metadata >> validation [2]. >> >> Now because of the removal of cache invalidation from >> arch_dma_prep_coherent(), there will be cache lines associated with this >> memory even after sharing with modem. So when the CPU accesses it, the XPU >> violation gets triggered. > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU > side, so how would the CPU access it? Right, for the previous change to have made a difference the offending part of this buffer must be present in some cache somewhere *before* the DMA buffer allocation completes. Clearly that driver is completely broken though. If the DMA allocation came from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() shenanigans wouldn't work, so if it backed by struct pages then the whole dance is still pointless because *a cacheable linear mapping exists*, and it's just relying on the reduced chance that anything's going to re-fetch the linear map address after those pages have been allocated, exactly as I called out previously[1]. Robin. [1] https://lore.kernel.org/linux-arm-kernel/97fface8-e40e-072c-4335-c94094884e93@arm.com/ > As I just replied to Amit, we need more information about what this > "access" is and how it is being detected. > > Will ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-14 15:14 ` Robin Murphy @ 2022-11-14 17:38 ` Catalin Marinas 2022-11-18 10:54 ` Manivannan Sadhasivam 0 siblings, 1 reply; 27+ messages in thread From: Catalin Marinas @ 2022-11-14 17:38 UTC (permalink / raw) To: Robin Murphy Cc: Will Deacon, Manivannan Sadhasivam, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote: > On 2022-11-14 14:11, Will Deacon wrote: > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote: > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > > > > > As reported by Amit [1], dropping cache invalidation from > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > > > (most probably on other Qcom platforms too). The reason is, Qcom > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > > > for validation. The modem has a secure block (XPU) that will trigger a > > > whole system crash if the shared memory is accessed by the CPU while modem > > > is poking at it. > > > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory > > > with no kernel mapping, vmap's it, copies the firmware metadata and > > > unvmap's it. Finally the address is then shared with modem for metadata > > > validation [2]. > > > > > > Now because of the removal of cache invalidation from > > > arch_dma_prep_coherent(), there will be cache lines associated with this > > > memory even after sharing with modem. So when the CPU accesses it, the XPU > > > violation gets triggered. > > > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU > > side, so how would the CPU access it? > > Right, for the previous change to have made a difference the offending part > of this buffer must be present in some cache somewhere *before* the DMA > buffer allocation completes. > > Clearly that driver is completely broken though. If the DMA allocation came > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() > shenanigans wouldn't work, so if it backed by struct pages then the whole > dance is still pointless because *a cacheable linear mapping exists*, and > it's just relying on the reduced chance that anything's going to re-fetch > the linear map address after those pages have been allocated, exactly as I > called out previously[1]. So I guess a DMA pool that's not mapped in the linear map, together with memremap() instead of vmap(), would work around the issue. But the driver needs fixing, not the arch code. -- Catalin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-14 17:38 ` Catalin Marinas @ 2022-11-18 10:54 ` Manivannan Sadhasivam 2022-11-18 12:33 ` Will Deacon 0 siblings, 1 reply; 27+ messages in thread From: Manivannan Sadhasivam @ 2022-11-18 10:54 UTC (permalink / raw) To: Catalin Marinas Cc: Robin Murphy, Will Deacon, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote: > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote: > > On 2022-11-14 14:11, Will Deacon wrote: > > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote: > > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > > > > > > > As reported by Amit [1], dropping cache invalidation from > > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > > > > (most probably on other Qcom platforms too). The reason is, Qcom > > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > > > > for validation. The modem has a secure block (XPU) that will trigger a > > > > whole system crash if the shared memory is accessed by the CPU while modem > > > > is poking at it. > > > > > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory > > > > with no kernel mapping, vmap's it, copies the firmware metadata and > > > > unvmap's it. Finally the address is then shared with modem for metadata > > > > validation [2]. > > > > > > > > Now because of the removal of cache invalidation from > > > > arch_dma_prep_coherent(), there will be cache lines associated with this > > > > memory even after sharing with modem. So when the CPU accesses it, the XPU > > > > violation gets triggered. > > > > > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU > > > side, so how would the CPU access it? > > > > Right, for the previous change to have made a difference the offending part > > of this buffer must be present in some cache somewhere *before* the DMA > > buffer allocation completes. > > > > Clearly that driver is completely broken though. If the DMA allocation came > > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() > > shenanigans wouldn't work, so if it backed by struct pages then the whole > > dance is still pointless because *a cacheable linear mapping exists*, and > > it's just relying on the reduced chance that anything's going to re-fetch > > the linear map address after those pages have been allocated, exactly as I > > called out previously[1]. > > So I guess a DMA pool that's not mapped in the linear map, together with > memremap() instead of vmap(), would work around the issue. But the > driver needs fixing, not the arch code. > Okay, thanks for the hint. Can you share how to allocate the dma-pool that's not part of the kernel's linear map? I looked into it but couldn't find a way. Thanks, Mani > -- > Catalin -- மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-18 10:54 ` Manivannan Sadhasivam @ 2022-11-18 12:33 ` Will Deacon 2022-11-21 6:42 ` Manivannan Sadhasivam 0 siblings, 1 reply; 27+ messages in thread From: Will Deacon @ 2022-11-18 12:33 UTC (permalink / raw) To: Manivannan Sadhasivam Cc: Catalin Marinas, Robin Murphy, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote: > On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote: > > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote: > > > On 2022-11-14 14:11, Will Deacon wrote: > > > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote: > > > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > > > > > > > > > As reported by Amit [1], dropping cache invalidation from > > > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > > > > > (most probably on other Qcom platforms too). The reason is, Qcom > > > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > > > > > for validation. The modem has a secure block (XPU) that will trigger a > > > > > whole system crash if the shared memory is accessed by the CPU while modem > > > > > is poking at it. > > > > > > > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory > > > > > with no kernel mapping, vmap's it, copies the firmware metadata and > > > > > unvmap's it. Finally the address is then shared with modem for metadata > > > > > validation [2]. > > > > > > > > > > Now because of the removal of cache invalidation from > > > > > arch_dma_prep_coherent(), there will be cache lines associated with this > > > > > memory even after sharing with modem. So when the CPU accesses it, the XPU > > > > > violation gets triggered. > > > > > > > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU > > > > side, so how would the CPU access it? > > > > > > Right, for the previous change to have made a difference the offending part > > > of this buffer must be present in some cache somewhere *before* the DMA > > > buffer allocation completes. > > > > > > Clearly that driver is completely broken though. If the DMA allocation came > > > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() > > > shenanigans wouldn't work, so if it backed by struct pages then the whole > > > dance is still pointless because *a cacheable linear mapping exists*, and > > > it's just relying on the reduced chance that anything's going to re-fetch > > > the linear map address after those pages have been allocated, exactly as I > > > called out previously[1]. > > > > So I guess a DMA pool that's not mapped in the linear map, together with > > memremap() instead of vmap(), would work around the issue. But the > > driver needs fixing, not the arch code. > > > > Okay, thanks for the hint. Can you share how to allocate the dma-pool that's > not part of the kernel's linear map? I looked into it but couldn't find a way. The no-map property should take care of this iirc Will ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-18 12:33 ` Will Deacon @ 2022-11-21 6:42 ` Manivannan Sadhasivam 2022-11-21 10:12 ` Sibi Sankar 0 siblings, 1 reply; 27+ messages in thread From: Manivannan Sadhasivam @ 2022-11-21 6:42 UTC (permalink / raw) To: Will Deacon Cc: Catalin Marinas, Robin Murphy, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote: > On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote: > > On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote: > > > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote: > > > > On 2022-11-14 14:11, Will Deacon wrote: > > > > > On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote: > > > > > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > > > > > > > > > > > As reported by Amit [1], dropping cache invalidation from > > > > > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > > > > > > (most probably on other Qcom platforms too). The reason is, Qcom > > > > > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > > > > > > for validation. The modem has a secure block (XPU) that will trigger a > > > > > > whole system crash if the shared memory is accessed by the CPU while modem > > > > > > is poking at it. > > > > > > > > > > > > To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory > > > > > > with no kernel mapping, vmap's it, copies the firmware metadata and > > > > > > unvmap's it. Finally the address is then shared with modem for metadata > > > > > > validation [2]. > > > > > > > > > > > > Now because of the removal of cache invalidation from > > > > > > arch_dma_prep_coherent(), there will be cache lines associated with this > > > > > > memory even after sharing with modem. So when the CPU accesses it, the XPU > > > > > > violation gets triggered. > > > > > > > > > > This last past is a non-sequitur: the buffer is no longer mapped on the CPU > > > > > side, so how would the CPU access it? > > > > > > > > Right, for the previous change to have made a difference the offending part > > > > of this buffer must be present in some cache somewhere *before* the DMA > > > > buffer allocation completes. > > > > > > > > Clearly that driver is completely broken though. If the DMA allocation came > > > > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() > > > > shenanigans wouldn't work, so if it backed by struct pages then the whole > > > > dance is still pointless because *a cacheable linear mapping exists*, and > > > > it's just relying on the reduced chance that anything's going to re-fetch > > > > the linear map address after those pages have been allocated, exactly as I > > > > called out previously[1]. > > > > > > So I guess a DMA pool that's not mapped in the linear map, together with > > > memremap() instead of vmap(), would work around the issue. But the > > > driver needs fixing, not the arch code. > > > > > > > Okay, thanks for the hint. Can you share how to allocate the dma-pool that's > > not part of the kernel's linear map? I looked into it but couldn't find a way. > > The no-map property should take care of this iirc > Yeah, we have been using it in other places of the same driver. But as per Sibi, we used dynamic allocation for metadata validation since there was no memory reserved statically for that. But if we do not have a way to allocate a dynamic memory that is not part of kernel's linear map, then we may have to resort to using an existing reserved memory. Thanks, Mani > Will -- மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-21 6:42 ` Manivannan Sadhasivam @ 2022-11-21 10:12 ` Sibi Sankar 2022-11-24 11:55 ` Catalin Marinas 0 siblings, 1 reply; 27+ messages in thread From: Sibi Sankar @ 2022-11-21 10:12 UTC (permalink / raw) To: Manivannan Sadhasivam, Will Deacon Cc: Catalin Marinas, Robin Murphy, amit.pundir, andersson, sumit.semwal, linux-arm-kernel, linux-kernel On 11/21/22 12:12, Manivannan Sadhasivam wrote: > On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote: >> On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote: >>> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote: >>>> On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote: >>>>> On 2022-11-14 14:11, Will Deacon wrote: >>>>>> On Mon, Nov 14, 2022 at 04:33:29PM +0530, Manivannan Sadhasivam wrote: >>>>>>> This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. >>>>>>> >>>>>>> As reported by Amit [1], dropping cache invalidation from >>>>>>> arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform >>>>>>> (most probably on other Qcom platforms too). The reason is, Qcom >>>>>>> qcom_q6v5_mss driver copies the firmware metadata and shares it with modem >>>>>>> for validation. The modem has a secure block (XPU) that will trigger a >>>>>>> whole system crash if the shared memory is accessed by the CPU while modem >>>>>>> is poking at it. >>>>>>> >>>>>>> To avoid this issue, the qcom_q6v5_mss driver allocates a chunk of memory >>>>>>> with no kernel mapping, vmap's it, copies the firmware metadata and >>>>>>> unvmap's it. Finally the address is then shared with modem for metadata >>>>>>> validation [2]. >>>>>>> >>>>>>> Now because of the removal of cache invalidation from >>>>>>> arch_dma_prep_coherent(), there will be cache lines associated with this >>>>>>> memory even after sharing with modem. So when the CPU accesses it, the XPU >>>>>>> violation gets triggered. >>>>>> >>>>>> This last past is a non-sequitur: the buffer is no longer mapped on the CPU >>>>>> side, so how would the CPU access it? >>>>> >>>>> Right, for the previous change to have made a difference the offending part >>>>> of this buffer must be present in some cache somewhere *before* the DMA >>>>> buffer allocation completes. >>>>> >>>>> Clearly that driver is completely broken though. If the DMA allocation came >>>>> from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() >>>>> shenanigans wouldn't work, so if it backed by struct pages then the whole >>>>> dance is still pointless because *a cacheable linear mapping exists*, and >>>>> it's just relying on the reduced chance that anything's going to re-fetch >>>>> the linear map address after those pages have been allocated, exactly as I >>>>> called out previously[1]. >>>> >>>> So I guess a DMA pool that's not mapped in the linear map, together with >>>> memremap() instead of vmap(), would work around the issue. But the >>>> driver needs fixing, not the arch code. >>>> >>> >>> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's >>> not part of the kernel's linear map? I looked into it but couldn't find a way. >> >> The no-map property should take care of this iirc >> > > Yeah, we have been using it in other places of the same driver. But as per > Sibi, we used dynamic allocation for metadata validation since there was no > memory reserved statically for that. Will, Unlike the other portions in the driver that required statically defined no-map carveouts, metadata just needed a contiguous memory for authentication. Re-using existing carveouts for this metadata region may not work due to modem FW limitations and declaring a new carveout for metadata will break the device tree bindings. That's the reason for using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attr and vmpa/vunmap with VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there other suggestions for achieving the same without breaking bindings? - Sibi > > But if we do not have a way to allocate a dynamic memory that is not part of > kernel's linear map, then we may have to resort to using an existing reserved > memory. > > Thanks, > Mani > >> Will > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-21 10:12 ` Sibi Sankar @ 2022-11-24 11:55 ` Catalin Marinas 2022-12-01 9:29 ` Thorsten Leemhuis 0 siblings, 1 reply; 27+ messages in thread From: Catalin Marinas @ 2022-11-24 11:55 UTC (permalink / raw) To: Sibi Sankar Cc: Manivannan Sadhasivam, Will Deacon, Robin Murphy, amit.pundir, andersson, sumit.semwal, linux-arm-kernel, linux-kernel On Mon, Nov 21, 2022 at 03:42:27PM +0530, Sibi Sankar wrote: > On 11/21/22 12:12, Manivannan Sadhasivam wrote: > > On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote: > > > On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote: > > > > On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote: > > > > > On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote: > > > > > > Clearly that driver is completely broken though. If the DMA allocation came > > > > > > from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() > > > > > > shenanigans wouldn't work, so if it backed by struct pages then the whole > > > > > > dance is still pointless because *a cacheable linear mapping exists*, and > > > > > > it's just relying on the reduced chance that anything's going to re-fetch > > > > > > the linear map address after those pages have been allocated, exactly as I > > > > > > called out previously[1]. > > > > > > > > > > So I guess a DMA pool that's not mapped in the linear map, together with > > > > > memremap() instead of vmap(), would work around the issue. But the > > > > > driver needs fixing, not the arch code. > > > > > > > > Okay, thanks for the hint. Can you share how to allocate the dma-pool that's > > > > not part of the kernel's linear map? I looked into it but couldn't find a way. > > > > > > The no-map property should take care of this iirc > > > > Yeah, we have been using it in other places of the same driver. But as per > > Sibi, we used dynamic allocation for metadata validation since there was no > > memory reserved statically for that. > > Unlike the other portions in the driver that required statically defined > no-map carveouts, metadata just needed a contiguous memory for > authentication. Re-using existing carveouts for this metadata region > may not work due to modem FW limitations and declaring a new carveout for > metadata will break the device tree bindings. That's the reason for > using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attr and vmpa/vunmap with > VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there other > suggestions for achieving the same without breaking bindings? Your DMA_ATTR_NO_KERNEL_MAPPING workaround doesn't work, it only makes the failure rate smaller. All this attribute does is avoiding creating a non-cacheable mapping but you still have the kernel linear mapping in place that may be speculatively accessed by the CPU. You were just lucky so far not to have hit the issue. So I'd rather see this fixed properly with a no-map carveout. Maybe you can reuse an existing carveout if the driver already needs some and avoid changing the DT. More complicated options include allocating memory and unmapping it from the linear map with set_memory_valid(), though that's not exported to modules and it also requires the linear map to be pages only, not block mappings. Yet another option is to have the swiotlb buffer unmapped from the kernel linear map and use the bounce buffer for this. That's more involved (Robin has some patches, though for a different reason and they may not make it upstream). -- Catalin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-24 11:55 ` Catalin Marinas @ 2022-12-01 9:29 ` Thorsten Leemhuis 2022-12-01 17:45 ` Catalin Marinas 0 siblings, 1 reply; 27+ messages in thread From: Thorsten Leemhuis @ 2022-12-01 9:29 UTC (permalink / raw) To: Catalin Marinas, Sibi Sankar Cc: Manivannan Sadhasivam, Will Deacon, Robin Murphy, amit.pundir, andersson, sumit.semwal, linux-arm-kernel, linux-kernel Hi, this is your Linux kernel regression tracker. Top-posting for once, to make this easily accessible to everyone. Has any progress been made to fix this regression? It afaics is not a release critical issue, but well, it still would be nice to get this fixed before 6.1 is released. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. On 24.11.22 12:55, Catalin Marinas wrote: > On Mon, Nov 21, 2022 at 03:42:27PM +0530, Sibi Sankar wrote: >> On 11/21/22 12:12, Manivannan Sadhasivam wrote: >>> On Fri, Nov 18, 2022 at 12:33:49PM +0000, Will Deacon wrote: >>>> On Fri, Nov 18, 2022 at 04:24:02PM +0530, Manivannan Sadhasivam wrote: >>>>> On Mon, Nov 14, 2022 at 05:38:00PM +0000, Catalin Marinas wrote: >>>>>> On Mon, Nov 14, 2022 at 03:14:21PM +0000, Robin Murphy wrote: >>>>>>> Clearly that driver is completely broken though. If the DMA allocation came >>>>>>> from a no-map carveout vma_dma_alloc_from_dev_coherent() then the vmap() >>>>>>> shenanigans wouldn't work, so if it backed by struct pages then the whole >>>>>>> dance is still pointless because *a cacheable linear mapping exists*, and >>>>>>> it's just relying on the reduced chance that anything's going to re-fetch >>>>>>> the linear map address after those pages have been allocated, exactly as I >>>>>>> called out previously[1]. >>>>>> >>>>>> So I guess a DMA pool that's not mapped in the linear map, together with >>>>>> memremap() instead of vmap(), would work around the issue. But the >>>>>> driver needs fixing, not the arch code. >>>>> >>>>> Okay, thanks for the hint. Can you share how to allocate the dma-pool that's >>>>> not part of the kernel's linear map? I looked into it but couldn't find a way. >>>> >>>> The no-map property should take care of this iirc >>> >>> Yeah, we have been using it in other places of the same driver. But as per >>> Sibi, we used dynamic allocation for metadata validation since there was no >>> memory reserved statically for that. >> >> Unlike the other portions in the driver that required statically defined >> no-map carveouts, metadata just needed a contiguous memory for >> authentication. Re-using existing carveouts for this metadata region >> may not work due to modem FW limitations and declaring a new carveout for >> metadata will break the device tree bindings. That's the reason for >> using DMA_ATTR_NO_KERNEL_MAPPING for dma_alloc_attr and vmpa/vunmap with >> VM_FLUSH_RESET_PERMS before passing the memory onto modem. Are there other >> suggestions for achieving the same without breaking bindings? > > Your DMA_ATTR_NO_KERNEL_MAPPING workaround doesn't work, it only makes > the failure rate smaller. All this attribute does is avoiding creating a > non-cacheable mapping but you still have the kernel linear mapping in > place that may be speculatively accessed by the CPU. You were just lucky > so far not to have hit the issue. So I'd rather see this fixed properly > with a no-map carveout. Maybe you can reuse an existing carveout if the > driver already needs some and avoid changing the DT. More complicated > options include allocating memory and unmapping it from the linear map > with set_memory_valid(), though that's not exported to modules and it > also requires the linear map to be pages only, not block mappings. > > Yet another option is to have the swiotlb buffer unmapped from the > kernel linear map and use the bounce buffer for this. That's more > involved (Robin has some patches, though for a different reason and they > may not make it upstream). > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-01 9:29 ` Thorsten Leemhuis @ 2022-12-01 17:45 ` Catalin Marinas 2022-12-02 8:26 ` Amit Pundir 0 siblings, 1 reply; 27+ messages in thread From: Catalin Marinas @ 2022-12-01 17:45 UTC (permalink / raw) To: Thorsten Leemhuis Cc: Sibi Sankar, Manivannan Sadhasivam, Will Deacon, Robin Murphy, amit.pundir, andersson, sumit.semwal, linux-arm-kernel, linux-kernel On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > Has any progress been made to fix this regression? It afaics is not a > release critical issue, but well, it still would be nice to get this > fixed before 6.1 is released. The only (nearly) risk-free "fix" for 6.1 would be to revert the commit that exposed the driver bug. It doesn't fix the actual bug, it only makes it less likely to happen. I like the original commit removing the cache invalidation as it shows drivers not behaving properly but, as a workaround, we could add a command line option to force back the old behaviour (defaulting to the new one) until the driver is fixed. -- Catalin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-01 17:45 ` Catalin Marinas @ 2022-12-02 8:26 ` Amit Pundir 2022-12-02 8:54 ` Thorsten Leemhuis 0 siblings, 1 reply; 27+ messages in thread From: Amit Pundir @ 2022-12-02 8:26 UTC (permalink / raw) To: Catalin Marinas Cc: Thorsten Leemhuis, Sibi Sankar, Manivannan Sadhasivam, Will Deacon, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > Has any progress been made to fix this regression? It afaics is not a > > release critical issue, but well, it still would be nice to get this > > fixed before 6.1 is released. > > The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > that exposed the driver bug. It doesn't fix the actual bug, it only > makes it less likely to happen. > > I like the original commit removing the cache invalidation as it shows > drivers not behaving properly but, as a workaround, we could add a > command line option to force back the old behaviour (defaulting to the > new one) until the driver is fixed. We use DB845c extensively for mainline and android-mainline[1] testing with AOSP, and it is broken for weeks now. So be it a temporary workaround or a proper driver fix in place, we'd really appreciate a quick fix here. I understand that the revert doesn't fix the actual driver bug, but we were very very lucky so far that we had never hit this issue before. So at this point I'll take the revert of the upstream commit as well, while a proper fix is being worked upon. Regards, Amit Pundir [1] https://android.googlesource.com/kernel/common/+/refs/heads/android-mainline > > -- > Catalin ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 8:26 ` Amit Pundir @ 2022-12-02 8:54 ` Thorsten Leemhuis 2022-12-02 10:03 ` Will Deacon 0 siblings, 1 reply; 27+ messages in thread From: Thorsten Leemhuis @ 2022-12-02 8:54 UTC (permalink / raw) To: Amit Pundir, Catalin Marinas Cc: Sibi Sankar, Manivannan Sadhasivam, Will Deacon, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel On 02.12.22 09:26, Amit Pundir wrote: > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: >> >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: >>> Has any progress been made to fix this regression? It afaics is not a >>> release critical issue, but well, it still would be nice to get this >>> fixed before 6.1 is released. >> >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit >> that exposed the driver bug. It doesn't fix the actual bug, it only >> makes it less likely to happen. >> >> I like the original commit removing the cache invalidation as it shows >> drivers not behaving properly Yeah, I understand that, but I guess it's my job to ask at this point: "is continuing to live with the old behavior for one or two more cycles" that much of a problem"? >> but, as a workaround, we could add a >> command line option to force back the old behaviour (defaulting to the >> new one) until the driver is fixed. Well, sometimes that approach is fine to fix a regression, but I'm not sure this is one of those situations, as this... > We use DB845c extensively for mainline and android-mainline[1] testing > with AOSP, and it is broken for weeks now. So be it a temporary > workaround or a proper driver fix in place, we'd really appreciate a > quick fix here. ...doesn't sound like we are not talking about some odd corner case here. But in the end that would be up to Linus to decide. I'll point him to this thread once more in my weekly report anyway. Maybe I'll even suggest to revert this change, not sure yet. > [...] Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 8:54 ` Thorsten Leemhuis @ 2022-12-02 10:03 ` Will Deacon 2022-12-02 10:34 ` Thorsten Leemhuis 2022-12-02 10:54 ` Manivannan Sadhasivam 0 siblings, 2 replies; 27+ messages in thread From: Will Deacon @ 2022-12-02 10:03 UTC (permalink / raw) To: Thorsten Leemhuis Cc: Amit Pundir, Catalin Marinas, Sibi Sankar, Manivannan Sadhasivam, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch, gregkh On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > On 02.12.22 09:26, Amit Pundir wrote: > > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > >> > >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > >>> Has any progress been made to fix this regression? It afaics is not a > >>> release critical issue, but well, it still would be nice to get this > >>> fixed before 6.1 is released. > >> > >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > >> that exposed the driver bug. It doesn't fix the actual bug, it only > >> makes it less likely to happen. > >> > >> I like the original commit removing the cache invalidation as it shows > >> drivers not behaving properly > > Yeah, I understand that, but I guess it's my job to ask at this point: > "is continuing to live with the old behavior for one or two more cycles" > that much of a problem"? That wouldn't be a problem. The problem is that I haven't see any efforts from the Qualcomm side to actually fix the drivers so what makes you think the issue will be addressed in one or two more cycles? On the other hand, if there were patches out there trying to fix the drivers and we just needed to revert this change to buy them some time, then that would obviously be the right thing to do. > >> but, as a workaround, we could add a > >> command line option to force back the old behaviour (defaulting to the > >> new one) until the driver is fixed. > > Well, sometimes that approach is fine to fix a regression, but I'm not > sure this is one of those situations, as this... > > > We use DB845c extensively for mainline and android-mainline[1] testing > > with AOSP, and it is broken for weeks now. So be it a temporary > > workaround or a proper driver fix in place, we'd really appreciate a > > quick fix here. > > ...doesn't sound like we are not talking about some odd corner case > here. But in the end that would be up to Linus to decide. The issue is that these drivers are abusing the DMA API to manage buffers which are being transferred to trustzone. Even with the revert, this is broken (the CPU can speculate from the kernel's cacheable linear mapping of memory), it just appears to be less likely with the CPUs on this SoC. So we end up in a situation where the kernel is flakey on these devices but with even less incentive for the drivers to be fixed. As well as broken drivers, the patch has also identified broken device-tree files where DMA-coherent devices weher incorrectly being treated as non-coherent: https://lore.kernel.org/linux-arm-kernel/20221124142501.29314-1-johan+linaro@kernel.org/ so I do think it's something that's worth having as the default behaviour. > I'll point him to this thread once more in my weekly report anyway. > Maybe I'll even suggest to revert this change, not sure yet. As I said above, I think the revert makes sense if the drivers are actually being fixed, but I'm not seeing any movement at all on that front. Will ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 10:03 ` Will Deacon @ 2022-12-02 10:34 ` Thorsten Leemhuis 2022-12-02 16:10 ` Greg KH 2022-12-02 10:54 ` Manivannan Sadhasivam 1 sibling, 1 reply; 27+ messages in thread From: Thorsten Leemhuis @ 2022-12-02 10:34 UTC (permalink / raw) To: Will Deacon Cc: Amit Pundir, Catalin Marinas, Sibi Sankar, Manivannan Sadhasivam, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch, gregkh On 02.12.22 11:03, Will Deacon wrote: > On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: >> On 02.12.22 09:26, Amit Pundir wrote: >>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: >>>> >>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: >>>>> Has any progress been made to fix this regression? It afaics is not a >>>>> release critical issue, but well, it still would be nice to get this >>>>> fixed before 6.1 is released. >>>> >>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit >>>> that exposed the driver bug. It doesn't fix the actual bug, it only >>>> makes it less likely to happen. >>>> >>>> I like the original commit removing the cache invalidation as it shows >>>> drivers not behaving properly >> >> Yeah, I understand that, but I guess it's my job to ask at this point: >> "is continuing to live with the old behavior for one or two more cycles" >> that much of a problem"? > > That wouldn't be a problem. The problem is that I haven't see any efforts > from the Qualcomm side to actually fix the drivers [...] Thx for sharing the details. I can fully understand your pain. But well, in the end it looks to me like this commit it intentionally breaking something that used to work -- which to my understanding of the "no regression rule" is not okay, even if things only worked by chance and not flawless. But well, as with every rule there are misunderstandings, grey areas, and situations where judgement calls have to be made. Then it's up to Linus to decide how to handle things. Hence I'll just point him to this thread and then he can decide. No biggie. And sorry if I'm being a PITA here, I just thing doing that is my duty as regression tracker in situations like this. Hope your won't mind. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 10:34 ` Thorsten Leemhuis @ 2022-12-02 16:10 ` Greg KH 2022-12-02 16:27 ` Thorsten Leemhuis 0 siblings, 1 reply; 27+ messages in thread From: Greg KH @ 2022-12-02 16:10 UTC (permalink / raw) To: Thorsten Leemhuis Cc: Will Deacon, Amit Pundir, Catalin Marinas, Sibi Sankar, Manivannan Sadhasivam, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > On 02.12.22 11:03, Will Deacon wrote: > > On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > >> On 02.12.22 09:26, Amit Pundir wrote: > >>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > >>>> > >>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > >>>>> Has any progress been made to fix this regression? It afaics is not a > >>>>> release critical issue, but well, it still would be nice to get this > >>>>> fixed before 6.1 is released. > >>>> > >>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > >>>> that exposed the driver bug. It doesn't fix the actual bug, it only > >>>> makes it less likely to happen. > >>>> > >>>> I like the original commit removing the cache invalidation as it shows > >>>> drivers not behaving properly > >> > >> Yeah, I understand that, but I guess it's my job to ask at this point: > >> "is continuing to live with the old behavior for one or two more cycles" > >> that much of a problem"? > > > > That wouldn't be a problem. The problem is that I haven't see any efforts > > from the Qualcomm side to actually fix the drivers [...] > > Thx for sharing the details. I can fully understand your pain. But well, > in the end it looks to me like this commit it intentionally breaking > something that used to work -- which to my understanding of the "no > regression rule" is not okay, even if things only worked by chance and > not flawless. "no regressions" for userspace code, this is broken, out-of-tree driver code, right? I do not think any in-kernel drivers have this issue today from what I can tell, but if I am wrong here, please let me know. We don't keep stable apis, or even functionality, for out-of-tree kernel code as that would be impossible for us to do for obvious reasons. thanks, greg kh ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 16:10 ` Greg KH @ 2022-12-02 16:27 ` Thorsten Leemhuis 2022-12-02 16:32 ` Greg KH 0 siblings, 1 reply; 27+ messages in thread From: Thorsten Leemhuis @ 2022-12-02 16:27 UTC (permalink / raw) To: Greg KH Cc: Will Deacon, Amit Pundir, Catalin Marinas, Sibi Sankar, Manivannan Sadhasivam, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch On 02.12.22 17:10, Greg KH wrote: > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: >> On 02.12.22 11:03, Will Deacon wrote: >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: >>>> On 02.12.22 09:26, Amit Pundir wrote: >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: >>>>>> >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: >>>>>>> Has any progress been made to fix this regression? It afaics is not a >>>>>>> release critical issue, but well, it still would be nice to get this >>>>>>> fixed before 6.1 is released. >>>>>> >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only >>>>>> makes it less likely to happen. >>>>>> >>>>>> I like the original commit removing the cache invalidation as it shows >>>>>> drivers not behaving properly >>>> >>>> Yeah, I understand that, but I guess it's my job to ask at this point: >>>> "is continuing to live with the old behavior for one or two more cycles" >>>> that much of a problem"? >>> >>> That wouldn't be a problem. The problem is that I haven't see any efforts >>> from the Qualcomm side to actually fix the drivers [...] >> >> Thx for sharing the details. I can fully understand your pain. But well, >> in the end it looks to me like this commit it intentionally breaking >> something that used to work -- which to my understanding of the "no >> regression rule" is not okay, even if things only worked by chance and >> not flawless. > > "no regressions" for userspace code, this is broken, out-of-tree driver > code, right? If so: apologies. But that's not the impression I got, as Amit wrote "I can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree drivers." here: https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ > I do not think any in-kernel drivers have this issue today > from what I can tell, but if I am wrong here, please let me know. > > We don't keep stable apis, or even functionality, for out-of-tree kernel > code as that would be impossible for us to do for obvious reasons. Ciao, Thorsten ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 16:27 ` Thorsten Leemhuis @ 2022-12-02 16:32 ` Greg KH 2022-12-02 17:14 ` Manivannan Sadhasivam 0 siblings, 1 reply; 27+ messages in thread From: Greg KH @ 2022-12-02 16:32 UTC (permalink / raw) To: Thorsten Leemhuis Cc: Will Deacon, Amit Pundir, Catalin Marinas, Sibi Sankar, Manivannan Sadhasivam, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote: > > > On 02.12.22 17:10, Greg KH wrote: > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > >> On 02.12.22 11:03, Will Deacon wrote: > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > >>>> On 02.12.22 09:26, Amit Pundir wrote: > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > >>>>>> > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > >>>>>>> Has any progress been made to fix this regression? It afaics is not a > >>>>>>> release critical issue, but well, it still would be nice to get this > >>>>>>> fixed before 6.1 is released. > >>>>>> > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only > >>>>>> makes it less likely to happen. > >>>>>> > >>>>>> I like the original commit removing the cache invalidation as it shows > >>>>>> drivers not behaving properly > >>>> > >>>> Yeah, I understand that, but I guess it's my job to ask at this point: > >>>> "is continuing to live with the old behavior for one or two more cycles" > >>>> that much of a problem"? > >>> > >>> That wouldn't be a problem. The problem is that I haven't see any efforts > >>> from the Qualcomm side to actually fix the drivers [...] > >> > >> Thx for sharing the details. I can fully understand your pain. But well, > >> in the end it looks to me like this commit it intentionally breaking > >> something that used to work -- which to my understanding of the "no > >> regression rule" is not okay, even if things only worked by chance and > >> not flawless. > > > > "no regressions" for userspace code, this is broken, out-of-tree driver > > code, right? > > If so: apologies. But that's not the impression I got, as Amit wrote "I > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree > drivers." here: > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ Ah, I missed that. Ok, what in-tree drivers are having problems being buggy? I can't seem to figure that out from that report at all. Does anyone know? thanks, greg k-h ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 16:32 ` Greg KH @ 2022-12-02 17:14 ` Manivannan Sadhasivam 2022-12-05 14:24 ` Will Deacon 0 siblings, 1 reply; 27+ messages in thread From: Manivannan Sadhasivam @ 2022-12-02 17:14 UTC (permalink / raw) To: Greg KH Cc: Thorsten Leemhuis, Will Deacon, Amit Pundir, Catalin Marinas, Sibi Sankar, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote: > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote: > > > > > > On 02.12.22 17:10, Greg KH wrote: > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > > >> On 02.12.22 11:03, Will Deacon wrote: > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > > >>>> On 02.12.22 09:26, Amit Pundir wrote: > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > >>>>>> > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a > > >>>>>>> release critical issue, but well, it still would be nice to get this > > >>>>>>> fixed before 6.1 is released. > > >>>>>> > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only > > >>>>>> makes it less likely to happen. > > >>>>>> > > >>>>>> I like the original commit removing the cache invalidation as it shows > > >>>>>> drivers not behaving properly > > >>>> > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point: > > >>>> "is continuing to live with the old behavior for one or two more cycles" > > >>>> that much of a problem"? > > >>> > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts > > >>> from the Qualcomm side to actually fix the drivers [...] > > >> > > >> Thx for sharing the details. I can fully understand your pain. But well, > > >> in the end it looks to me like this commit it intentionally breaking > > >> something that used to work -- which to my understanding of the "no > > >> regression rule" is not okay, even if things only worked by chance and > > >> not flawless. > > > > > > "no regressions" for userspace code, this is broken, out-of-tree driver > > > code, right? > > > > If so: apologies. But that's not the impression I got, as Amit wrote "I > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree > > drivers." here: > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ > > Ah, I missed that. > > Ok, what in-tree drivers are having problems being buggy? I can't seem > to figure that out from that report at all. Does anyone know? > It is the Qualcomm Q6V5_MSS remoteproc driver: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c Qualcomm is working on the fix but the patches are not ready yet. So if we can get this patch reverted in the meantime, that would be helpful. Thanks, Mani > thanks, > > greg k-h -- மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 17:14 ` Manivannan Sadhasivam @ 2022-12-05 14:24 ` Will Deacon 2022-12-06 9:21 ` Manivannan Sadhasivam 0 siblings, 1 reply; 27+ messages in thread From: Will Deacon @ 2022-12-05 14:24 UTC (permalink / raw) To: Manivannan Sadhasivam Cc: Greg KH, Thorsten Leemhuis, Amit Pundir, Catalin Marinas, Sibi Sankar, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch On Fri, Dec 02, 2022 at 10:44:37PM +0530, Manivannan Sadhasivam wrote: > On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote: > > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote: > > > On 02.12.22 17:10, Greg KH wrote: > > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > > > >> On 02.12.22 11:03, Will Deacon wrote: > > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > > > >>>> On 02.12.22 09:26, Amit Pundir wrote: > > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > >>>>>> > > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a > > > >>>>>>> release critical issue, but well, it still would be nice to get this > > > >>>>>>> fixed before 6.1 is released. > > > >>>>>> > > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only > > > >>>>>> makes it less likely to happen. > > > >>>>>> > > > >>>>>> I like the original commit removing the cache invalidation as it shows > > > >>>>>> drivers not behaving properly > > > >>>> > > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point: > > > >>>> "is continuing to live with the old behavior for one or two more cycles" > > > >>>> that much of a problem"? > > > >>> > > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts > > > >>> from the Qualcomm side to actually fix the drivers [...] > > > >> > > > >> Thx for sharing the details. I can fully understand your pain. But well, > > > >> in the end it looks to me like this commit it intentionally breaking > > > >> something that used to work -- which to my understanding of the "no > > > >> regression rule" is not okay, even if things only worked by chance and > > > >> not flawless. > > > > > > > > "no regressions" for userspace code, this is broken, out-of-tree driver > > > > code, right? > > > > > > If so: apologies. But that's not the impression I got, as Amit wrote "I > > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree > > > drivers." here: > > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ > > > > Ah, I missed that. > > > > Ok, what in-tree drivers are having problems being buggy? I can't seem > > to figure that out from that report at all. Does anyone know? > > > > It is the Qualcomm Q6V5_MSS remoteproc driver: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c > > Qualcomm is working on the fix but the patches are not ready yet. So if we can > get this patch reverted in the meantime, that would be helpful. It's good to hear that you're working to fix this, even if it's happening behind closed doors. Do you have a rough idea how soon you'll be able to post the remoteproc driver fixes? That would help us to figure out when to bring back the change if we were to revert it. Cheers, Will ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-05 14:24 ` Will Deacon @ 2022-12-06 9:21 ` Manivannan Sadhasivam 2022-12-06 9:58 ` Will Deacon 0 siblings, 1 reply; 27+ messages in thread From: Manivannan Sadhasivam @ 2022-12-06 9:21 UTC (permalink / raw) To: Will Deacon Cc: Greg KH, Thorsten Leemhuis, Amit Pundir, Catalin Marinas, Sibi Sankar, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch On Mon, Dec 05, 2022 at 02:24:03PM +0000, Will Deacon wrote: > On Fri, Dec 02, 2022 at 10:44:37PM +0530, Manivannan Sadhasivam wrote: > > On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote: > > > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote: > > > > On 02.12.22 17:10, Greg KH wrote: > > > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > > > > >> On 02.12.22 11:03, Will Deacon wrote: > > > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > > > > >>>> On 02.12.22 09:26, Amit Pundir wrote: > > > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > >>>>>> > > > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a > > > > >>>>>>> release critical issue, but well, it still would be nice to get this > > > > >>>>>>> fixed before 6.1 is released. > > > > >>>>>> > > > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > > > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only > > > > >>>>>> makes it less likely to happen. > > > > >>>>>> > > > > >>>>>> I like the original commit removing the cache invalidation as it shows > > > > >>>>>> drivers not behaving properly > > > > >>>> > > > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point: > > > > >>>> "is continuing to live with the old behavior for one or two more cycles" > > > > >>>> that much of a problem"? > > > > >>> > > > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts > > > > >>> from the Qualcomm side to actually fix the drivers [...] > > > > >> > > > > >> Thx for sharing the details. I can fully understand your pain. But well, > > > > >> in the end it looks to me like this commit it intentionally breaking > > > > >> something that used to work -- which to my understanding of the "no > > > > >> regression rule" is not okay, even if things only worked by chance and > > > > >> not flawless. > > > > > > > > > > "no regressions" for userspace code, this is broken, out-of-tree driver > > > > > code, right? > > > > > > > > If so: apologies. But that's not the impression I got, as Amit wrote "I > > > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree > > > > drivers." here: > > > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ > > > > > > Ah, I missed that. > > > > > > Ok, what in-tree drivers are having problems being buggy? I can't seem > > > to figure that out from that report at all. Does anyone know? > > > > > > > It is the Qualcomm Q6V5_MSS remoteproc driver: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c > > > > Qualcomm is working on the fix but the patches are not ready yet. So if we can > > get this patch reverted in the meantime, that would be helpful. > > It's good to hear that you're working to fix this, even if it's happening > behind closed doors. Do you have a rough idea how soon you'll be able to > post the remoteproc driver fixes? That would help us to figure out when > to bring back the change if we were to revert it. > Sibi is the one working on the fix. I believe he should be able to post the patches within this week. Thanks, Mani > Cheers, > > Will -- மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-06 9:21 ` Manivannan Sadhasivam @ 2022-12-06 9:58 ` Will Deacon 0 siblings, 0 replies; 27+ messages in thread From: Will Deacon @ 2022-12-06 9:58 UTC (permalink / raw) To: Manivannan Sadhasivam Cc: Greg KH, Thorsten Leemhuis, Amit Pundir, Catalin Marinas, Sibi Sankar, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch On Tue, Dec 06, 2022 at 02:51:52PM +0530, Manivannan Sadhasivam wrote: > On Mon, Dec 05, 2022 at 02:24:03PM +0000, Will Deacon wrote: > > On Fri, Dec 02, 2022 at 10:44:37PM +0530, Manivannan Sadhasivam wrote: > > > On Fri, Dec 02, 2022 at 05:32:51PM +0100, Greg KH wrote: > > > > On Fri, Dec 02, 2022 at 05:27:24PM +0100, Thorsten Leemhuis wrote: > > > > > On 02.12.22 17:10, Greg KH wrote: > > > > > > On Fri, Dec 02, 2022 at 11:34:30AM +0100, Thorsten Leemhuis wrote: > > > > > >> On 02.12.22 11:03, Will Deacon wrote: > > > > > >>> On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > > > > > >>>> On 02.12.22 09:26, Amit Pundir wrote: > > > > > >>>>> On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > > > > >>>>>> > > > > > >>>>>> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > > > > >>>>>>> Has any progress been made to fix this regression? It afaics is not a > > > > > >>>>>>> release critical issue, but well, it still would be nice to get this > > > > > >>>>>>> fixed before 6.1 is released. > > > > > >>>>>> > > > > > >>>>>> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > > > > > >>>>>> that exposed the driver bug. It doesn't fix the actual bug, it only > > > > > >>>>>> makes it less likely to happen. > > > > > >>>>>> > > > > > >>>>>> I like the original commit removing the cache invalidation as it shows > > > > > >>>>>> drivers not behaving properly > > > > > >>>> > > > > > >>>> Yeah, I understand that, but I guess it's my job to ask at this point: > > > > > >>>> "is continuing to live with the old behavior for one or two more cycles" > > > > > >>>> that much of a problem"? > > > > > >>> > > > > > >>> That wouldn't be a problem. The problem is that I haven't see any efforts > > > > > >>> from the Qualcomm side to actually fix the drivers [...] > > > > > >> > > > > > >> Thx for sharing the details. I can fully understand your pain. But well, > > > > > >> in the end it looks to me like this commit it intentionally breaking > > > > > >> something that used to work -- which to my understanding of the "no > > > > > >> regression rule" is not okay, even if things only worked by chance and > > > > > >> not flawless. > > > > > > > > > > > > "no regressions" for userspace code, this is broken, out-of-tree driver > > > > > > code, right? > > > > > > > > > > If so: apologies. But that's not the impression I got, as Amit wrote "I > > > > > can reproduce this crash on vanilla v6.1-rc1 as well with no out-of-tree > > > > > drivers." here: > > > > > https://lore.kernel.org/linux-arm-kernel/CAMi1Hd3H2k1J8hJ6e-Miy5+nVDNzv6qQ3nN-9929B0GbHJkXEg@mail.gmail.com/ > > > > > > > > Ah, I missed that. > > > > > > > > Ok, what in-tree drivers are having problems being buggy? I can't seem > > > > to figure that out from that report at all. Does anyone know? > > > > > > > > > > It is the Qualcomm Q6V5_MSS remoteproc driver: > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/remoteproc/qcom_q6v5_mss.c > > > > > > Qualcomm is working on the fix but the patches are not ready yet. So if we can > > > get this patch reverted in the meantime, that would be helpful. > > > > It's good to hear that you're working to fix this, even if it's happening > > behind closed doors. Do you have a rough idea how soon you'll be able to > > post the remoteproc driver fixes? That would help us to figure out when > > to bring back the change if we were to revert it. > > > > Sibi is the one working on the fix. I believe he should be able to post the > patches within this week. Oh nice, that's a lot sooner than I expected! I'll send a revert out now, with a comment about where we're at. Will ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-12-02 10:03 ` Will Deacon 2022-12-02 10:34 ` Thorsten Leemhuis @ 2022-12-02 10:54 ` Manivannan Sadhasivam 1 sibling, 0 replies; 27+ messages in thread From: Manivannan Sadhasivam @ 2022-12-02 10:54 UTC (permalink / raw) To: Will Deacon Cc: Thorsten Leemhuis, Amit Pundir, Catalin Marinas, Sibi Sankar, Robin Murphy, andersson, sumit.semwal, linux-arm-kernel, linux-kernel, hch, gregkh On Fri, Dec 02, 2022 at 10:03:58AM +0000, Will Deacon wrote: > On Fri, Dec 02, 2022 at 09:54:05AM +0100, Thorsten Leemhuis wrote: > > On 02.12.22 09:26, Amit Pundir wrote: > > > On Thu, 1 Dec 2022 at 23:15, Catalin Marinas <catalin.marinas@arm.com> wrote: > > >> > > >> On Thu, Dec 01, 2022 at 10:29:39AM +0100, Thorsten Leemhuis wrote: > > >>> Has any progress been made to fix this regression? It afaics is not a > > >>> release critical issue, but well, it still would be nice to get this > > >>> fixed before 6.1 is released. > > >> > > >> The only (nearly) risk-free "fix" for 6.1 would be to revert the commit > > >> that exposed the driver bug. It doesn't fix the actual bug, it only > > >> makes it less likely to happen. > > >> > > >> I like the original commit removing the cache invalidation as it shows > > >> drivers not behaving properly > > > > Yeah, I understand that, but I guess it's my job to ask at this point: > > "is continuing to live with the old behavior for one or two more cycles" > > that much of a problem"? > > That wouldn't be a problem. The problem is that I haven't see any efforts > from the Qualcomm side to actually fix the drivers so what makes you think > the issue will be addressed in one or two more cycles? On the other hand, if > there were patches out there trying to fix the drivers and we just needed to > revert this change to buy them some time, then that would obviously be the > right thing to do. > There are efforts going on to fix the driver from Qualcomm. It's just that the patches are not available yet. The delay is mainly due to the internal communication that should happen between the internal teams. The fix would be use a separate no-map carveout for the usecase. But it'd be good to revert this patch untill those patches get merged. Thanks, Mani > > >> but, as a workaround, we could add a > > >> command line option to force back the old behaviour (defaulting to the > > >> new one) until the driver is fixed. > > > > Well, sometimes that approach is fine to fix a regression, but I'm not > > sure this is one of those situations, as this... > > > > > We use DB845c extensively for mainline and android-mainline[1] testing > > > with AOSP, and it is broken for weeks now. So be it a temporary > > > workaround or a proper driver fix in place, we'd really appreciate a > > > quick fix here. > > > > ...doesn't sound like we are not talking about some odd corner case > > here. But in the end that would be up to Linus to decide. > > The issue is that these drivers are abusing the DMA API to manage buffers > which are being transferred to trustzone. Even with the revert, this is > broken (the CPU can speculate from the kernel's cacheable linear mapping > of memory), it just appears to be less likely with the CPUs on this SoC. > So we end up in a situation where the kernel is flakey on these devices > but with even less incentive for the drivers to be fixed. > > As well as broken drivers, the patch has also identified broken device-tree > files where DMA-coherent devices weher incorrectly being treated as > non-coherent: > > https://lore.kernel.org/linux-arm-kernel/20221124142501.29314-1-johan+linaro@kernel.org/ > > so I do think it's something that's worth having as the default behaviour. > > > I'll point him to this thread once more in my weekly report anyway. > > Maybe I'll even suggest to revert this change, not sure yet. > > As I said above, I think the revert makes sense if the drivers are actually > being fixed, but I'm not seeing any movement at all on that front. > > Will -- மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam 2022-11-14 11:29 ` Manivannan Sadhasivam 2022-11-14 14:11 ` Will Deacon @ 2022-11-28 5:44 ` Thorsten Leemhuis 2022-11-28 8:15 ` Manivannan Sadhasivam 2022-12-08 4:59 ` Leonard Lausen 3 siblings, 1 reply; 27+ messages in thread From: Thorsten Leemhuis @ 2022-11-28 5:44 UTC (permalink / raw) To: Manivannan Sadhasivam, catalin.marinas, will Cc: robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On 14.11.22 12:03, Manivannan Sadhasivam wrote: > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > As reported by Amit [1], dropping cache invalidation from > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > (most probably on other Qcom platforms too). The reason is, Qcom > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > for validation. The modem has a secure block (XPU) that will trigger a > whole system crash if the shared memory is accessed by the CPU while modem > is poking at it. > [...] > [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/ > I have Amit's report on the list of tracked regressions. I also noticed the proposed change "arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency": https://lore.kernel.org/all/20221124142501.29314-1-johan+linaro@kernel.org/ I have no expertise in this area, but it looked somewhat related to my eyes, so please allow me to quickly ask: is that related or even supposed to fix Amit's regression? Ciao, Thorsten ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-28 5:44 ` Thorsten Leemhuis @ 2022-11-28 8:15 ` Manivannan Sadhasivam 0 siblings, 0 replies; 27+ messages in thread From: Manivannan Sadhasivam @ 2022-11-28 8:15 UTC (permalink / raw) To: Thorsten Leemhuis Cc: catalin.marinas, will, robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel On Mon, Nov 28, 2022 at 06:44:13AM +0100, Thorsten Leemhuis wrote: > On 14.11.22 12:03, Manivannan Sadhasivam wrote: > > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > > > As reported by Amit [1], dropping cache invalidation from > > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > > (most probably on other Qcom platforms too). The reason is, Qcom > > qcom_q6v5_mss driver copies the firmware metadata and shares it with modem > > for validation. The modem has a secure block (XPU) that will trigger a > > whole system crash if the shared memory is accessed by the CPU while modem > > is poking at it. > > [...] > > [1] https://lore.kernel.org/linux-arm-kernel/CAMi1Hd1VBCFhf7+EXWHQWcGy4k=tcyLa7RGiFdprtRnegSG0Mw@mail.gmail.com/ > > > > I have Amit's report on the list of tracked regressions. I also noticed > the proposed change "arm64: dts: qcom: sc8280xp: fix PCIe DMA coherency": > https://lore.kernel.org/all/20221124142501.29314-1-johan+linaro@kernel.org/ > > I have no expertise in this area, but it looked somewhat related to my > eyes, so please allow me to quickly ask: is that related or even > supposed to fix Amit's regression? > The proposed patch doesn't fix the regression reported by Amit. But the patch itself fixes an issue that might be triggered more frequently by c44094eee32f. Thanks, Mani > Ciao, Thorsten -- மணிவண்ணன் சதாசிவம் ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" 2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam ` (2 preceding siblings ...) 2022-11-28 5:44 ` Thorsten Leemhuis @ 2022-12-08 4:59 ` Leonard Lausen 3 siblings, 0 replies; 27+ messages in thread From: Leonard Lausen @ 2022-12-08 4:59 UTC (permalink / raw) To: Manivannan Sadhasivam, catalin.marinas, will Cc: robin.murphy, amit.pundir, andersson, quic_sibis, sumit.semwal, linux-arm-kernel, linux-kernel, Manivannan Sadhasivam Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> writes: > This reverts commit c44094eee32f32f175aadc0efcac449d99b1bbf7. > > As reported by Amit [1], dropping cache invalidation from > arch_dma_prep_coherent() triggers a crash on the Qualcomm SM8250 platform > (most probably on other Qcom platforms too). On sc7180 with c44094ee applied, it does not trigger crash but makes Wifi dysfunctional by preventing initialization of ath10k_snoc. qcom-q6v5-mss 4080000.remoteproc: PBL returned unexpected status -284098560 With the revert of c44094ee, wifi works fine again. Thank you Leonard ^ permalink raw reply [flat|nested] 27+ messages in thread
end of thread, other threads:[~2022-12-08 5:06 UTC | newest] Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-11-14 11:03 [PATCH] Revert "arm64: dma: Drop cache invalidation from arch_dma_prep_coherent()" Manivannan Sadhasivam 2022-11-14 11:29 ` Manivannan Sadhasivam 2022-11-14 14:11 ` Will Deacon 2022-11-14 15:14 ` Robin Murphy 2022-11-14 17:38 ` Catalin Marinas 2022-11-18 10:54 ` Manivannan Sadhasivam 2022-11-18 12:33 ` Will Deacon 2022-11-21 6:42 ` Manivannan Sadhasivam 2022-11-21 10:12 ` Sibi Sankar 2022-11-24 11:55 ` Catalin Marinas 2022-12-01 9:29 ` Thorsten Leemhuis 2022-12-01 17:45 ` Catalin Marinas 2022-12-02 8:26 ` Amit Pundir 2022-12-02 8:54 ` Thorsten Leemhuis 2022-12-02 10:03 ` Will Deacon 2022-12-02 10:34 ` Thorsten Leemhuis 2022-12-02 16:10 ` Greg KH 2022-12-02 16:27 ` Thorsten Leemhuis 2022-12-02 16:32 ` Greg KH 2022-12-02 17:14 ` Manivannan Sadhasivam 2022-12-05 14:24 ` Will Deacon 2022-12-06 9:21 ` Manivannan Sadhasivam 2022-12-06 9:58 ` Will Deacon 2022-12-02 10:54 ` Manivannan Sadhasivam 2022-11-28 5:44 ` Thorsten Leemhuis 2022-11-28 8:15 ` Manivannan Sadhasivam 2022-12-08 4:59 ` Leonard Lausen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).